pyjanitor-devs / pyjanitor

Clean APIs for data cleaning. Python implementation of the R package Janitor
https://pyjanitor-devs.github.io/pyjanitor
MIT License

[ENH] improve performance for polars' `pivot_longer` #1377

Closed: samukweku closed this pull request 4 months ago

samukweku commented 5 months ago

PR Description


This PR relates to #1352. It speeds up `pivot_longer` for polars `DataFrame`s and `LazyFrame`s.

Performance comparison on a 30,000 × 801 DataFrame (your mileage may vary):

```python
import polars as pl
import janitor.polars  # registers the `.janitor` namespace used below

evv = pl.read_csv("../evv.csv")
evv.shape
# (30000, 801)

# dev (before this PR)
%timeit evv.janitor.pivot_longer(index="country", names_to=["event", "year", "gender", "num"], names_transform=pl.col("year").cast(int), names_sep="_")
# 1.5 s ± 6.42 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# LazyFrame, without .collect()
%timeit evv.lazy().janitor.pivot_longer(index="country", names_to=["event", "year", "gender", "num"], names_transform=pl.col("year").cast(int), names_sep="_")
# 3 s ± 16.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# LazyFrame, with .collect()
%timeit evv.lazy().janitor.pivot_longer(index="country", names_to=["event", "year", "gender", "num"], names_transform=pl.col("year").cast(int), names_sep="_").collect()
# 5.94 s ± 24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

```python
# this PR
%timeit evv.janitor.pivot_longer(index="country", names_to=["event", "year", "gender", "num"], names_transform=pl.col("year").cast(int), names_sep="_")
# 225 ms ± 8.49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# LazyFrame, without .collect()
%timeit evv.lazy().janitor.pivot_longer(index="country", names_to=["event", "year", "gender", "num"], names_transform=pl.col("year").cast(int), names_sep="_")
# 1.58 ms ± 4.36 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

# LazyFrame, with .collect()
%timeit evv.lazy().janitor.pivot_longer(index="country", names_to=["event", "year", "gender", "num"], names_transform=pl.col("year").cast(int), names_sep="_").collect()
# 263 ms ± 8.73 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

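To put the timings in perspective, the speedup is the ratio of the mean `%timeit` results for dev and this PR. A small sketch using the means quoted above, converted to seconds; the LazyFrame runs without `.collect()` are left out since they do not measure a collected result. The `speedup` helper is illustrative only.

```python
# Mean %timeit results quoted above, in seconds: (dev, this PR).
timings = {
    "eager": (1.5, 0.225),
    "lazy + collect": (5.94, 0.263),
}


def speedup(before_s: float, after_s: float) -> float:
    """Return how many times faster `after_s` is than `before_s`."""
    return before_s / after_s


speedups = {label: round(speedup(dev, pr), 1) for label, (dev, pr) in timings.items()}
print(speedups)  # {'eager': 6.7, 'lazy + collect': 22.6}
```

These are ratios of means only and ignore the reported standard deviations.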
ericmjl commented 5 months ago

🚀 Deployed on https://deploy-preview-1377--pyjanitor.netlify.app

codecov[bot] commented 5 months ago

Codecov Report

Attention: Patch coverage is 95.74468% with 4 lines in your changes missing coverage. Please review.

Project coverage is 88.96%. Comparing base (62c57c6) to head (6a5f66e). Report is 27 commits behind head on dev.

❗ Current head 6a5f66e differs from the pull request's most recent head 1fc553e.

Please upload reports for the commit 1fc553e to get more accurate results.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##              dev    #1377      +/-   ##
==========================================
- Coverage   94.48%   88.96%   -5.52%
==========================================
  Files          80       86       +6
  Lines        4367     5058     +691
==========================================
+ Hits         4126     4500     +374
- Misses       241      558      +317
```
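The Codecov figures can be cross-checked from the hit and miss counts. A sketch, assuming coverage is hits / (hits + misses) expressed as a percentage and truncated (rounded down) to two decimals, and, for the patch figure, that the 4 uncovered lines are plain misses rather than partially covered lines. The helper names `truncate_pct` and `summarize` are hypothetical.

```python
import math

# Hit/miss counts copied from the Codecov diff above.
BASE = {"hits": 4126, "misses": 241}  # dev
HEAD = {"hits": 4500, "misses": 558}  # this PR's head


def truncate_pct(hits: int, total: int, digits: int = 2) -> float:
    """Return hits/total as a percentage, truncated to `digits` decimals."""
    factor = 10 ** digits
    return math.floor(hits / total * 100 * factor) / factor


def summarize(report: dict) -> tuple[int, float]:
    """Return (total lines, coverage %) for a hits/misses report."""
    total = report["hits"] + report["misses"]
    return total, truncate_pct(report["hits"], total)


base_lines, base_pct = summarize(BASE)  # (4367, 94.48)
head_lines, head_pct = summarize(HEAD)  # (5058, 88.96)
delta = round(head_pct - base_pct, 2)   # -5.52

# Patch coverage: 4 uncovered lines at 95.74468% means 4 / N = 1 - 0.9574468.
# Assumes all 4 lines are plain misses (no partially covered lines).
patch_lines = round(4 / (1 - 0.9574468))                          # 94
patch_pct = truncate_pct(patch_lines - 4, patch_lines, digits=5)  # 95.74468
```

Under these assumptions the patch touched 94 coverable lines, 90 of which were hit.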
samukweku commented 4 months ago

@ericmjl Ok to do a release?