pyjanitor-devs / pyjanitor

Clean APIs for data cleaning. Python implementation of R package Janitor
https://pyjanitor-devs.github.io/pyjanitor
MIT License
1.37k stars, 170 forks

Samukweku/refactor expand grid #1383

Closed: samukweku closed this 3 months ago

samukweku commented 4 months ago

PR Description

Please describe the changes proposed in the pull request:

This PR should also resolve the discussion linked here.

Performance compared to `pd.merge` (YMMV):

```python
import pandas as pd
import janitor as jn

df1 = pd.DataFrame({"a": range(1, 3), "b": [2, 1]})
df2 = pd.DataFrame({"x": [1, 2, 3], "y": [3, 2, 1]})
df3 = pd.DataFrame({"r": [2, 3], "s": ["a", "b"]})

df1 = pd.concat([df1] * 10_000)
df2 = pd.concat([df2] * 200)

A = jn.cartesian_product(df1, df2, df3)
B = df1.merge(df2, how="cross").merge(df3, how="cross")
A.equals(B)
# True

# this PR
%timeit jn.cartesian_product(df1, df2, df3)
# 353 ms ± 4.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df1.merge(df2, how="cross").merge(df3, how="cross")
# 1.52 s ± 27.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# dev
%timeit jn.expand_grid(others={"df1": df1, "df2": df2, "df3": df3})
# 438 ms ± 10.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit jn.expand_grid(others={"df1": df1, "df2": df2, "df3": df3}).droplevel(level=0, axis=1)
# 728 ms ± 8.51 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
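For readers who want a feel for why a dedicated cross-join can beat chained `merge(how="cross")` calls, here is a minimal sketch of one common approach: compute a positional indexer per frame with `np.repeat` and `np.tile`, take the rows once, and concatenate column-wise. This is an illustration only and is not claimed to be the implementation in this PR; the function name `cartesian_product_sketch` is hypothetical.

```python
import numpy as np
import pandas as pd


def cartesian_product_sketch(*frames: pd.DataFrame) -> pd.DataFrame:
    """Cross join one or more DataFrames via positional indexers.

    Hypothetical helper for illustration; not pyjanitor's API.
    Row order matches chained ``merge(how="cross")``: the leftmost
    frame varies slowest, the rightmost varies fastest.
    """
    lengths = [len(frame) for frame in frames]
    total = int(np.prod(lengths))
    if total == 0:
        # Any empty input makes the whole product empty.
        empties = [f.iloc[:0].reset_index(drop=True) for f in frames]
        return pd.concat(empties, axis=1)

    pieces = []
    repeats = total
    for frame, n in zip(frames, lengths):
        repeats //= n                   # consecutive copies of each row
        tiles = total // (n * repeats)  # how often the repeated pattern cycles
        indexer = np.tile(np.repeat(np.arange(n), repeats), tiles)
        pieces.append(frame.take(indexer).reset_index(drop=True))
    return pd.concat(pieces, axis=1)


if __name__ == "__main__":
    df1 = pd.DataFrame({"a": [1, 2], "b": [2, 1]})
    df2 = pd.DataFrame({"x": [1, 2, 3], "y": [3, 2, 1]})
    df3 = pd.DataFrame({"r": [2, 3], "s": ["a", "b"]})
    print(cartesian_product_sketch(df1, df2, df3))
```

Each input frame is materialized exactly once at its final length, whereas chained merges build an intermediate product at every step. Whether that accounts for the timings above is an assumption, not something measured here.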

This PR resolves #1293.

ericmjl commented 4 months ago

🚀 Deployed on https://deploy-preview-1383--pyjanitor.netlify.app

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 98.75000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 89.76%. Comparing base (62c57c6) to head (eee05b3). Report is 34 commits behind head on dev.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##              dev    #1383      +/-   ##
==========================================
- Coverage   94.48%   89.76%   -4.72%
==========================================
  Files          80       87       +7
  Lines        4367     5392    +1025
==========================================
+ Hits         4126     4840     +714
- Misses        241      552     +311
```
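As a quick sanity check on the report, the coverage percentages can be reproduced from the hit and line counts in the diff. The sketch below assumes coverage is simply hits divided by lines, rounded to two decimals; the helper name `coverage_pct` is made up for this example.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage, rounded to two decimals (assumed formula)."""
    return round(100 * hits / lines, 2)


# Counts copied from the coverage diff above.
base = coverage_pct(4126, 4367)   # dev
head = coverage_pct(4840, 5392)   # this PR's head
delta = round(head - base, 2)

if __name__ == "__main__":
    print(base, head, delta)
```

The miss counts are consistent as well: 4367 - 4126 = 241 and 5392 - 4840 = 552.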