samukweku closed this 4 months ago.
🚀 Deployed on https://deploy-preview-1362--pyjanitor.netlify.app
Ok, I just had a chance to look through the PR. Super high quality work! One file, the one where the implementation lives, was a tad too long to review in detail; I'm going to trust that it works fine. Otherwise, thank you for keeping the code test coverage high, @samukweku!
I am going to approve. Please do the honors of merging!
@ericmjl, thanks for the feedback. I need to figure out how to break PRs like this one into smaller chunks.
PR Description
Please describe the changes proposed in the pull request:
- Performance improvement for `pivot_longer`, especially when `sort_by_appearance` is `True` (see the benchmarks below).
- Add `pivot_longer_spec`, which allows unpivoting by hand; this gives more granular control over how the final dataframe looks in long form.

```python
import pandas as pd
import janitor  # noqa: F401  (importing pyjanitor registers pivot_longer on DataFrame)

events = pd.DataFrame(
    {
        "country": ["United States", "Russia", "China"],
        "vault_2012_f": [48.132, 46.366, 44.266],
        "vault_2012_m": [46.632, 46.866, 48.316],
        "vault_2016_f": [46.866, 45.733, 44.332],
        "vault_2016_m": [45.865, 46.033, 45.0],
        "floor_2012_f": [45.366, 41.599, 40.833],
        "floor_2012_m": [45.266, 45.308, 45.133],
        "floor_2016_f": [45.999, 42.032, 42.066],
        "floor_2016_m": [43.757, 44.766, 43.799],
    }
)
events
```

```
         country  vault_2012_f  vault_2012_m  ...  floor_2012_m  floor_2016_f  floor_2016_m
0  United States        48.132        46.632  ...        45.266        45.999        43.757
1         Russia        46.366        46.866  ...        45.308        42.032        44.766
2          China        44.266        48.316  ...        45.133        42.066        43.799

[3 rows x 9 columns]
```
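To make "unpivoting by hand" concrete, here is a minimal plain-pandas sketch of the spec-driven idea: a spec table with one row per column to unpivot, using the `.name`/`.value` convention borrowed from tidyr's spec format. The helper `unpivot_with_spec` and that convention are assumptions for illustration only; this is not pyjanitor's implementation, and it does not show the actual `pivot_longer_spec` signature.

```python
import pandas as pd


def unpivot_with_spec(df, spec, index):
    """Hypothetical helper: unpivot `df` by hand according to `spec`.

    `spec` has one row per column to unpivot (tidyr-style convention, assumed here):
      - ".name":  an existing column in `df`
      - ".value": the output column that receives that column's values
      - every other column: a label attached to those values in long form
    At least one label column is assumed.
    """
    label_cols = [c for c in spec.columns if c not in (".name", ".value")]
    value_cols = list(dict.fromkeys(spec[".value"]))  # unique, in order
    pieces = []
    # Each distinct label combination becomes one block of long-form rows,
    # processed in order of first appearance in the spec.
    for labels, group in spec.groupby(label_cols, sort=False):
        labels = labels if isinstance(labels, tuple) else (labels,)
        # Rename the original columns to their ".value" names for this block.
        mapping = dict(zip(group[".name"], group[".value"]))
        block = df[index + list(mapping)].rename(columns=mapping)
        # Attach this block's labels as constant columns.
        block = block.assign(**dict(zip(label_cols, labels)))
        pieces.append(block)
    out = pd.concat(pieces, ignore_index=True)
    return out[index + label_cols + value_cols]


# Example on a small subset of the `events` data from the PR description.
events = pd.DataFrame(
    {
        "country": ["United States", "Russia"],
        "vault_2012_f": [48.132, 46.366],
        "vault_2012_m": [46.632, 46.866],
        "floor_2012_f": [45.366, 41.599],
        "floor_2012_m": [45.266, 45.308],
    }
)
cols = [c for c in events.columns if c != "country"]
# Build the spec by hand: split each name like "vault_2012_f" into three labels.
spec = pd.DataFrame([c.split("_") for c in cols], columns=["event", "year", "gender"])
spec.insert(0, ".name", cols)
spec[".value"] = "score"

long = unpivot_with_spec(events, spec, index=["country"])
print(long)
```

Because the spec is an ordinary DataFrame, any labelling can be edited before unpivoting. For example, setting `.value` to the event name (and dropping the `event` label column) would place `vault` and `floor` side by side as separate value columns.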
Benchmark setup (300,000 rows):

```python
events = pd.concat([events] * 100_000)
```

On `dev`:

```
%timeit events.pivot_longer(index='country', names_to=['event', 'year', 'gender'], names_sep='_', sort_by_appearance=False)
62.9 ms ± 361 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit events.pivot_longer(index='country', names_to=['event', 'year', 'gender'], names_sep='_', sort_by_appearance=True)
165 ms ± 1.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

On this PR:

```
%timeit events.pivot_longer(index='country', names_to=['event', 'year', 'gender'], names_sep='_', sort_by_appearance=False)
53.2 ms ± 264 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit events.pivot_longer(index='country', names_to=['event', 'year', 'gender'], names_sep='_', sort_by_appearance=True)
48 ms ± 486 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
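For readers unfamiliar with `sort_by_appearance`: as we understand the pyjanitor docs, `True` keeps each original row's values together in the long output (in the order the columns appear), while `False` stacks the data column by column, which is the order `pd.melt` produces. The plain-pandas sketch below uses toy data and is not the PR's code; it reproduces both orders, and in this naive version the appearance order needs an extra reordering pass over every output row.

```python
import numpy as np
import pandas as pd

# Toy frame: one identifier column and two columns to unpivot.
df = pd.DataFrame({"id": ["a", "b"], "x_1": [1, 2], "x_2": [3, 4]})

# Column-by-column order (what pd.melt returns): all of x_1, then all of x_2.
melted = df.melt(id_vars="id", var_name="variable", value_name="value")

# Row-by-row ("appearance") order: melt's output row k comes from original
# row k % n_rows and column k // n_rows, so regroup the rows by original row.
n_rows = len(df)
n_cols = df.shape[1] - 1  # number of columns being unpivoted
order = np.arange(n_rows * n_cols).reshape(n_cols, n_rows).T.ravel()
by_appearance = melted.iloc[order].reset_index(drop=True)

print(melted)
print(by_appearance)
```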
This PR resolves #1361.
PR Checklist
Please ensure that you have done the following:
- Do not PR from `<your_username>`:`dev`, but rather from `<your_username>`:`<feature-branch_name>`.
- Add yourself to `AUTHORS.md`.
- Add a line to `CHANGELOG.md` under the latest version header (i.e. the one that is "on deck") describing the contribution.

Automatic checks
There will be automatic checks run on the PR. These include:
Relevant Reviewers
Please tag maintainers to review.