pyjanitor-devs / pyjanitor

Clean APIs for data cleaning. Python implementation of R package Janitor
https://pyjanitor-devs.github.io/pyjanitor
MIT License
1.37k stars, 170 forks

`row_to_names` improvement #1379

Closed: samukweku closed this 4 months ago

samukweku commented 5 months ago

PR Description

Please describe the changes proposed in the pull request:

Speed improvement for `row_to_names` on pandas DataFrames (YMMV):

import pandas as pd
import janitor as jn
import numpy as np

df = pd.DataFrame({
    "a": ["nums", 6, 9],
    "b": ["chars", "x", "y"],
})
df = pd.concat([df]*100_000, ignore_index=True)

# this PR
%timeit df.row_to_names(0, remove_rows=True, reset_index=True)
2.41 ms ± 126 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df.row_to_names(0)
27.3 µs ± 340 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

# dev
%timeit df.row_to_names(0, remove_rows=True, reset_index=True)
13.2 ms ± 72.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df.row_to_names(0)
2.81 ms ± 33.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This PR relates to #1352.
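
For readers unfamiliar with the call being timed, here is a minimal pandas-only sketch of what `row_to_names(0, remove_rows=True, reset_index=True)` does conceptually: promote one row to the column names, drop that row, and renumber the index. The helper name `promote_row_to_header` is hypothetical and is not pyjanitor's implementation; it assumes a non-negative positional `row` and says nothing about how this PR achieves its speedup.

```python
import numpy as np
import pandas as pd


def promote_row_to_header(df: pd.DataFrame, row: int = 0) -> pd.DataFrame:
    """Hypothetical helper (not part of pyjanitor).

    Uses the values at positional `row` as the new column names,
    removes that row, and resets the index to 0..n-1.
    Assumes `row` is a non-negative position within the frame.
    """
    # Boolean mask that keeps every position except `row`.
    keep = np.arange(len(df)) != row
    out = df.iloc[keep].copy()
    # The promoted row's values become the column labels.
    out.columns = df.iloc[row].to_numpy()
    return out.reset_index(drop=True)


df = pd.DataFrame({
    "a": ["nums", 6, 9],
    "b": ["chars", "x", "y"],
})
result = promote_row_to_header(df)
print(list(result.columns))  # ['nums', 'chars']
```

The sketch selects rows by position rather than by label, so it does not depend on the index being unique.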

ericmjl commented 5 months ago

🚀 Deployed on https://deploy-preview-1379--pyjanitor.netlify.app

codecov[bot] commented 5 months ago

Codecov Report

Attention: Patch coverage is 94.33962% with 3 lines in your changes missing coverage. Please review.

Project coverage is 87.36%. Comparing base (62c57c6) to head (9010b06). Report is 25 commits behind head on dev.

❗ Current head 9010b06 differs from pull request most recent head ff82eba

Please upload reports for the commit ff82eba to get more accurate results.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##              dev    #1379      +/-   ##
==========================================
- Coverage   94.48%   87.36%   -7.12%
==========================================
  Files          80       86       +6
  Lines        4367     5067     +700
==========================================
+ Hits         4126     4427     +301
- Misses        241      640     +399
```

ericmjl commented 4 months ago

I am going to merge!