pyjanitor-devs / pyjanitor

Clean APIs for data cleaning. Python implementation of R package Janitor
https://pyjanitor-devs.github.io/pyjanitor
MIT License
1.37k stars, 170 forks

[ENH] `conditional_join` can return ragged_arrays where possible - PR no. 2 #1397

Closed samukweku closed 2 months ago

samukweku commented 2 months ago

This is the second in a series of PRs that ultimately add support for aggregations within `conditional_join`. Where possible, ragged arrays can be returned to the user, either as slices or as arrays of indices, which can then be used with akimbo, awkward, or pyarrow to aggregate the data. This should be faster than materializing the entire dataframe in pandas before aggregating.
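To illustrate the idea of aggregating from a ragged result instead of a fully materialized join, here is a minimal sketch in plain Python. It is not pyjanitor's implementation and does not use the `conditional_join` API; the tables, column names, and the range condition `start <= value <= end` are all hypothetical, and the right-hand values are assumed to be sorted so that each left row's matches form one contiguous slice.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate

# Hypothetical right-hand table, assumed sorted by `value`.
right_value = [2, 3, 5, 8]
right_amount = [10.0, 20.0, 30.0, 40.0]

# Hypothetical left-hand table: a row matches right rows where
# start <= value <= end.
left_start = [1, 4, 9]
left_end = [3, 8, 10]

# Ragged result expressed as slices: one (lo, hi) pair per left row,
# meaning right rows lo .. hi-1 match. Row lengths differ, hence "ragged".
slices = [
    (bisect_left(right_value, start), bisect_right(right_value, end))
    for start, end in zip(left_start, left_end)
]

# Aggregate straight from the slices with prefix sums; no joined rows
# are ever built.
prefix = [0.0, *accumulate(right_amount)]
sums_from_slices = [prefix[hi] - prefix[lo] for lo, hi in slices]

# Baseline for comparison: materialize every matching (left, right) pair,
# then aggregate per left row.
pairs = [(i, j) for i, (lo, hi) in enumerate(slices) for j in range(lo, hi)]
sums_from_pairs = [0.0] * len(left_start)
for i, j in pairs:
    sums_from_pairs[i] += right_amount[j]

print(slices)            # [(0, 2), (2, 4), (4, 4)]
print(sums_from_slices)  # [30.0, 70.0, 0.0]
```

Both routes give the same per-row sums, but the slice-based route does work proportional to the number of rows rather than the number of matching pairs. Libraries such as awkward or pyarrow operate on the same offsets-plus-values layout.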

PR Description

Please describe the changes proposed in the pull request:

This PR relates to #1269 and #1396.

Please tag maintainers to review.

ericmjl commented 2 months ago

🚀 Deployed on https://deploy-preview-1397--pyjanitor.netlify.app

codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 82.78689% with 21 lines in your changes missing coverage. Please review.

Project coverage is 89.23%. Comparing base (6e77fbc) to head (ef4f9f3). Report is 11 commits behind head on dev.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##              dev    #1397      +/-   ##
==========================================
+ Coverage   89.07%   89.23%   +0.15%
==========================================
  Files          87       87
  Lines        5374     5534     +160
==========================================
+ Hits         4787     4938     +151
- Misses        587      596       +9
```

ericmjl commented 2 months ago

I also noticed that the project's test coverage has dropped below 90%. I might need to dig further, but this is definitely a sign that we may need to invest more in covering edge cases throughout the repo.