pyjanitor-devs / pyjanitor

Clean APIs for data cleaning. Python implementation of R package Janitor
https://pyjanitor-devs.github.io/pyjanitor
MIT License
1.37k stars, 170 forks

[ENH] maintain sorted array for conditional join #1398

Closed: samukweku closed this 1 month ago

samukweku commented 2 months ago

This is the third in a series of PRs that ultimately adds support for aggregations within conditional_join. The numba code now uses an array that is kept sorted, based on grantjenks' sortedcontainers implementation. In one case, performance improved by about 50x compared to the current implementation. Is that too large a performance difference?
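For readers unfamiliar with the idea, here is a minimal sketch of how a sorted container can help with a two-condition non-equi join. It is not the code in this PR: the function name `count_matches`, the tuple-based rows, and the specific conditions (`left.a >= right.c` and `left.b <= right.d`) are assumptions chosen for illustration, and it uses the standard library's `bisect` module rather than numba or sortedcontainers.

```python
from bisect import bisect_left, insort


def count_matches(left, right):
    """Hypothetical helper: for each left row (a, b), count right rows (c, d)
    that satisfy a >= c and b <= d."""
    # Visit left rows in ascending order of a, and right rows in ascending order of c.
    left_order = sorted(range(len(left)), key=lambda i: left[i][0])
    right_sorted = sorted(right, key=lambda row: row[0])

    seen_d = []  # d values of right rows with c <= current a, kept sorted
    counts = [0] * len(left)
    j = 0
    for i in left_order:
        a, b = left[i]
        # Admit every right row whose c satisfies the first condition.
        while j < len(right_sorted) and right_sorted[j][0] <= a:
            insort(seen_d, right_sorted[j][1])  # insertion keeps seen_d sorted
            j += 1
        # Binary search answers the second condition: how many d are >= b.
        counts[i] = len(seen_d) - bisect_left(seen_d, b)
    return counts


left = [(5, 2), (1, 0), (3, 4)]
right = [(1, 3), (2, 1), (4, 5), (6, 9)]
result = count_matches(left, right)
```

Because the container stays sorted, each lookup is a binary search instead of a scan over all admitted rows. Inserting into a plain list still shifts elements, which is the cost that sortedcontainers reduces by storing the values as a list of shorter sublists.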

PR Description

Please describe the changes proposed in the pull request:

This PR relates to #1269, #1396 and #1397.

Please tag maintainers to review.

ericmjl commented 2 months ago

🚀 Deployed on https://deploy-preview-1398--pyjanitor.netlify.app

codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 94.44444% with 14 lines in your changes missing coverage. Please review.

Project coverage is 83.72%. Comparing base (6e77fbc) to head (d3c5772). Report is 4 commits behind head on dev.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##              dev    #1398      +/-   ##
==========================================
- Coverage   89.07%   83.72%   -5.35%
==========================================
  Files          87       87
  Lines        5374     5857     +483
==========================================
+ Hits         4787     4904     +117
- Misses        587      953     +366
```