root-11 / tablite

Multiprocessing-enabled, out-of-memory data analysis library for tabular data.
MIT License

Join with constant memory footprint #132

Closed: realratchet closed this pull request 9 months ago

realratchet commented 9 months ago

Recreating pull request #124, because it was accidentally pushed to tablite instead of the fork. Also fixed some leftover typing issues, because Python 3.9 doesn't support the pipe (`|`) union operator in type annotations.
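For reference, the `X | Y` union syntax (PEP 604) only works at runtime on Python 3.10+. A 3.9-compatible spelling uses `typing.Optional` / `typing.Union`. The helper below is a hypothetical illustration of that substitution, not code from this PR.

```python
from typing import List, Optional, Union

# Python 3.10+ only (PEP 604); on 3.9 this raises TypeError when the def executes:
#     def first_or_default(values: list[int], default: int | None = None) -> int | None: ...

def first_or_default(values: List[int], default: Optional[int] = None) -> Union[int, None]:
    """Hypothetical helper: return the first value, or `default` if `values` is empty."""
    # Optional[int] is shorthand for Union[int, None]; both spellings work on Python 3.9.
    return values[0] if values else default
```

Adding `from __future__ import annotations` also lets `|` appear in annotations on 3.9, because evaluation is deferred, but anything that evaluates the annotations at runtime (for example `typing.get_type_hints`) will still fail there.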

realratchet commented 9 months ago

Did I just trigger that 1-in-100 failure in the tests? The same tests passed on the fork.

codecov-commenter commented 9 months ago

Codecov Report

Attention: 52 lines in your changes are missing coverage. Please review.

Comparison is base (3f54136) 82.05% compared to head (74032e3) 82.07%.

| Files | Patch % | Lines |
|---|---|---|
| tablite/joins.py | 84.09% | 45 Missing :warning: |
| tablite/base.py | 76.47% | 4 Missing :warning: |
| tablite/merge.py | 81.25% | 3 Missing :warning: |


Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #132      +/-   ##
==========================================
+ Coverage   82.05%   82.07%   +0.01%
==========================================
  Files          27       27
  Lines        4002     4190     +188
==========================================
+ Hits         3284     3439     +155
- Misses        718      751      +33
```
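The totals in the Coverage Diff are internally consistent, and a quick recomputation confirms the percentages. This sketch assumes Codecov truncates to two decimals rather than rounding; that is an inference from these numbers (rounding 3284/4002 would give 82.06%, not the reported 82.05%), not something the report states. `truncated_pct` is a hypothetical helper.

```python
def truncated_pct(hits: int, lines: int) -> str:
    """Coverage as a percentage string, truncated (not rounded) to two decimals."""
    basis_points = hits * 10_000 // lines  # hundredths of a percent, floor division
    return f"{basis_points // 100}.{basis_points % 100:02d}"

# Totals copied from the Coverage Diff (base = master, head = #132).
base = {"lines": 4002, "hits": 3284, "misses": 718}
head = {"lines": 4190, "hits": 3439, "misses": 751}

for label, totals in (("base", base), ("head", head)):
    # Every tracked line is either hit or missed.
    assert totals["hits"] + totals["misses"] == totals["lines"]
    print(label, truncated_pct(totals["hits"], totals["lines"]) + "%")

delta = {key: head[key] - base[key] for key in base}
print(delta)  # {'lines': 188, 'hits': 155, 'misses': 33}
```

The 188 new lines split into 155 hits and 33 misses, which matches the `+188`, `+155` and `+33` columns.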


realratchet commented 9 months ago

Maybe it's because we don't use the original table class but `tablite.base.Table`, which should probably be renamed to `BaseTable` to avoid namespace confusion.

Although if it's an issue with the constructor, why would it be non-deterministic in single-process tasks?
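To illustrate the constructor hypothesis: if a result is built from a hard-coded base class instead of the caller's class, the subclass methods disappear from the result, and when both classes are named `Table`, `type(result).__name__` cannot tell them apart. The classes and functions below are hypothetical stand-ins, not tablite's actual code, and merging column dicts is only a placeholder for a real join.

```python
from typing import Dict, List, Optional


class BaseTable:
    """Stand-in for a minimal base-module class (hypothetical rename of `Table`)."""

    def __init__(self, columns: Optional[Dict[str, List]] = None):
        self.columns: Dict[str, List] = dict(columns or {})


class Table(BaseTable):
    """Stand-in for the public, feature-rich subclass that users import."""

    def column_names(self) -> List[str]:
        return list(self.columns)


def combine_hardcoded(left: BaseTable, right: BaseTable) -> BaseTable:
    # Always builds the base class, so subclass methods are lost on the result.
    return BaseTable({**left.columns, **right.columns})


def combine_preserving(left: BaseTable, right: BaseTable) -> BaseTable:
    # Builds the result with the caller's class: a Table in gives a Table out.
    return type(left)({**left.columns, **right.columns})
```

With distinct names, a stray base-class result shows up immediately as `BaseTable` in reprs and error messages instead of masquerading as `Table`.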

root-11 commented 9 months ago

> Maybe it's because we don't use the original table class but `tablite.base.Table` probably should be renamed to `BaseTable` to avoid namespace confusion.

Rename is a good idea 👍