willow-ahrens / Finch.jl

Sparse tensors in Julia and more! Datastructure-driven array programming language.
http://willowahrens.io/Finch.jl/
MIT License

Merge Identical Runs in SparseRLE #436

Closed willow-ahrens closed 6 months ago

willow-ahrens commented 6 months ago

This PR uses a lazy approach to merging identical runs. The steps are roughly:

  1. Freeze the buffer sublevel.
  2. Thaw the canon sublevel.
  3. Copy deduplicated fibers from the buffer sublevel into the canon sublevel.
  4. Freeze the canon sublevel.
  5. Empty the buffer sublevel.
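As a rough illustration of steps 3 to 5, here is a minimal, self-contained sketch of merging identical runs lazily at freeze time. It is a toy model, not the code in this PR: `Run`, `LazyRuns`, `merge_identical_runs`, and `freeze_runs!` are hypothetical names, a single fiber is assumed, and the freeze/thaw of the sublevels is not modeled.

```julia
# Toy illustration only: every name here is hypothetical and is not part of
# Finch.jl's API.

# A run covers the inclusive index range start:stop with a single value.
struct Run{T}
    start::Int
    stop::Int
    val::T
end

# Writes accumulate in `buffer`; `canon` holds the deduplicated result.
struct LazyRuns{T}
    buffer::Vector{Run{T}}
    canon::Vector{Run{T}}
end

# Merge neighbouring runs that touch (no gap between them) and hold equal
# values. Assumes `runs` is sorted by `start` and non-overlapping.
function merge_identical_runs(runs::Vector{Run{T}}) where {T}
    merged = Run{T}[]
    for r in runs
        if !isempty(merged) && merged[end].stop + 1 == r.start && merged[end].val == r.val
            prev = merged[end]
            merged[end] = Run{T}(prev.start, r.stop, prev.val)  # extend the previous run
        else
            push!(merged, r)
        end
    end
    return merged
end

# Stand-in for the freeze step: copy deduplicated runs from the buffer into
# the canonical storage (step 3), then empty the buffer (step 5).
function freeze_runs!(lvl::LazyRuns)
    append!(lvl.canon, merge_identical_runs(lvl.buffer))
    empty!(lvl.buffer)
    return lvl
end
```

In this sketch, a buffer holding runs `1:3 => 1.0` and `4:5 => 1.0` would end up as a single canonical run `1:5 => 1.0`, while runs separated by a gap stay distinct even if their values match.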

It's not a particularly beautiful approach, but it allows us to automatically merge runs so that we can construct SparseRLE more seamlessly.

Also, while the overhead of merging should be fairly minimal for SparseRLE(Element(0.0)), we may want a way to pass arguments to freeze so that we can skip deduplication. I'm open to suggestions; it could be a level parameter or a freeze parameter.
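To make the two options concrete, here is a hedged toy sketch of what a freeze parameter versus a level parameter might look like. Nothing here is Finch.jl's API: `merge_runs`, `freeze_runs`, `ToyLevel`, and the `dedup` keyword are hypothetical, and runs are modeled as plain `(start, stop, value)` tuples.

```julia
# Hypothetical toy code, not Finch.jl's API. Runs are (start, stop, value)
# tuples, sorted by start and non-overlapping.
function merge_runs(runs)
    merged = eltype(runs)[]
    for (start, stop, val) in runs
        if !isempty(merged) && merged[end][2] + 1 == start && merged[end][3] == val
            merged[end] = (merged[end][1], stop, val)  # extend the previous run
        else
            push!(merged, (start, stop, val))
        end
    end
    return merged
end

# Option 1: a freeze parameter, chosen at each call site.
freeze_runs(runs; dedup::Bool = true) = dedup ? merge_runs(runs) : copy(runs)

# Option 2: a level parameter, fixed when the level is constructed and
# resolved by dispatch.
struct ToyLevel{Dedup} end
freeze_runs(::ToyLevel{true}, runs) = merge_runs(runs)
freeze_runs(::ToyLevel{false}, runs) = copy(runs)
```

The trade-off in this sketch is that the keyword form lets each caller decide, while the type-parameter form bakes the choice into the level so it can be resolved at compile time.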

Next step (in this PR or otherwise):

Do the same thing for RepeatRLE level, essentially copy-pasting whatever we decide here.

willow-ahrens commented 6 months ago

@nullplay what do you think of this? The approach here is to merge the runs at the end. I'd like to merge this PR and then reorganize the test suite a bit, but in the future we could also add a level flag to skip deduplication.

codecov[bot] commented 6 months ago

Codecov Report

Attention: Patch coverage is 89.53488%, with 36 lines in your changes missing coverage. Please review.

Project coverage is 76.40%. Comparing base (d1a59ee) to head (c65cbbc).

| Files | Patch % | Lines |
|---|---|---|
| src/tensors/levels/denserlelevels.jl | 87.45% | 32 Missing :warning: |
| src/tensors/levels/sparserlelevels.jl | 95.50% | 4 Missing :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #436      +/-   ##
==========================================
+ Coverage   75.79%   76.40%   +0.61%
==========================================
  Files          87       88       +1
  Lines        7716     8057     +341
==========================================
+ Hits         5848     6156     +308
- Misses       1868     1901      +33
```
