ironcalc / IronCalc

Main engine of the IronCalc ecosystem
Apache License 2.0
60 stars 2 forks source link

Adding garbage collector for `model.shared_strings` #18

Open fosdickio opened 6 months ago

fosdickio commented 6 months ago

Background

model.shared_strings is an optimization for faster string look ups. It is a mapping from a string value to a reference in memory. Multiple cells that contain the same string value all use the same reference.

Change Summary

This change adds a garbage collector for model.shared_strings. When cells are deleted using model.delete_cell, they are not removed from model.shared_strings. This change cleans up unreachable string references when model.evaluate is run. Before deleting a string, we must check the entire worksheet to make sure no other cells contain the deleted value before it can be deleted.

Testing

Follow-Up Work

There is additional garbage collection (e.g., styles clean-up, etc.) that can be done in future PRs.

References

codecov[bot] commented 6 months ago

Codecov Report

Attention: Patch coverage is 84.93151% with 11 lines in your changes are missing coverage. Please review.

Project coverage is 90.54%. Comparing base (1381533) to head (75ad996). Report is 4 commits behind head on main.

Files Patch % Lines
xlsx/src/export/mod.rs 31.25% 11 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #18 +/- ## ========================================== + Coverage 90.36% 90.54% +0.18% ========================================== Files 156 157 +1 Lines 27401 27692 +291 ========================================== + Hits 24762 25075 +313 + Misses 2639 2617 -22 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

fosdickio commented 6 months ago

I think we should do a PR with all the cases where memory should be freed.

@nhatcher I was thinking it might be simpler to have separate PRs for garbage collection clean up of shared strings and style cleanup. Are you suggesting having these in the same PR?

nhatcher commented 6 months ago

I think we should do a PR with all the cases where memory should be freed.

@nhatcher I was thinking it might be simpler to have separate PRs for garbage collection clean up of shared strings and style cleanup. Are you suggesting having these in the same PR?

Yeah, I was kind of thinking that we could only merge PRs that are feature complete. The problem with merging a PR that has missing features is that it is easy to put them aside and never et back to them. Let's talk about this on a call.

fosdickio commented 4 months ago

@nhatcher and I discussed this on a call. Instead of having this as a small PR with only the shared strings cleanup, it should include all garbage collection functionality. There are three additional things that should be included before the garbage collector functionality gets merged:

  1. Shared styles
  2. Shared formulas
  3. [Optional] Cleanup of columns --> @nhatcher will file issue explaining in further detail