lenskit / binpickle

Binary pickling library
https://binpickle.lenskit.org
MIT License

Support buffer de-duplication #11

Open mdekstrand opened 4 years ago

mdekstrand commented 4 years ago

An object may contain multiple NumPy arrays with identical contents (this will arise in some LensKit use cases). We can support de-duplication by recording more robust checksums of buffers (MD5 or SHA) and making the buffer store effectively content-addressed, so that identical buffers are written only once.
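To make the idea concrete, here is a rough sketch of what a content-addressed buffer store might look like. It is not binpickle's actual API: the `DedupBufferStore` class and its `add` method are hypothetical names, SHA-256 is just one of the possible "SHA" choices, and the sketch keeps everything in memory rather than writing a file.

```python
import hashlib


class DedupBufferStore:
    """Hypothetical content-addressed buffer store (not binpickle's API)."""

    def __init__(self):
        self._index = {}    # hex digest -> position in self.buffers
        self.buffers = []   # unique buffer contents, in insertion order

    def add(self, buf):
        """Store buf if its contents are new; return the buffer's index."""
        # Works for any object exposing the buffer protocol (bytes,
        # bytearray, memoryview, and contiguous NumPy arrays).
        data = memoryview(buf).tobytes()
        digest = hashlib.sha256(data).hexdigest()
        pos = self._index.get(digest)
        if pos is None:
            # First time we see these contents: store them once.
            pos = len(self.buffers)
            self.buffers.append(data)
            self._index[digest] = pos
        return pos


store = DedupBufferStore()
a = store.add(bytearray(b"\x01\x02\x03\x04"))
b = store.add(bytes([1, 2, 3, 4]))   # same contents, different object
c = store.add(b"\x05\x06")           # different contents
```

Here `a` and `b` refer to the same stored buffer, so the serialized object would only need to record the index twice while the bytes are kept once. A real implementation would presumably also need to decide how de-duplicated buffers interact with memory-mapped, writable arrays on load, which this sketch does not address.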

mdekstrand commented 8 months ago

Format version 2 has file checksums, which is one of the prerequisites for this.