[storage api] Deduplicating blocks

scriptjs commented 8 years ago

A desirable use case for dat is the commit case: once an archive has been created, the user makes changes (producing a diff), which yields a new archive hash. The user will not want blocks that are unchanged between the two archives to be duplicated in the store. The same applies to files that are renamed within an archive: the blocks backing the file should remain the same in the store.

Dat supports deduplication of metadata, but hypercore's storage API currently references data by block index. To enable deduplication of blocks, the storage API should be modified so that content can be addressed by block hash.
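To make the proposal concrete, here is a minimal sketch of what hash-addressed storage could look like. It is not hypercore's API: the class `DedupBlockStore`, its methods, and the use of string feed ids are all hypothetical, and BLAKE2b is chosen only for illustration. Each feed keeps an ordered list of block hashes, while the block bytes themselves are stored once per distinct hash.

```python
import hashlib


class DedupBlockStore:
    """Hypothetical store: block bytes are kept once per hash,
    and each feed maps block index -> block hash."""

    def __init__(self):
        self.blocks = {}  # hash (hex) -> block bytes, stored once
        self.feeds = {}   # feed id -> ordered list of block hashes

    def append(self, feed, data):
        # Illustrative hash choice; any collision-resistant hash works here.
        digest = hashlib.blake2b(data, digest_size=32).hexdigest()
        # Only store the bytes if this hash has not been seen before.
        self.blocks.setdefault(digest, data)
        self.feeds.setdefault(feed, []).append(digest)
        return digest

    def get(self, feed, index):
        # Index-based access still works: resolve index -> hash -> bytes.
        return self.blocks[self.feeds[feed][index]]

    def get_by_hash(self, digest):
        # Direct content addressing, independent of any feed.
        return self.blocks[digest]


store = DedupBlockStore()
store.append("archive-v1", b"chunk A")
store.append("archive-v1", b"chunk B")
# A second archive version that reuses "chunk A" and adds a new block.
store.append("archive-v2", b"chunk A")
store.append("archive-v2", b"chunk C")
```

In this sketch the second archive costs one extra hash entry for the shared block rather than a second copy of its bytes, and a renamed file would simply reuse the same hashes under different metadata.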

juliangruber commented 8 years ago

👍 this sounds very good to me!

andrewosh commented 6 years ago

Any plans to implement getting a block by its hash (or another approach to block deduping) at this level? It's not totally clear to me whether that's a fast operation in merkle trees. If it isn't, one would need a secondary index (a hash trie), and then you have hyperdb (if the keys were value hashes)!

Would you say that a slightly modified version of hyperdb is the level at which to tackle this?
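The secondary-index idea can be sketched as follows, under stated assumptions: an in-memory dict stands in for the hash trie, `IndexedLog` and its methods are hypothetical names, and nothing here reflects how hypercore or hyperdb actually store data. The log stays append-only and index-addressed; the index maps a block hash to the position of its first occurrence, so a lookup by hash does not have to scan the log, and a repeated block is recorded as a reference instead of a second copy.

```python
import hashlib


def block_hash(data):
    # Illustrative hash choice for this sketch.
    return hashlib.blake2b(data, digest_size=32).hexdigest()


class IndexedLog:
    """Hypothetical append-only log with a secondary hash -> index map."""

    def __init__(self):
        # Each entry is ("data", bytes) or ("ref", index_of_first_occurrence).
        self.entries = []
        # Secondary index: block hash -> index of the entry holding the bytes.
        self.by_hash = {}

    def append(self, data):
        h = block_hash(data)
        first = self.by_hash.get(h)
        if first is None:
            # First time this content is seen: store the bytes and index them.
            self.by_hash[h] = len(self.entries)
            self.entries.append(("data", data))
        else:
            # Duplicate content: store a pointer to the earlier entry.
            self.entries.append(("ref", first))
        return len(self.entries) - 1

    def get(self, index):
        kind, value = self.entries[index]
        # References always point at a "data" entry, so one hop is enough.
        return value if kind == "data" else self.entries[value][1]

    def find(self, h):
        # Lookup by hash via the secondary index; None if unknown.
        return self.by_hash.get(h)


log = IndexedLog()
log.append(b"block one")
log.append(b"block two")
log.append(b"block one")  # stored as a reference to index 0
```

Whether such an index belongs inside the storage layer or in a separate structure layered on top is exactly the open question raised above; the sketch only shows that the index itself is small relative to the blocks it deduplicates.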

holepunchto / hypercore

[storage api] Deduplicating blocks #25