ipld / go-car

A content addressable archive utility

blockstore: add an option to skip duplicate Puts by CID, mimicking carv1's Selective Writer API #123

mvdan closed 3 years ago

mvdan commented 3 years ago

Filecoin writes proofs into CAR files which are hashed, so we need their contents to be deterministic.

The way Filecoin currently generates those CARv1 files is via v1's selective writer API, which ensures canonical ordering via traversals, and also deduplicates by CID: https://github.com/ipld/go-car/blob/71cfa2fc2a619d646606373c5946282934270bd4/selectivecar.go#L229-L230

For Ignite's current project, they receive blocks via graphsync, which ensures the order of blocks as per the IPLD selector, just like v1's selective writer. However, we might receive duplicate blocks from a client. When graphsync receives blocks, they end up being "Put" into our carv2 read-write blockstore.

If we want to be compatible, we should support deduplicating by CID. I propose a ReadWrite blockstore option for it, like DeduplicateByCID; if one calls Put on the same CID twice, the second call will simply do nothing and return a nil error.
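To make the proposed semantics concrete, here is a minimal sketch using a toy in-memory store. Everything except the option name `DeduplicateByCID` is hypothetical: `Block`, `ReadWrite`, `Option`, and `NewReadWrite` are stand-ins invented for illustration, not the actual go-car v2 types or signatures.

```go
package main

import "fmt"

// Block is a hypothetical stand-in for a block: a CID (as a string) plus payload.
type Block struct {
	CID  string
	Data []byte
}

// ReadWrite is a toy stand-in for the carv2 read-write blockstore.
// It records blocks in write order, the way a CAR data payload would.
type ReadWrite struct {
	dedup   bool
	seen    map[string]struct{}
	written []Block
}

// Option configures a ReadWrite at construction time (assumed shape).
type Option func(*ReadWrite)

// DeduplicateByCID makes Put a no-op for any CID that was already written.
func DeduplicateByCID() Option {
	return func(rw *ReadWrite) { rw.dedup = true }
}

// NewReadWrite builds the toy store and applies the given options.
func NewReadWrite(opts ...Option) *ReadWrite {
	rw := &ReadWrite{seen: make(map[string]struct{})}
	for _, opt := range opts {
		opt(rw)
	}
	return rw
}

// Put appends the block. With deduplication enabled, a repeated CID is
// skipped and a nil error is returned, as proposed above.
func (rw *ReadWrite) Put(b Block) error {
	if rw.dedup {
		if _, ok := rw.seen[b.CID]; ok {
			return nil
		}
		rw.seen[b.CID] = struct{}{}
	}
	rw.written = append(rw.written, b)
	return nil
}

func main() {
	rw := NewReadWrite(DeduplicateByCID())
	for _, c := range []string{"cid-a", "cid-b", "cid-a"} {
		_ = rw.Put(Block{CID: c})
	}
	fmt.Println(len(rw.written)) // prints 2
}
```

Because the option is opt-in, a store built without it keeps writing every Put, duplicates included.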

In the future we could satisfy this need by porting Selective Writers to carv2 (https://github.com/ipld/go-car/issues/104), but that can't happen for another month or two.

I could also ask Ignite to implement a Blockstore wrapper that does this deduplication on Put calls, but deduplicating by CID also seems like a reasonable opt-in feature that others might want in the future. It wouldn't make the API significantly more complex or the read-write blockstore significantly slower, either.
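For comparison, the wrapper alternative could look roughly like the sketch below. The `Putter` interface, `Block`, `DedupPutter`, and `orderLog` are hypothetical names made up for this example; a real wrapper would embed the actual blockstore interface and key on real CIDs.

```go
package main

import (
	"fmt"
	"sync"
)

// Block is a hypothetical stand-in for a block: a CID (as a string) plus payload.
type Block struct {
	CID  string
	Data []byte
}

// Putter is the minimal slice of a blockstore this sketch needs (assumed shape).
type Putter interface {
	Put(Block) error
}

// DedupPutter wraps any Putter and forwards only the first Put per CID.
type DedupPutter struct {
	mu    sync.Mutex
	seen  map[string]struct{}
	inner Putter
}

// NewDedupPutter wraps inner with deduplication by CID.
func NewDedupPutter(inner Putter) *DedupPutter {
	return &DedupPutter{seen: make(map[string]struct{}), inner: inner}
}

// Put forwards the block unless its CID was already written successfully,
// in which case it does nothing and returns nil.
func (d *DedupPutter) Put(b Block) error {
	d.mu.Lock()
	defer d.mu.Unlock()
	if _, ok := d.seen[b.CID]; ok {
		return nil
	}
	if err := d.inner.Put(b); err != nil {
		return err // not marked as seen, so a later retry can still go through
	}
	d.seen[b.CID] = struct{}{}
	return nil
}

// orderLog is a toy inner store that records the order of CIDs it receives.
type orderLog struct{ cids []string }

func (o *orderLog) Put(b Block) error {
	o.cids = append(o.cids, b.CID)
	return nil
}

func main() {
	log := &orderLog{}
	store := NewDedupPutter(log)
	for _, c := range []string{"cid-a", "cid-b", "cid-a", "cid-c", "cid-b"} {
		_ = store.Put(Block{CID: c})
	}
	fmt.Println(log.cids) // prints [cid-a cid-b cid-c]
}
```

The first occurrence of each CID keeps its position, so the order produced by the selector traversal is preserved and only the repeats are dropped.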

mvdan commented 3 years ago

Fixed by https://github.com/ipld/go-car/pull/127.