ipfs-inactive / archives

[ARCHIVED] Repo to coordinate archival efforts with IPFS
https://awesome.ipfs.io/datasets

Make the Rabin Chunker perform well, or document why it's not fixable #142

Open flyingzumwalt opened 7 years ago

flyingzumwalt commented 7 years ago

Based on the tests in #137, the Rabin chunker isn't providing any real deduplication benefit. It's also really slow.

DonaldTsang commented 6 years ago

@flyingzumwalt a good way to try this is to create file-format-specific Rabin chunking.

Keywords: Content Defined, Chunking, Deduplication

DonaldTsang commented 6 years ago

It might be good to do some research on FastCDC and Asymmetric Extremum (AE), which have low computational overhead.
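
To make the Asymmetric Extremum suggestion concrete, here is a rough Python sketch of the idea: scan from the start of a chunk, remember the position of the largest byte seen so far, and cut once `window` further bytes have passed without a larger one. It is a simplified illustration, not the implementation used by IPFS or the published algorithm verbatim: it compares single bytes rather than multi-byte values, and the names `ae_chunks`, `dedup_ratio` and the default `window=64` are assumptions made up for this example. No rolling hash is computed, which is where the low overhead comes from.

```python
import hashlib
import random


def ae_chunks(data: bytes, window: int = 64):
    """Yield chunks of `data` using a simplified Asymmetric Extremum rule.

    `window` is a hypothetical tuning parameter for this sketch: a cut is
    made `window` bytes after the first occurrence of the local maximum.
    """
    start, n = 0, len(data)
    while start < n:
        max_val, max_pos = data[start], start
        cut = n  # fall back to end of input if no cut point is found
        for i in range(start + 1, n):
            if data[i] > max_val:
                # New strict maximum: restart the window from here.
                max_val, max_pos = data[i], i
            elif i == max_pos + window:
                # `window` bytes passed without a larger byte: cut after i.
                cut = i + 1
                break
        yield data[start:cut]
        start = cut


def dedup_ratio(blobs, window: int = 64) -> float:
    """Return total bytes divided by bytes in unique chunks (1.0 = no dedup)."""
    total = 0
    unique = {}  # chunk digest -> chunk length
    for blob in blobs:
        for chunk in ae_chunks(blob, window):
            total += len(chunk)
            unique.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    return total / sum(unique.values()) if total else 1.0


if __name__ == "__main__":
    rng = random.Random(0)
    base = bytes(rng.getrandbits(8) for _ in range(1 << 16))
    # Prepending one byte shifts every offset; content-defined cut points
    # should realign shortly afterwards, so most chunks are shared.
    print(dedup_ratio([base, b"x" + base]))
```

Running the same kind of comparison against the Rabin chunker on the datasets from #137 would show whether an alternative like this actually helps on real archives, which random test data cannot tell us.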