ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

enhancement: tarsnap chunker / importer (Content defined, polynomial, fast converging chunking) #3603

Open donothesitate opened 7 years ago

donothesitate commented 7 years ago

Version information:

go-ipfs version: 0.4.4

Type: Feature, Enhancement

Priority: P4

Area: Tools, Importer

Description:

The tarsnap chunker is better suited to maximizing the deduplication ratio than the current Rabin chunker.
Smaller chunks with faster convergence after an edit yield greater space savings; depending on the dataset, the gain over Rabin can be substantial.

The mean chunk size used by tarsnap is 64k.

Source:
https://github.com/Tarsnap/tarsnap/blob/master/tar/multitape/chunkify.h
https://github.com/Tarsnap/tarsnap/blob/master/tar/multitape/chunkify.c

Related:
https://moinakg.wordpress.com/2012/11/11/inside-content-defined-chunking-in-pcompress/
https://moinakg.wordpress.com/2012/11/15/inside-content-defined-chunking-in-pcompress-part-2/

whyrusleeping commented 7 years ago

Note to self: move to ipfs/importers repo when we make that