https://github.com/zfsonlinux/zfs/issues/234 fletcher4,verify checksum on Dedup
https://github.com/zfsonlinux/zfs/issues/234#issuecomment-2091960
It's unfortunate that Jeff Bonwick's blog post linked by the OP is so widely referenced, since it's incorrect - about three weeks after it was posted the 'fletcher4,verify' option was permanently removed from ZFS due to the discovery of major bugs: http://mail.opensolaris.org/pipermail/zfs-discuss/2009-November/034106.html
(Edit: mail.opensolaris.org is long gone, but for reference, archive.org has it at http://web.archive.org/web/20100111013302/http://mail.opensolaris.org/pipermail/zfs-discuss/2009-November/034106.html)
Sadly people continued blogging about that feature for the next year, obviously without actually trying it :-(.
It would be nice to have (accelerated) versions of SHA-512, Skein, and Edon-R, especially for deduplication.
http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/49559 New fast hash algorithm - is it needed?
https://reviews.csiden.org/r/223/ 4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R
https://github.com/zfsonlinux/zfs/issues/3770 Feature Request: New checksum algorithms in ZFS
@kernelOfTruth Yes, I already found this, but the bugs might have been fixed since then. :)
The mailing list link is sadly dead: http://mail.opensolaris.org/pipermail/zfs-discuss/2009-November/034106.html
Actually, a smaller checksum such as fletcher4 would reduce the memory footprint of the deduplication table; growing it by doubling the checksum width is not the right way to go. :)
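For intuition, here is a back-of-the-envelope sketch of how the checksum width affects the table size (the 40-byte per-entry overhead, the 128K block size, and the 1 TiB pool are illustrative assumptions, not the real DDT layout):

```c
#include <stdio.h>

/* Back-of-the-envelope only: the per-entry overhead, block size, and
 * pool size below are illustrative assumptions, not the real DDT layout. */
#define ENTRY_OVERHEAD 40ULL    /* refcount, block pointer, tree linkage... */

int main(void)
{
    /* 1 TiB of unique 128 KiB blocks -> 8,388,608 dedup entries */
    unsigned long long nentries = (1ULL << 40) / (128ULL << 10);
    unsigned long long sha256   = (32 + ENTRY_OVERHEAD) * nentries;
    unsigned long long narrow   = (16 + ENTRY_OVERHEAD) * nentries;

    printf("256-bit key: %llu MiB\n", sha256 >> 20);  /* 576 MiB */
    printf("128-bit key: %llu MiB\n", narrow >> 20);  /* 448 MiB */
    return 0;
}
```

With these assumed sizes the narrower key saves about 22%, not half, because the key is only one part of each entry.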
I think that if sha256 is not trustworthy enough for data comparison, sha256,verify should be used.
Arguably, if you are going to use verify, you might as well use the fastest hash available, e.g. hash127.
The penalty for verify is an increase in ARC misses and the subsequent I/O. Hash algo perf would be a distant secondary effect.
Yes, but the point here is that if you want to avoid a reasonable need for verify, use a hash that is appropriately collision resistant. If you are doing a verify pass anyway, there is no point in using an expensive hashing algorithm.
For this reason, fletcher4 was removed as a dedup checksum option long ago.
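For reference, the dedup write path with verify works roughly like the sketch below. This is simplified, not the actual zio pipeline; ddt_entry_t, ddt_lookup(), and read_block() are stand-in names. It shows why verify's cost is the extra read, not the hash:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Simplified sketch of dedup with verify; not the real ZFS zio pipeline.
 * ddt_entry_t, ddt_lookup(), and read_block() are stand-ins. */
typedef struct ddt_entry {
    unsigned char cksum[32];        /* stored checksum of the block */
    /* ... refcount, block pointer ... */
} ddt_entry_t;

extern ddt_entry_t *ddt_lookup(const unsigned char cksum[32]);
extern int read_block(const ddt_entry_t *e, void *buf, size_t len);

/* Returns true if the write can be deduplicated against an existing block. */
bool dedup_match(const void *data, size_t len,
                 const unsigned char cksum[32], bool verify)
{
    ddt_entry_t *e = ddt_lookup(cksum);
    if (e == NULL)
        return false;               /* checksum unseen: write a new block */

    if (verify) {
        /* This read is the real cost of verify: an ARC miss here means
         * an extra disk I/O per dedup hit, dwarfing the hash cost. */
        unsigned char *buf = malloc(len);
        if (buf == NULL)
            return false;
        bool same = read_block(e, buf, len) == 0 &&
                    memcmp(buf, data, len) == 0;
        free(buf);
        return same;                /* mismatch: collision, store separately */
    }
    return true;                    /* trust the hash: bump the refcount */
}
```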
Actually, it would be interesting to use fletcher4 for dedup and add support for verifying against the sha256 checksum stored on disk. This would make zfs set checksum=sha256 pool a dependency for the feature.
That would cost very little additional I/O, because the reads are small and easily cached. And thanks to the smaller size of a fletcher4 checksum, it would dramatically reduce the memory footprint of the hash table.
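A minimal sketch of what such a two-tier lookup could look like (purely hypothetical: dedup_tree_find(), read_sha256_from_bp(), and the 16-byte in-core key are made-up names and assumptions for illustration, not existing ZFS interfaces):

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical two-tier lookup: a narrow key is kept in RAM, while the
 * authoritative sha256 lives on disk and is fetched only on a candidate
 * match. All names here are invented for illustration. */
typedef struct dedup_node {
    unsigned char narrow_key[16];   /* assumed narrow fletcher-style key */
    unsigned long long dva;         /* where the candidate block lives */
} dedup_node_t;

extern dedup_node_t *dedup_tree_find(const unsigned char key[16]);
extern int read_sha256_from_bp(unsigned long long dva, unsigned char out[32]);

bool is_duplicate(const unsigned char narrow_key[16],
                  const unsigned char sha256_new[32])
{
    dedup_node_t *n = dedup_tree_find(narrow_key);
    if (n == NULL)
        return false;               /* no candidate: write a new block */

    /* The extra cost is this one small metadata read, which should
     * cache well, as argued above. */
    unsigned char sha256_old[32];
    if (read_sha256_from_bp(n->dva, sha256_old) != 0)
        return false;

    return memcmp(sha256_old, sha256_new, 32) == 0;
}
```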
@richardelling I don't understand your point.
There were issues with the implementation of the algorithm; that's the reason it was removed.
All ZFS checksums are 256 bits in size.
@richardelling fletcher4 isn't 256 bits wide.
My idea with this feature request was: use fletcher4 in RAM to reduce the memory footprint and checksum=sha256 on disk, so that if the fletcher4 values match, both sha256 checksums are read from disk and compared.
The binary search tree would be massively reduced in size because fletcher4 is much smaller than sha256.
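To put rough numbers on that, a sketch comparing per-node sizes (the node layouts are illustrative assumptions, not the actual DDT/AVL structures):

```c
#include <stdio.h>

/* Illustrative tree nodes only; the child pointers and refcount stand in
 * for whatever bookkeeping a real dedup tree node carries. */
struct node_sha256 {
    unsigned char key[32];          /* full 256-bit checksum */
    struct node_sha256 *left, *right;
    unsigned long long refcnt;
};

struct node_narrow {
    unsigned char key[16];          /* assumed narrow in-core key */
    struct node_narrow *left, *right;
    unsigned long long refcnt;
};

int main(void)
{
    printf("sha256 node: %zu bytes\n", sizeof(struct node_sha256)); /* 56 on LP64 */
    printf("narrow node: %zu bytes\n", sizeof(struct node_narrow)); /* 40 on LP64 */
    return 0;
}
```

On LP64 that is about a 29% reduction per node rather than a halving, since the key is only part of each node.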
I think the answer to the original question is "no".
Are there any plans to re-add fletcher4 for deduplication, with a smaller memory footprint? :)