openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Add xxHash as a checksum option for `checksum` and `dedup` algorithms #16503

Open iBug opened 2 months ago

iBug commented 2 months ago

Describe the feature you would like to see added to OpenZFS

Add xxHash as an option for `checksum`, and both `xxhash` and `xxhash,verify` as options for `dedup`.

How will this feature improve OpenZFS?

xxHash is sufficiently fast but much less collision-prone than fletcher4. This will improve ZFS resilience against silent data corruption as a competitive alternative to fletcher4.

Additional context

Performance as advertised by xxHash on its wiki: https://github.com/Cyan4973/xxHash/wiki/Performance-comparison (Note: fletcher4 not included in this page)

Collision ratio on xxHash wiki: https://github.com/Cyan4973/xxHash/wiki/Collision-ratio-comparison
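
For readers unfamiliar with why fletcher4 is described as collision-prone, here is a minimal Python sketch of the fletcher4 recurrence (four 64-bit running sums over 32-bit words). It is not the OpenZFS implementation; the little-endian word order and the helper names `fletcher4` and `pack_words` are assumptions made for illustration. Because each sum is a linear function of the input words, adding the pattern +1, -4, +6, -4, +1 to five consecutive words leaves all four sums unchanged:

```python
import struct

MASK64 = (1 << 64) - 1  # the four accumulators wrap at 64 bits


def fletcher4(data: bytes) -> tuple:
    """Illustrative fletcher4: four running sums over 32-bit words.

    Assumption: words are read as little-endian; this is a sketch,
    not the OpenZFS code.
    """
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d


def pack_words(words):
    """Hypothetical helper: pack a list of 32-bit words into bytes."""
    return struct.pack(f"<{len(words)}I", *words)


# Two different 20-byte blocks that differ by (+1, -4, +6, -4, +1).
block_1 = pack_words([0, 4, 0, 4, 0])
block_2 = pack_words([1, 0, 6, 0, 1])

print(fletcher4(block_1))  # (8, 24, 52, 96)
print(fletcher4(block_2))  # (8, 24, 52, 96)
```

The two blocks differ in every word yet produce the same four sums. This only illustrates the structured collisions of a linear checksum; it says nothing about how often such patterns arise from real-world corruption.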

amotin commented 2 months ago

Before discussing a change of default, the first step would be to add it as an option, so its characteristics can be measured against the others. And any algorithm added to the tree would have to stay there forever, so it must really be as good as advertised.

iBug commented 2 months ago

Sorry, I misread the man page. I thought xxhash was already an option. Let me change this FR to adding it in the first place.

mcmilk commented 1 month ago

I think fletcher4 is a bit faster than the xxHash variants currently in OpenZFS, so adding it as a new hash doesn't make sense. Which version of xxHash do you have in mind?

Here is a nice table with hashes and their speeds: https://rurban.github.io/smhasher/doc/table.html

Hash          Speed in MiB/s
Fletcher 4    15556.93
xxHash64      12108.87 (included in OpenZFS - zstd)
xxHash32       5865.17 (included in OpenZFS - zstd)

iBug commented 1 month ago

@mcmilk Your table indicates xxh64 would be a good option. I'd like to reiterate that:

xxHash is sufficiently fast but much less collision-prone than fletcher4

With modern CPUs being so powerful, it makes sense to me to trade a bit of performance for much better data integrity by replacing fletcher4 with xxh64.

mcmilk commented 1 month ago

Why not something like rapidhash, which is about twice as fast (23789 MiB/s) in that table¹ and has no known problems?

Also, with SSE and AVX, fletcher4 is a lot faster on my local notebook:

$ cat /proc/spl/kstat/zfs/fletcher_4_bench
implementation   native         byteswap
scalar           9112861804     8831049465
superscalar      11681942207    11744320536
superscalar4     13586418453    11444139964
sse2             21310896019    10706136906
ssse3            21171146266    19126012775
avx2             38987296119    35445754442

¹https://rurban.github.io/smhasher/doc/table.html

Edit: I find xxh3 a nice fit (image: 61976089-aedeab00-af9f-11e9-9239-e5375d6c080f)

gmelikov commented 1 month ago

JFYI, fletcher4 on an AMD Ryzen 7840U with AVX-512:

0 0 0x01 -1 0 6423074939 87451661526534
implementation   native         byteswap
scalar           10659121732    8706704548
superscalar      14087467630    11536293324
superscalar4     15814463114    12305581118
sse2             22675386320    10542805362
ssse3            22375429000    20235123389
avx2             39958006169    37408283214
avx512f          42448290424    17524854325
avx512bw         42461612087    37391201332
fastest          avx512bw       avx2
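
To compare these kstat figures with the MiB/s numbers in the smhasher table, the output can be parsed and converted. The following is a rough Python sketch, assuming the `fletcher_4_bench` values are bytes per second; `parse_fletcher4_bench` is a hypothetical helper name, and the sample text is copied from the output above.

```python
MIB = 1024 * 1024


def parse_fletcher4_bench(text):
    """Return {implementation: (native, byteswap)} from kstat text.

    Only rows of the form "<name> <int> <int>" are kept, so the kstat
    header line, the column titles, and the "fastest" row are skipped.
    """
    rates = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1].isdigit() and fields[2].isdigit():
            rates[fields[0]] = (int(fields[1]), int(fields[2]))
    return rates


sample = """\
0 0 0x01 -1 0 6423074939 87451661526534
implementation   native         byteswap
scalar           10659121732    8706704548
superscalar      14087467630    11536293324
superscalar4     15814463114    12305581118
sse2             22675386320    10542805362
ssse3            22375429000    20235123389
avx2             39958006169    37408283214
avx512f          42448290424    17524854325
avx512bw         42461612087    37391201332
fastest          avx512bw       avx2
"""

rates = parse_fletcher4_bench(sample)

# Assumption: the kstat values are bytes per second.
print(f"{'implementation':<16}{'native MiB/s':>14}{'byteswap MiB/s':>16}")
for name, (native, byteswap) in rates.items():
    print(f"{name:<16}{native / MIB:>14.0f}{byteswap / MIB:>16.0f}")
```

Under that assumption, the avx512bw native figure works out to roughly 40,000 MiB/s. Note that the two sources were measured on different machines with different harnesses, so the converted numbers are only a loose comparison.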