rust-lang / rustc-hash

Custom hash algorithm used by rustc (plus hashmap/set aliases): fast, deterministic, not secure
Apache License 2.0
377 stars 47 forks source link

Replace the 64-bit hash function and add some tests #18

Closed Zoxc closed 8 months ago

Zoxc commented 3 years ago

This replaces the 64-bit hash function with a new one which performs better for rustc, but still isn't a good hash function.

clap:check                        1.9608s   1.9558s  -0.26%
helloworld:check                  0.0419s   0.0420s  +0.27%
hyper:check                       0.2989s   0.2977s  -0.39%
regex:check                       1.1474s   1.1438s  -0.31%
syn:check                         1.6992s   1.6959s  -0.19%
syntex_syntax:check               6.8989s   6.8748s  -0.35%
winapi:check                      8.3412s   8.1812s  -1.92%

Total                            20.3884s  20.1912s  -0.97%
Summary                           3.5000s   3.4842s  -0.45%

This also adds some collision test adapted from aHash. It's MIT license claims Amanieu d'Antras as the copyright holder though (probably a copy paste error?). I'll ask @tkaitchuck to license that code under this crate's MIT/Apache 2.0 license.

tkaitchuck commented 3 years ago

As far as licence, all of aHash is dual Apache 2.0 and MIT. This algorithm is not good though. There is no rightward propagation, so the low order bits are bad which are the ones used to select the bucket. It also still has the null terminated string problem. If you want something really basic, try doing this. (Here is the function def) That is very fast on these architectures, which I suspect is 99% of what people use for build hosts.

That logic is also giving up a lot of performance in its loop for strings, which I suspect may be important for compilation. If you are interested this might be good to combine with https://github.com/tkaitchuck/aHash/issues/58 which I have not yet started on.

Zoxc commented 1 year ago

It seems the performance of this has changed over time, now there's mostly regressions:

BenchmarkBeforeAfter
TimeTime%
🟣 winapi:check7.4555s7.4002s -0.74%
🟣 clap:check1.8083s1.8153s 0.39%
🟣 hyper:check0.2654s0.2652s -0.07%
🟣 syntex_syntax:check6.2656s6.2867s 0.34%
🟣 syn:check1.6118s1.6165s 0.30%
🟣 regex:check1.0042s1.0089s 0.47%
Total18.4107s18.3929s -0.10%
Summary1.0000s1.0011s 0.11%
WaffleLapkin commented 8 months ago

@Zoxc do you want to re-benchmark / start a discussion about hashes on zulip?