Open moon-chilled opened 2 years ago
Also investigate: for scalar tables that blow out of cache, might it be worthwhile to use a slower, higher quality hash function than crc, in order to reduce memory traffic from extraneous probes?
We picked out wyhash for BQN, which seems to be going well. It also comes with a very fast random number generator, although the state space on that is kind of small.
What I've seen with a table with a bad multiplication-based hash was that it usually had fewer collisions but occasionally a very large number of them. I think CRC is probably similar, so having a backup option for protection is good but you shouldn't expect it to speed things up generally. Either way the effect should be very small relative to the base rate of collisions.
I have experience with l4z which uses xxhash. L4z is insanely fast.
On Mon, 3 Oct 2022 at 3:00 PM moon-chilled @.***> wrote:
interesting links: https://github.com/rurban/smhasher https://bigdata.uni-saarland.de/publications/p249-richter.pdf
— Reply to this email directly, view it on GitHub https://github.com/jsoftware/jsource/issues/145, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMDKYMSTFHJWE6CREJFO6DWBKAAJANCNFSM6AAAAAAQ3JTVJQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>
Oop I meant lz4
Case in point: pathological case for crc reported on the forums:
a=. 1,.~1e5 3?@$100
b=. 1e5 4?@$100
timex'~.a'
0.210941
timex'~.b'
0.010023
interesting links: https://github.com/rurban/smhasher https://bigdata.uni-saarland.de/publications/p249-richter.pdf