jsoftware / jsource

J engine source mirror
Other
665 stars 91 forks source link

Use xxhash (or similar) for i. on large cells #145

Open moon-chilled opened 2 years ago

moon-chilled commented 2 years ago

interesting links: https://github.com/rurban/smhasher https://bigdata.uni-saarland.de/publications/p249-richter.pdf

moon-chilled commented 2 years ago

Also investigate: for scalar tables that blow out of cache, might it be worthwhile to use a slower, higher quality hash function than crc, in order to reduce memory traffic from extraneous probes?

mlochbaum commented 2 years ago

We picked out wyhash for BQN, which seems to be going well. It also comes with a very fast random number generator, although the state space on that is kind of small.

What I've seen with a table with a bad multiplication-based hash was that it usually had fewer collisions but occasionally a very large number of them. I think CRC is probably similar, so having a backup option for protection is good but you shouldn't expect it to speed things up generally. Either way the effect should be very small relative to the base rate of collisions.

bilam commented 2 years ago

I have experience with l4z which uses xxhash. L4z is insanely fast.

On Mon, 3 Oct 2022 at 3:00 PM moon-chilled @.***> wrote:

interesting links: https://github.com/rurban/smhasher https://bigdata.uni-saarland.de/publications/p249-richter.pdf

— Reply to this email directly, view it on GitHub https://github.com/jsoftware/jsource/issues/145, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMDKYMSTFHJWE6CREJFO6DWBKAAJANCNFSM6AAAAAAQ3JTVJQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>

bilam commented 2 years ago

Oop I meant lz4

moon-chilled commented 2 years ago

Case in point: pathological case for crc reported on the forums:

   a=. 1,.~1e5 3?@$100
   b=. 1e5 4?@$100
   timex'~.a'
0.210941
   timex'~.b'
0.010023