jsoftware / jsource

J engine source mirror
Other
645 stars 91 forks source link

Use xxhash (or similar) for i. on large cells #145

Open moon-chilled opened 1 year ago

moon-chilled commented 1 year ago

interesting links: https://github.com/rurban/smhasher https://bigdata.uni-saarland.de/publications/p249-richter.pdf

moon-chilled commented 1 year ago

Also investigate: for scalar tables that blow out of cache, might it be worthwhile to use a slower, higher quality hash function than crc, in order to reduce memory traffic from extraneous probes?

mlochbaum commented 1 year ago

We picked out wyhash for BQN, which seems to be going well. It also comes with a very fast random number generator, although the state space on that is kind of small.

What I've seen with a table with a bad multiplication-based hash was that it usually had fewer collisions but occasionally a very large number of them. I think CRC is probably similar, so having a backup option for protection is good but you shouldn't expect it to speed things up generally. Either way the effect should be very small relative to the base rate of collisions.

bilam commented 1 year ago

I have experience with l4z which uses xxhash. L4z is insanely fast.

On Mon, 3 Oct 2022 at 3:00 PM moon-chilled @.***> wrote:

interesting links: https://github.com/rurban/smhasher https://bigdata.uni-saarland.de/publications/p249-richter.pdf

— Reply to this email directly, view it on GitHub https://github.com/jsoftware/jsource/issues/145, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMDKYMSTFHJWE6CREJFO6DWBKAAJANCNFSM6AAAAAAQ3JTVJQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>

bilam commented 1 year ago

Oop I meant lz4

moon-chilled commented 1 year ago

Case in point: pathological case for crc reported on the forums:

   a=. 1,.~1e5 3?@$100
   b=. 1e5 4?@$100
   timex'~.a'
0.210941
   timex'~.b'
0.010023