Closed tommyettinger closed 1 year ago
So some key things to consider: Will users ever have negative x and/or y, and if so, how far below 0 can they go? Some very unusual grid shapes could be trouble for some of these, even if x and y are always non-negative. Cantor forms diagonal stripes as it assigns higher and higher numbers to grid cells; Rosenberg-Strong forms square 'L'; shapes. Grids that are very tall and thin, or very short and wide, might make those shapes require much higher values for even a small-ish number of cells. This might not be a problem, because .NET's Dictionary uses a prime modulus rather than just a bitmask or shift... I have a feeling the main requirement is just that the hash be fast, since Dictionary takes care of the rest pretty thoroughly. BareMinimum with a left rotation by 8 is probably my suggestion for now, but rotating left by 16 should also be fine. There are bitwise rotation methods in some of the more recent .NET versions, and I don't know if you can use them, or if the JIT compiler can figure out that (x << shift | x >> 32 - shift)
is a bitwise rotation operation (making it about as fast as an addition).
OK, here's the latest benchmark's results, on equally-often positive and negative x and y:
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19045
Intel Core i7-10750H CPU 2.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=7.0.102
[Host] : .NET Core 6.0.13 (CoreCLR 6.0.1322.58009, CoreFX 6.0.1322.58009), X64 RyuJIT
DefaultJob : .NET Core 6.0.13 (CoreCLR 6.0.1322.58009, CoreFX 6.0.1322.58009), X64 RyuJIT
| Method | Size | Mean | Error | StdDev | Median |
|---------------------------------- |----- |-------------:|-----------:|-----------:|-------------:|
| CurrentPrimitives | 10 | 2.548 us | 0.0504 us | 0.1197 us | 2.494 us |
| OriginalGoRogue | 10 | 2.624 us | 0.0515 us | 0.1076 us | 2.658 us |
| KnownSize | 10 | 2.391 us | 0.0462 us | 0.0678 us | 2.374 us |
| KnownRange | 10 | 2.418 us | 0.0483 us | 0.0834 us | 2.424 us |
| RosenbergStrongBased | 10 | 2.688 us | 0.0529 us | 0.0588 us | 2.668 us |
| RosenbergStrongBasedMinusMultiply | 10 | 2.677 us | 0.0480 us | 0.0449 us | 2.698 us |
| RosenbergStrongPure | 10 | 2.376 us | 0.0446 us | 0.0458 us | 2.367 us |
| CantorPure | 10 | 2.495 us | 0.0492 us | 0.0547 us | 2.472 us |
| BareMinimum | 10 | 2.410 us | 0.0477 us | 0.0603 us | 2.448 us |
| MultiplySum | 10 | 2.457 us | 0.0482 us | 0.0737 us | 2.507 us |
| CurrentPrimitives | 50 | 69.881 us | 1.3711 us | 1.9220 us | 70.744 us |
| OriginalGoRogue | 50 | 73.764 us | 1.4452 us | 1.6063 us | 74.315 us |
| KnownSize | 50 | 70.104 us | 1.3648 us | 2.0006 us | 69.372 us |
| KnownRange | 50 | 67.817 us | 0.4064 us | 0.3173 us | 67.873 us |
| RosenbergStrongBased | 50 | 80.669 us | 1.0561 us | 0.8245 us | 80.393 us |
| RosenbergStrongBasedMinusMultiply | 50 | 80.356 us | 1.5866 us | 2.5621 us | 79.330 us |
| RosenbergStrongPure | 50 | 75.738 us | 1.4949 us | 2.6957 us | 76.809 us |
| CantorPure | 50 | 71.955 us | 0.8454 us | 0.8303 us | 72.036 us |
| BareMinimum | 50 | 69.732 us | 1.3894 us | 2.2436 us | 69.691 us |
| MultiplySum | 50 | 68.396 us | 0.1718 us | 0.1607 us | 68.348 us |
| CurrentPrimitives | 100 | 335.781 us | 5.8695 us | 6.2803 us | 335.135 us |
| OriginalGoRogue | 100 | 350.627 us | 6.9640 us | 8.0197 us | 353.581 us |
| KnownSize | 100 | 335.773 us | 6.6714 us | 7.6828 us | 333.326 us |
| KnownRange | 100 | 342.423 us | 6.7300 us | 9.2121 us | 348.710 us |
| RosenbergStrongBased | 100 | 391.549 us | 7.8304 us | 10.7184 us | 385.444 us |
| RosenbergStrongBasedMinusMultiply | 100 | 392.421 us | 7.7475 us | 10.0739 us | 395.919 us |
| RosenbergStrongPure | 100 | 363.196 us | 7.2237 us | 10.8121 us | 357.511 us |
| CantorPure | 100 | 353.982 us | 6.9095 us | 10.7572 us | 357.089 us |
| BareMinimum | 100 | 327.382 us | 5.5715 us | 6.1927 us | 326.908 us |
| MultiplySum | 100 | 329.329 us | 6.5413 us | 11.1077 us | 330.779 us |
| CurrentPrimitives | 175 | 1,095.239 us | 19.2533 us | 18.0096 us | 1,090.769 us |
| OriginalGoRogue | 175 | 1,153.031 us | 22.7875 us | 25.3283 us | 1,152.057 us |
| KnownSize | 175 | 1,156.251 us | 15.1020 us | 14.1264 us | 1,157.790 us |
| KnownRange | 175 | 1,129.606 us | 11.8781 us | 11.1108 us | 1,131.388 us |
| RosenbergStrongBased | 175 | 1,451.467 us | 8.9499 us | 8.3717 us | 1,452.237 us |
| RosenbergStrongBasedMinusMultiply | 175 | 1,410.826 us | 12.0894 us | 10.7170 us | 1,413.116 us |
| RosenbergStrongPure | 175 | 1,285.783 us | 7.9979 us | 7.4813 us | 1,288.821 us |
| CantorPure | 175 | 1,251.229 us | 6.6537 us | 5.5561 us | 1,250.956 us |
| BareMinimum | 175 | 1,088.421 us | 13.0138 us | 12.1731 us | 1,089.591 us |
| MultiplySum | 175 | 1,093.047 us | 11.9116 us | 11.1421 us | 1,091.695 us |
| CurrentPrimitives | 256 | 2,924.071 us | 33.0242 us | 30.8908 us | 2,914.341 us |
| OriginalGoRogue | 256 | 2,970.857 us | 21.7161 us | 20.3133 us | 2,968.653 us |
| KnownSize | 256 | 2,923.818 us | 18.7399 us | 17.5293 us | 2,926.333 us |
| KnownRange | 256 | 2,935.779 us | 42.8776 us | 38.0098 us | 2,922.961 us |
| RosenbergStrongBased | 256 | 3,978.849 us | 27.0928 us | 25.3426 us | 3,979.130 us |
| RosenbergStrongBasedMinusMultiply | 256 | 3,966.713 us | 43.3875 us | 38.4619 us | 3,955.923 us |
| RosenbergStrongPure | 256 | 3,533.416 us | 24.9279 us | 23.3176 us | 3,525.887 us |
| CantorPure | 256 | 3,488.997 us | 27.4341 us | 25.6619 us | 3,482.179 us |
| BareMinimum | 256 | 2,821.189 us | 29.4691 us | 27.5655 us | 2,823.807 us |
| MultiplySum | 256 | 2,839.916 us | 34.9376 us | 30.9712 us | 2,830.097 us |
The main thing to note here: Because of negative inputs being more likely to collide with positive ones when the hash doesn't support negative x or y very well, Rosenberg-Strong and Cantor-based hashes start to take much more time with larger Dictionary sizes like 256 (20% to 35% more than most of the others, I think). BareMinimum still does surprisingly very well, as does MultiplySum. MultiplySum is also amenable to random hashing if you want to try that; you just choose two large odd numbers pseudo-randomly when some condition is detected that the hash may be facing trouble. Pseudo-random hashing is what jdkgdxds uses for its Set and Map classes, as a way of mitigating worst-case performance issues. If you find a case that does call for random hashing, my suggestion is to have a medium-large array of tested, known-good multipliers that the algorithm selects without modification, with probably a different (non-overlapping) array for x and for y.
I left comments in each commit that might be helpful. The important thing is that the simplest hash here, BareMinimum, gets very close to KnownSize hashing in terms of speed, and like KnownSizeHasher, it is not very random-seeming. I have the speed benchmarks for PointDictionaryAdd: