This looks like the most thorough analysis we could possibly do and still wind up with, essentially, "It's better, and I don't know why." Just adding and rotating doesn't seem like it should work this well, especially when adding and left-shifting (SimpleShift in the benchmarks) tends to get no full-hash collisions even when both positive and negative inputs are used. But, it does seem to work quite well. Since the original GoRogue hash seems to do a lot more work just to get significantly more collisions than it should, switching out to almost any of the competitive hashes would probably be a win (MultiplySum, in particular, seems robust much of the time, despite two multiplications and an add being supposedly slower than an add and a rotation).
This seems to have been a rather powerful nerd-snipe; we could probably find people involved with the internals of this part of .NET, and everyone combined would still be missing the complete picture of why some hashes perform better. I'm definitely impressed by how well .NET's Dictionary and HashSet handle such simplistic hashes; they seem to be doing the bulk of the work to distribute items properly, so the hash can be really quick. I doubt we can find out much more about this without finding some way to test directly against the prime-modulus hash table used by .NET here, but what we do know should be enough for a nice speedup.
Overview
This builds on #100 by creating a more complete set of hashing algorithm tests, and based on those results, selecting a new hashing algorithm to use in `Point.GetHashCode()` (closes #57).

Changes
No breaking changes, since the only real change is the return values from `Point.GetHashCode()`, which best practice states users should not depend on. The implications from this change are mostly related to performance.

Performance Implications
The choice of hashing algorithm here focuses primarily on generating the best performance possible when `Point` is used in `Dictionary` or `HashSet`, since these are by far the primary use cases for GetHashCode in C#. Both of those data structures use a prime-modulus strategy for determining which bucket to place items in, which in short means that a few hash collisions are actually tolerable, and often a faster algorithm with slightly more collisions performs better than a slower algorithm that more completely mitigates collisions; but it also means that collisions are generated inside the hash set which are not evident from the hash algorithm alone.

In light of this, and in combination with the fact that there are a lot of other variables that affect hash set performance, like cache locality, it is quite difficult to establish consistent patterns which correlate a hash code to performance. However, generally speaking, a very simple hashing algorithm ended up performing quite well; one of the best was an algorithm which (in the tests) is called `BareMinimum`. It is basically just the x value added to the y value rotated left by 16 bits (see the sketch below). The algorithm is well suited to the primitives library's requirements, for the following reasons:

- It is very fast; the work amounts to an addition and a rotate-left (`rol`).
- It generates few collisions, remaining competitive with `KnownSizeHasher` over some positive coordinate spaces.

Therefore, the default `GetHashCode()` implementation for `Point` now implements `BareMinimum`.
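For concreteness, here is a minimal sketch of what a `BareMinimum`-style hash looks like. This is an illustration only: the struct is simplified, it assumes `System.Numerics.BitOperations` is available (.NET Core 3.0+), and the exact code in this PR may differ in its details.

```csharp
#nullable enable
using System;
using System.Numerics; // BitOperations.RotateLeft

// Simplified stand-in for the library's Point type; names here are illustrative.
public readonly struct Point : IEquatable<Point>
{
    public readonly int X;
    public readonly int Y;

    public Point(int x, int y) { X = x; Y = y; }

    public bool Equals(Point other) => X == other.X && Y == other.Y;
    public override bool Equals(object? obj) => obj is Point p && Equals(p);

    // BareMinimum: the X value plus the Y value rotated left by 16 bits.
    // Dictionary/HashSet reduce this value to a bucket index internally, so a few
    // raw collisions are tolerable as long as the hash itself is cheap to compute.
    public override int GetHashCode()
        => unchecked(X + (int)BitOperations.RotateLeft((uint)Y, 16));
}
```

For non-negative coordinates below 65536, this effectively packs Y into the upper 16 bits and X into the lower 16 bits of the result, which is why collisions stay low over the coordinate spaces games typically use.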
Analysis
As stated above, the large number of variables and internal implementation details which affect `HashSet` and `Dictionary` performance made it very difficult to establish concrete patterns based on benchmarks. In some cases, exactly why the results are the way they are is not something I fully understand, and it appears that it would take some diving into the internals of those collections to get any further.

That much said, one algorithm that performed quite well is `BareMinimum`. It is quite fast (though not the fastest), and it generates very few collisions over the coordinate spaces typically used (but not the fewest). Despite not being the "best" in each category, it very consistently performs as well as or better than `OriginalGoRogue`, which represents the hasher currently used by the primitives library.

The performance test results are listed in the section that follows this one; they detail the times taken to perform dictionary add and get operations, and the same for hash set, using a number of hashing algorithms as well as the default hasher. There are a few characteristics of these results that are notable:
- `CurrentPrimitives` uses the default hash code implementation; more precisely, it passes no equality comparer at all to the dictionary/hash set, and this appears to produce different results than passing the same algorithm as a custom equality comparer (this is evident because the results contain both). This PR switches `Point.GetHashCode()`, and therefore `CurrentPrimitives`, to use `BareMinimum`; but the `CurrentPrimitives` test case quite often performs better than the `BareMinimum` one does. I'm not sure exactly why this is; it could be anything from a specific optimization made by the dictionary implementation, to a semantic of the implementation that allows devirtualization in one case but not the other (see the sketch after this list). It can be investigated further in the future, but for now it means that we cannot compare `CurrentPrimitives` directly to the other algorithms' results.
- The results change notably if the `Shuffle` call which shuffles the input list is removed; namely, many algorithms perform much better. This suggests that cache locality of the buckets may be playing a significant role in the results. It may also explain some instances where more collisions end up generating better performance.
- The `Add` tests in particular are probably at least somewhat unreliable as a measure of the hashing algorithms' performance, because the tests must create a new `Dictionary` as part of the timed process; and even if they did not, resizing and creation of additional buckets would still have to take place. It is therefore possible that memory allocation is a significant factor in these results. This is not entirely independent of the hashing algorithm, since the hashes generated do play a part in it; however, there are certainly more confounding factors here than anywhere else, so differences the hashing algorithm directly creates may not be evident in these results.
- The `KnownSize` and `KnownRange` test cases are probably not to be trusted in the `Gaussian` distribution. Since these data sets don't meet the constraint of having coordinates that fall within the expected ranges, we're effectively violating the contract these hashers assume to be valid; it is therefore expected that the results for `KnownSize` and `KnownRange` vary so widely in the Gaussian data sets.

The theme above is that there appear to be factors other than the directly observable characteristics of the hashing algorithm playing a significant role in the test results, and what these factors are is not something I entirely understand. I do have a few theories. One possibility is that the modulus performed by the internal implementation of the dictionary is generating collisions that we can't see just by analyzing the number of collisions produced by the hash code. Another is that the hash code causes differences in a) the cache locality of the buckets being accessed and b) the memory allocation that occurs. This in particular would explain some of the inconsistencies at the 1024 size compared to others, as cache misses become more likely when the accessed buckets are not close to each other in memory.
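To make the first point above concrete, the sketch below contrasts the two ways a hasher can end up being used: relying on the default comparer (the `CurrentPrimitives` case, where `Point.GetHashCode()` is reached through the collection's default comparer) versus passing the same algorithm explicitly as an `IEqualityComparer<Point>`. The comparer type and names are hypothetical and reuse the simplified `Point` from the earlier sketch; they are not necessarily what the benchmark project uses.

```csharp
using System.Collections.Generic;
using System.Numerics;

// Hypothetical comparer applying the same BareMinimum-style hash explicitly.
public sealed class BareMinimumComparer : IEqualityComparer<Point>
{
    public bool Equals(Point a, Point b) => a.X == b.X && a.Y == b.Y;

    public int GetHashCode(Point p)
        => unchecked(p.X + (int)BitOperations.RotateLeft((uint)p.Y, 16));
}

public static class ComparerUsage
{
    public static void Demo()
    {
        // "CurrentPrimitives" style: no comparer is passed, so the collection falls
        // back to the default comparer, which calls Point.GetHashCode() directly.
        var viaDefault = new Dictionary<Point, int> { [new Point(1, 2)] = 42 };

        // Explicit-comparer style: the same algorithm, routed through an interface.
        // The runtime may be able to devirtualize the default path but not this one,
        // which could account for the timing differences noted above.
        var viaExplicit = new Dictionary<Point, int>(new BareMinimumComparer())
        {
            [new Point(1, 2)] = 42
        };
    }
}
```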
Another common theme, however, is that `BareMinimum` is better than `OriginalGoRogue` (which is the `Point.GetHashCode()` implementation prior to this PR). In all but one of the test cases it is measurably better (sometimes by only a little, sometimes by as much as about twice as fast); and in the one test case where it is not faster, the difference between the two is well within the margin of error. Therefore, regardless of what factors are affecting the outcomes, it appears that `BareMinimum` should nearly always produce results as good as or better than the old implementation does. Combined with the fact that the old implementation on paper produces many more collisions, I think this makes a fairly compelling case to switch the default implementation.

It is worth noting that there are cases where `BareMinimum` will generate much higher collision rates. The rate at which collisions are generated increases significantly at coordinates >= 65536 in X or Y, for instance. In fact, hashes start to repeat almost entirely around this range: (0, 0) -> (1023, 1023) and (65536, 65536) -> (66559, 66559) generate almost entirely the same set of 1024 * 1024 hash values. This seems like an exceedingly uncommon case to find in games, though; and should it ever occur, it seems likely that custom hashing will be needed anyway (or even a structure entirely outside of `Dictionary` and `HashSet`).
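As a rough illustration of that overlap (this is not code from the benchmark suite; it just reuses the hash sketched earlier), a quick check along these lines reproduces it:

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

public static class CollisionOverlapCheck
{
    // Same BareMinimum-style hash sketched earlier in this description.
    private static int Hash(int x, int y)
        => unchecked(x + (int)BitOperations.RotateLeft((uint)y, 16));

    public static void Main()
    {
        var low = new HashSet<int>();   // hashes for (0, 0) -> (1023, 1023)
        var high = new HashSet<int>();  // hashes for (65536, 65536) -> (66559, 66559)

        for (int y = 0; y < 1024; y++)
        {
            for (int x = 0; x < 1024; x++)
            {
                low.Add(Hash(x, y));
                high.Add(Hash(x + 65536, y + 65536));
            }
        }

        // Rotating left by 16 wraps bit 16 of Y around to bit 0, so the second block's
        // hashes are the first block's plus a constant (65537); the two sets therefore
        // share 1023 * 1023 of their 1024 * 1024 values.
        low.IntersectWith(high);
        Console.WriteLine($"{low.Count} shared hash values out of {1024 * 1024}.");
    }
}
```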
I think most, if not all, use cases remain well inside the (65535, 65535) marker. In the future, it may be useful to include other hashing algorithms as equality comparers in the actual library; but I'd like to try to gain a better understanding of what is affecting these results first.

Benchmarks