Closed mernen closed 6 years ago
Thanks for the patch.
And wow, good catch about RandomState. What do you think about requiring K: Ord
and using that to break ties, so it doesn't depend on the implementation details of the hash function? (Marginally related: Then we could also use BTreeMap
to remove the K: Hash
bound entirely ... probably I should benchmark that.)
One new corner case is now the iterator yielding duplicate keys. Last occurence wins?
Ord
seems to be the best option to ensure deterministic behavior. And then I agree at this point it might make sense to switch to BTreeMap
internally altogether, but yeah, it’s better to benchmark before making such changes.
As for the corner case, last occurrence does seem like a good strategy — it’s the behavior you’ll typically see when duplicate keys are found in maps (e.g. literals in scripting languages).
This patch replaces the
codebook()
input fromHashMap
to anything that implements a similar iterator. Existing invocations usingHashMap
remain fully compatible.I did this for two reasons:
RandomState
) is extremely dangerous if you have symbols with equal weights, as they may be visited in a different order on each execution of your program, resulting in different codes for those symbols. My persisted data was getting corrupted, and I had to switch to fnv to get deterministic results across runs.