Closed alexanderkiel closed 4 years ago
Hi Alexander, thanks for pinging!
This is an intentional tradeoff since:
Is your concern theoretical, or have you actually run into a problem on your end?
I would suggest to add some hint to the documentation that the hashing system works only for small amounts of custom types.
Would be happy to look at a PR if you have something in mind 👍
Furthermore I would include collided keywords explicitly in the warning inside extend-thaw.
I'm not sure I follow? Collisions will already print a warning message, including the specific id.
Thanks!
You are right, as I already said, for most use cases it is ok.
Is your concern theoretical, or have you actually run into a problem on your end?
Yes, I explore to use something like 50 to 100 records instead of normal maps in my application and plan to serialize them with nippy.
Would be happy to look at a PR if you have something in mind
Something like this:
* Keyword - 2 byte overhead, resistent to id collisions
* Keyword - 2 byte overhead, resistent to id collisions (for < 10 different custom types)
I'm not sure I follow? Collisions will already print a warning message, including the specific id.
I would include both, the old and the new id.
Docstring will be updated in next release, thanks Alexander!
You use 16-bit Hashes for custom type keyword ID's. The equation to calculate hash collisions is:
or simpler, but only valid for small probabilities:
So assume one would like to use at least 128 custom types, as it would be possible with numeric ID's, the odds of a hash collision would be 1 out of 10. So in 1/10 of cases were I use roughly 128 random keywords to identify my custom types, I will have a hash collision.
For a small number of custom types like 12 the odds are 1 out of 1000. So I assume for most use cases the choice of a 16-bit hash is ok. But if one plans to use a large number of custom types, collisions will be a problem.
I would suggest to add some hint to the documentation that the hashing system works only for small amounts of custom types. Furthermore I would include collided keywords explicitly in the warning inside
extend-thaw
.