klapaucius / vector-hashtables


Support hashtables with more than 2G elements #26

Closed. Bodigrim closed this issue 6 months ago

Bodigrim commented 6 months ago

Some users run Haskell on machines with over 1 TB of memory, so it's not impossible to have more than 2G elements in a hashtable.

https://github.com/klapaucius/vector-hashtables/blob/6a671f6ac39667af92708e9dd301e45dea858ed0/src/Data/Vector/Hashtables/Internal.hs#L905-L914

What's the principle behind the choice of prime numbers here? How can it be extended further, to cover the entire Int64 range?

klapaucius commented 6 months ago

The principle is:

[3, 7, ..., primeNumber_n, primeNumber_n+k, ...] where primeNumber_n+k > 1.2 * primeNumber_n

i.e. each entry is the smallest prime at least 1.2 times the previous one, so it can easily be extended given a sequence of prime numbers.

klapaucius commented 6 months ago

Something like this

Prelude Math.NumberTheory.Primes Data.List> takeWhile (\x -> x < fromIntegral(maxBound :: Int)) . map unPrime $ iterate (\p -> nextPrime (ceiling $ 1.2 * fromIntegral (unPrime p))) (nextPrime 2050765853)
[2050765853,2460919037,2953102909,3543723499,4252468211,5102961869,6123554263,7348265123,8817918149,10581501797,12697802207,15237362659,18284835221,21941802293,26330162767,31596195347,37915434433,45498521329,54598225597,65517870827,78621445013,94345734019,113214880859,135857857043,163029428461,195635314181,234762377023,281714852449,338057822959,405669387577,486803265103,584163918127,700996701773,841196042177,1009435250623,1211322300767,1453586761009,1744304113229,2093164935877,2511797923093,3014157507727,3616989009287,4340386811153,5208464173387,6250157008087,7500188409719,9000226091663,10800271310027,12960325572053,15552390686477,18662868823783,22395442588543,26874531106253,32249437327559,38699324793071,46439189751707,55727027702123,66872433242599,80246919891143,96296303869429,115555564643329,138666677571997,166400013086401,199680015703691,239616018844471,287539222613371,345047067136049,414056480563273,496867776675943,596241332011217,715489598413487,858587518096247,1030305021715513,1236366026058641,1483639231270373,1780367077524493,2136440493029393,2563728591635279,3076474309962373,3691769171954869,4430123006345843,5316147607615027,6379377129138049,7655252554965709,9186303065958853,11023563679150639,13228276414980799,15873931697977019,19048718037572429,22858461645086941,27430153974104333,32916184768925261,39499421722710373,47399306067252449,56879167280702959,68255000736843607,81906000884212361,98287201061054833,117944641273265887,141533569527919211,169840283433503041,203808340120203719,244570008144244481,293484009773093401,352180811727712129,422616974073254567,507140368887905497,608568442665486697,730282131198584087,876338557438300843,1051606268925960971,1261927522711153157,1514313027253383839,1817175632704060721,2180610759244872743,2616732911093847187,3140079493312616483,3768095391975139873,4521714470370167831,5426057364444201011,6511268837333041219,7813522604799648773]

but probably fewer entries than that are needed, because each hashtable element takes up several machine words of memory anyway.
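To put a rough number on that (a back-of-the-envelope estimate with a purely assumed figure of about three machine words, 24 bytes, of storage per element): even a 1 TB heap tops out around 4 * 10^10 elements, so under that assumption only the first seventeen or so capacities in the list above are reachable in practice.

-- Back-of-the-envelope bound on element count for a given heap size.
-- The 24 bytes (~3 machine words) per element is an assumption, not a measurement.
maxElems :: Integer -> Integer
maxElems heapBytes = heapBytes `div` 24

-- maxElems (10 ^ 12) == 41666666666   -- about 4.2e10 elements for a 1 TB heap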

Bodigrim commented 6 months ago

> The principle is:
>
> [3, 7, ..., primeNumber_n, primeNumber_n+k, ...] where primeNumber_n+k > 1.2 * primeNumber_n
>
> i.e. each entry is the smallest prime at least 1.2 times the previous one, so it can easily be extended given a sequence of prime numbers.

Thanks for the explanation! So fundamentally there is enough freedom: any prime roughly 20% larger than the previous one will do.

I have an idea to choose primes such that division by them (which is ubiquitous throughout all the routines) can be replaced by a wide multiplication and a right shift, but I have not worked out all the details yet.
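For illustration, here is a minimal sketch of that multiply-and-shift idea, following the scheme of Lemire, Kaser and Kurz ("Faster remainders when the divisor is a constant"); the names magic, mulHi and fastMod are invented for this sketch, and nothing below is taken from the library or from #27. For a fixed divisor d one precomputes ceil (2^64 / d); n `mod` d then costs two multiplications and no division. It is shown for 32-bit operands so the wide product fits in a Word64; the 64-bit capacities above would need a 128-bit intermediate product (e.g. GHC's timesWord2# primop).

import Data.Bits (shiftR)
import Data.Word (Word32, Word64)

-- Magic constant for a fixed divisor d: ceil (2^64 / d).
magic :: Word32 -> Word64
magic d = maxBound `div` fromIntegral d + 1

-- High 64 bits of the 64 x 64 -> 128-bit product. Integer keeps the sketch
-- portable; a real implementation would use a primop such as timesWord2#.
mulHi :: Word64 -> Word64 -> Word64
mulHi a b = fromIntegral ((toInteger a * toInteger b) `shiftR` 64)

-- n `mod` d with no division instruction: an overflowing multiply by the
-- magic constant, then the high word of a second multiply by d.
fastMod :: Word64 -> Word32 -> Word32 -> Word32
fastMod m d n = fromIntegral (mulHi (m * fromIntegral n) (fromIntegral d))

-- e.g. fastMod (magic 7) 7 100 == 2, and more generally
-- all (\n -> fastMod (magic 97) 97 n == n `mod` 97) [0 .. 100000]

The magic constant would presumably be stored alongside each prime in the capacity table, so a lookup pays for two multiplications instead of a hardware division.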

Bodigrim commented 6 months ago

Fixed by #27.