piskvorky / bounter

Efficient Counter that uses a limited (bounded) amount of memory regardless of data size.
MIT License

Maximum count of 4_294_967_295 on CountMinSketch #51

Closed jonsnowseven closed 1 year ago

jonsnowseven commented 2 years ago

Hello all.

https://github.com/RaRe-Technologies/bounter/blob/21aeda1b88402bacb44ce92d05c08b632a1edb21/cbounter/cms_conservative.c#L10

Is it possible to widen this type or make it configurable through Python? Otherwise, the following snippet won't work:

from bounter import CountMinSketch
cms = CountMinSketch(depth=19, width=2 ** 30)
cms.increment("some_value", 4_294_967_295)
cms.increment("some_value", 1_000)

cms["some_value"] # should be 4_294_967_295 + 1_000 but instead returns 4_294_967_295
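
For illustration, the reported result is consistent with a counter cell that is a 32-bit unsigned integer and saturates at its maximum instead of wrapping around (a wrapping cell would have returned 999). The sketch below is plain Python and does not use bounter's code; `saturating_add` and `UINT32_MAX` are hypothetical names, and the saturating behavior is an assumption inferred from the output above.

```python
UINT32_MAX = 2**32 - 1  # 4_294_967_295, the largest value a 32-bit unsigned cell can hold
UINT64_MAX = 2**64 - 1  # the largest value a 64-bit unsigned cell can hold


def saturating_add(cell, delta, max_value=UINT32_MAX):
    """Add delta to cell, clamping at max_value (hypothetical helper, not bounter's API)."""
    return min(cell + delta, max_value)


# 32-bit cell: the second increment is lost because the cell is already at its maximum.
cell32 = saturating_add(0, 4_294_967_295)
cell32 = saturating_add(cell32, 1_000)

# 64-bit cell: the same two increments fit without clamping.
cell64 = saturating_add(0, 4_294_967_295, UINT64_MAX)
cell64 = saturating_add(cell64, 1_000, UINT64_MAX)

print(cell32)  # 4294967295
print(cell64)  # 4294968295
```

Under that assumption, any count above 4_294_967_295 needs a wider cell type, which is what this issue asks for.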

jponf commented 1 year ago

@jonsnowseven since my PR has been merged I believe this issue is now resolved. Please confirm :)

jonsnowseven commented 1 year ago

from bounter import CountMinSketch  # version 1.2.0
from bounter.count_min_sketch import CellSize
cms = CountMinSketch(depth=19, width=2 ** 30, cell_size=CellSize.BITS_64)
cms.increment("some_value", 4_294_967_295)
cms.increment("some_value", 1_000)

print("{:,}".format(cms["some_value"])) # prints 4,294,968,295

It worked!

Thanks @jponf 🤜🤛