segmentio / go-hll

Go implementation of HLL that plays nicely with other languages
MIT License
37 stars 9 forks source link

v1.0.1: out of range access when union two hll #3

Open yangshuxin opened 3 years ago

yangshuxin commented 3 years ago

I'm using v1.0.1 and run into a out-of-range-access when union two hlls. The statement give rise to this problem is "s.Union(l)", where the s is using dense storage and have setting with log2m:11 and regwidth:8, and the l is using sparse storage with settings log2m:13 and regwidth:5.

When s bit-masking the bit corresponding to the element from l, it does not scale its current storage, and hence incur out-of-range access.

The call-stack ls following:

go-hll.denseStorage.setIfGreater at dense.go:175 go-hll.(Hll).union at hll.go:331 go-hll.(Hll).Union at hll.go:246

I don't get chance to produce minimal testing case to produce the this problem. I'm wondering if above description is sufficient.

I'm trying to give a workaround as following, is it safe?

s_settings, l_settings = get both s and l's settings
calculate sz(hll) = (1<< hll.log2m) * hll.regwidth
if sz(s) < sz(l) {
    l.union(s)
} else {
   s.union(l)
}
stevevls commented 3 years ago

Thanks for the report @yangshuxin ! Using HLLs with different parameters isn't a typical use case, so I'm not surprised we have some lingering bugs. Your workaround looks good to me, though. If you have time to submit a PR to fix the underlying issue, I'd be happy to review that as well.