Closed tjbookreader closed 5 years ago
@tjbookreader it sounds like you have a good idea of where the bug is and how to fix it. Can you open a PR with the fix?
This repo is hooked to Travis, which will run all tests automatically.
I'm very new to GitHub. I believe that I have created the PR. Let me know if that worked. Thanks.
Description
The underlying C code uses unsigned 32-bit integers for the width and depth indices of the count min sketch table. However, Python is using signed 32-bit integers. This means that if you create a 16GB count min sketch with the default depth = 8 and log_counting = 8, then the width should be 16B / 8 / 1 = 2^32. But doing so causes an OverflowError.
Steps/Code/Corpus to Reproduce
Expected Results
Expect to create the largest bounter possible without error.
Actual Results
The following error occurs.
Possible Fix
Looking at cbounter/cms_common.c the problem seems to be from Line 65 in the
PyArg_ParseTupleAndKeywords(args, kwds, 'ii', kwlist, &w, &self->depth)
. The 'ii' tells python to use a signed integer for &w and &self->depth. Switching the lowercase 'ii' to uppercase 'II' would tell python to use an unsigned integer.This formatting of 'ii' also appears in Line 381
Py_BuildValue("(ii)", self->width, self->depth)
.Versions
Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core Python 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 18:21:58) [GCC 7.2.0] NumPy 1.14.2 bounter 1.0.1