Closed connor4312 closed 5 years ago
The unnecessary `[].slice.call` call is likely strongly affecting the benchmark results.
I was following the currently recommended method of serialization from the readme. The slice call is necessary on platforms that support typed arrays, which includes Node.js and most modern browsers.
```
> var a = new Int32Array(3);
> a[0] = 1, a[1] = 2, a[2] = 3;
> JSON.stringify(a);
'{"0":1,"1":2,"2":3}'
> JSON.stringify(Array.prototype.slice.call(a));
'[1,2,3]'
```
Stringifying and then parsing the buckets without the slice call also does not store them correctly in bloomfilter's current implementation. If you change the unit test I added to round-trip the buckets through JSON rather than through de/serialize (`new BloomFilter(JSON.parse(JSON.stringify(start.buckets)), 4)`), you'll get the following failure:
```
bloom filter
  ✗ serialization
    » expected 1024,
      got NaN (==) // bloomfilter-test.js:64
✗ Broken » 6 honored ∙ 1 broken (0.021s)
```
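For anyone wondering where the `NaN` comes from, here is a self-contained sketch (variable names are mine, and the tie to BloomFilter's internal size math is my reading of the symptom above):

```javascript
// JSON.stringify turns a typed array into a plain object, not an array,
// so parsing it back loses the array shape entirely.
var buckets = new Int32Array([1, 2, 3]);
var parsed = JSON.parse(JSON.stringify(buckets)); // {"0":1,"1":2,"2":3}

Array.isArray(parsed);  // false: it's a plain object
var n = parsed.length;  // undefined: plain objects have no length
var bits = 32 * n;      // NaN, the value the failing assertion reports
```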
We're now using this in production and it's working well. :smile:
Thank you for this approach. Could the array version be stored in localStorage? What do you think?
Hey, sorry, I managed to miss your message before. This cannot be stored in localStorage directly, as localStorage only supports strings. However, DOMStrings in JavaScript are UTF-16 encoded, meaning that you can easily treat the array as a string and convert back and forth, like this:
```javascript
function bytesToString(arr) {
  var out = '';
  for (var i = 0; i < arr.length; i += 2) {
    // Pack two bytes into a single UTF-16 code unit.
    out += String.fromCharCode((arr[i] << 8) | arr[i + 1]);
  }
  return out;
}

function stringToBytes(str) {
  var arr = new Uint8Array(str.length * 2);
  for (var i = 0; i < str.length; i++) {
    // Split each code unit back into its two bytes.
    var code = str.charCodeAt(i);
    arr[i * 2 + 0] = code >> 8;
    arr[i * 2 + 1] = code & 0xff;
  }
  return arr;
}
```
Any updates on this?
@jasondavies, ping?
In the meantime I've published a public npm package with this patch: bloomfilter-papandreou@0.0.16-patch1
This adds an official way to serialize the bloom filter (in Node) into a Buffer. This is faster than the recommended array method, and somewhat more idiomatic. It's also inherently smaller and compresses better :smile:
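The shape of such a Buffer round trip might look like the following sketch (my own variable names, not the package's actual API, which I haven't checked):

```javascript
// Sketch only: a BloomFilter's buckets are an Int32Array, and a Node
// Buffer can view the same underlying memory without copying.
var buckets = new Int32Array([1, 2, 3]);

// Serialize: wrap the typed array's bytes in a Buffer.
var buf = Buffer.from(buckets.buffer, buckets.byteOffset, buckets.byteLength);

// Deserialize: rebuild an Int32Array view over the Buffer's bytes.
// Note this is platform-endian, which is fine for a same-machine round trip.
var restored = new Int32Array(
  buf.buffer,
  buf.byteOffset,
  buf.byteLength / Int32Array.BYTES_PER_ELEMENT
);
```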
Benchmark: