lemire / FastBitSet.js

Speed-optimized BitSet implementation for modern browsers and JavaScript engines
Apache License 2.0

Would it help if internal arrays were sparse? #8

Closed. dimaqq closed this issue 4 years ago.

dimaqq commented 4 years ago

https://github.com/lemire/FastBitSet.js/blob/master/FastBitSet.js#L120 initialises the enlarged array slots to 0. I wonder if it would be better to have a hole there instead?
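(Paraphrasing the line in question, not the actual source: the enlarge path explicitly zero-fills every new slot, roughly like this hedged sketch, where `resize` is a hypothetical name.)

```javascript
// Hedged sketch of what a zero-filling resize does (paraphrase, not the
// real FastBitSet.js code): after growing the array, the new slots are
// explicitly written with 0 so the backing array stays "dense" (no holes).
function resize(words, newCount) {
  const old = words.length;
  words.length = newCount;           // enlarging alone would leave holes...
  for (let i = old; i < newCount; i++) {
    words[i] = 0;                    // ...so this loop fills them with zeros
  }
  return words;
}
```

Dropping the fill loop would leave holes at the new indices instead.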

lemire commented 4 years ago

@dimaqq Can you elaborate? Did you run benchmarks? What is your mental model of how the array is implemented by the JavaScript engine?

dimaqq commented 4 years ago

Well, the implementation depends on the js engine: https://stackoverflow.com/q/52271899/705086

A good start is this overview of how v8 [supposedly] automatically specialises arrays depending on the data type: https://medium.com/@stephenhuh/javascript-at-a-low-level-v8-engine-squeeze-every-cycle-with-arrays-944b9d46061

My current mental model is that (pardon the pseudocode):

const bitmap = [];
bitmap[0] = 0b1101;
bitmap[1] = 0;
...
bitmap[100] = 0;
bitmap[101] = 0b1011;

where bitmap[1..100] are zeroes, is "dense" and differs from "sparse":

const bitmap = [];
bitmap[0] = 0b1101;
bitmap[101] = 0b1011;

where bitmap[1..100] are holes, and the js engine is free to implement that either as a flat array with sentinels or as an associative map, and free to switch the underlying representation back and forth as holes are added and removed, or as the array is grown or shrunk.

The JS spec defines sparse arrays and hole behaviour; one summary is https://remysharp.com/2018/06/26/an-adventure-in-sparse-arrays (maybe a bit hard to read). The gist is that [1, ,1] is different from [1, undefined, 1] even though both read as undefined on access, and the difference is visible in how .map(f) invokes its argument and in a few other special cases.
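The hole-vs-undefined distinction is directly observable in plain JavaScript, for example via the `in` operator and `Array.prototype.map`:

```javascript
const withHole = [1, , 1];           // index 1 is a hole
const withUndef = [1, undefined, 1]; // index 1 holds the value undefined

console.log(withHole[1], withUndef[1]);     // both read as undefined
console.log(1 in withHole, 1 in withUndef); // false true

// map skips holes entirely (callback never runs for them) but
// does invoke the callback for a stored undefined:
console.log(withHole.map(x => x * 2));  // [ 2, <1 empty item>, 2 ]
console.log(withUndef.map(x => x * 2)); // [ 2, NaN, 2 ]
```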

I've tried using both fastbitset and typedfastbitset on my silly little full-text search attempt (100K records, <1M unique words), and fastbitset OOM'ed on node, while typedfastbitset did not, and neither did plain old Array. RAM usage was >2GB for fastbitset and 124MB~500MB for other bit storage options.

I think my attempt was accidentally quadratic in memory, roughly for i of 1...100K: rv[i] = new FastBitSet([i]), where setting a single high bit allocates a lot of low zero bits. Somehow the js engine saved me from myself in the other bitset strategies, and I think it's because the js engine optimises sparse-ish arrays.
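To see why this is quadratic: setting only bit i in a dense bitset still forces the backing array to hold about (i+1)/32 words, so summed over i = 1..N the total is roughly N²/64 words. A quick back-of-the-envelope check:

```javascript
// Words a dense 32-bit-word bitset must allocate to hold a single set bit i:
// ceil((i + 1) / 32), since bit i lives in word i >> 5.
function wordsForHighBit(i) {
  return (i + 32) >>> 5; // == Math.ceil((i + 1) / 32) for i >= 0
}

const N = 100000;
let totalWords = 0;
for (let i = 1; i <= N; i++) totalWords += wordsForHighBit(i);
// totalWords grows like N*N/64: ~156 million words for N = 100K,
// i.e. gigabytes once each word costs several bytes in a plain array.
console.log(totalWords);
```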

Thus the idea to use sparse arrays explicitly and help the js engine to store sparse-ish bitmaps :)

lemire commented 4 years ago

Somehow the js engine saved me from myself in other bitset strategies

The TypedFastBitSet is backed by new Uint32Array(size) so there is no "hole". It may use less memory than FastBitSet but not because of the holes.

js engine is free to implement that as either a flat array with sentinels, or as associative map and free to switch the underlying representation back and forth as holes are added and removes, as array is grown or shrunk.

The good thing with a BitSet backed by a flat array is that it has predictably good performance... and also predictable memory usage.
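The layout behind that predictability can be sketched in a few lines (a minimal illustration of the flat-array idea, not FastBitSet's actual code): bit i lives in word i >> 5 under mask 1 << (i & 31), so both the lookup cost and the memory footprint are simple functions of the largest index.

```javascript
// Minimal flat-array bitset sketch: dense zero-fill on growth,
// constant-time membership via shift and mask.
class TinyBitSet {
  constructor() { this.words = []; }
  add(i) {
    const w = i >>> 5;
    // dense fill: no holes, mirroring the behaviour discussed above
    for (let k = this.words.length; k <= w; k++) this.words[k] = 0;
    this.words[w] |= 1 << (i & 31);
  }
  has(i) {
    return ((this.words[i >>> 5] | 0) & (1 << (i & 31))) !== 0;
  }
}
```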

To illustrate, let me run the current benchmark... First, with the code in master...

$ node test.js

starting inplace union  benchmark
FastBitSet (inplace) x 140 ops/sec ±0.81% (82 runs sampled)

starting inplace intersection  benchmark
FastBitSet (inplace) x 14,059 ops/sec ±0.64% (90 runs sampled)

Next, let us remove line 120 and rerun the benchmark...

$ node test.js
starting inplace union  benchmark
FastBitSet (inplace) x 38.62 ops/sec ±2.70% (70 runs sampled)

starting inplace intersection  benchmark
FastBitSet (inplace) x 3,183 ops/sec ±0.26% (100 runs sampled)

So you go from 14,000 operations per second to 3,000 operations per second. It is several times slower. Not good.

fastbitset OOM'ed on node

Have you considered roaring? It should use less memory than just about anything else, and it is going to be really, really fast... Let me try again with roaring...

$ node test.js
starting inplace union  benchmark
roaring x 753 ops/sec ±1.02% (95 runs sampled)

starting inplace intersection  benchmark
roaring x 71,793 ops/sec ±0.11% (99 runs sampled)

See how we are several times faster than FastBitSet?

dimaqq commented 4 years ago

re: typed arrays: I have a suspicion that v8 optimises typed arrays that are mostly zeros.

re: roaring, I'm trying it now, the dependency tree is a bit scary but I see it's not all runtime, so let's see :)

lemire commented 4 years ago

@dimaqq

re: typed arrays: I have a suspicion that v8 optimises typed arrays that are mostly zeros.

I think that a typed array will use far less memory, at least in some instances, but it is not due to optimizations related to having 'mostly zeroes'.

Given that they were designed to be passed to underlying C/C++ functions, it would violate my expectations if they relied on anything but a flat array. Of course, if you never touch the memory at all, you may not hit RAM at all: the pages may never be mapped. That is, there is a big difference between "untouched" and "mostly zeros".

We can test it out...

var x = new Uint32Array(1000000000);

This works, and I find that the memory usage is almost nil.

for(let i = 0; i < 1000000000; i++) {
  x[i] = 0;
}

And voilà... node reports 4GB of memory usage.
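The same experiment can be made self-measuring (scaled down from the 1-billion example so it runs quickly; this assumes the OS commits anonymous pages lazily, which holds on typical Linux/macOS setups):

```javascript
// Allocate a large Uint32Array, check resident memory, then touch every
// element and check again. Untouched pages are typically not committed,
// so RSS should jump by roughly the buffer size only after the writes.
const M = 50000000; // 50M * 4 bytes = ~200MB once fully touched
const rssMB = () => Math.round(process.memoryUsage().rss / 1048576);

const before = rssMB();
const x = new Uint32Array(M);
console.log("after allocation:", rssMB(), "MB"); // typically near `before`

for (let i = 0; i < M; i++) x[i] = 0; // writing 0 still dirties the page
const after = rssMB();
console.log("after touching:", after, "MB");
```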

Let us see how much memory an array takes...

var x = new Array();
for(let i = 0; i < 100000000; i++) {
  x[i] = 0;
}

I find that about 16 bytes per entry are used.

Do you confirm?
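(A hedged way to reproduce the measurement: compare heapUsed before and after filling a plain Array with zeros. Exact numbers vary by engine version and GC timing; running node with --expose-gc gives stabler output.)

```javascript
const N = 10000000; // 10M entries, scaled down so it runs quickly

if (global.gc) global.gc(); // only available with --expose-gc
const before = process.memoryUsage().heapUsed;

const x = new Array();
for (let i = 0; i < N; i++) x[i] = 0;

if (global.gc) global.gc();
const after = process.memoryUsage().heapUsed;

const bytesPerEntry = (after - before) / N;
console.log(bytesPerEntry.toFixed(1), "bytes per entry");
```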

re: roaring, I'm trying it now, the dependency tree is a bit scary but I see it's not all runtime, so let's see :)

The dependencies in JavaScript are a bit insane but I don't think that the roaring package started it.

lemire commented 4 years ago

Ah. Interesting. I can lower the memory usage to 8 bytes per entry with an array, if I just run the following...

for(let i = 0; i < 100000000; i++) { x[i] |= 0;}

in addition to the above code.

lemire commented 4 years ago

Ah. Even just

for(let i = 0; i < 100000000; i++) { x[i] = 0;}

will do, so it must be simple garbage collection.

lemire commented 4 years ago

Right... the following works....

$ node --expose-gc
> var x = []
> for(let i = 0; i < 100000000; i++) {x[i] = 0;}
0
> global.gc();

After garbage collection, the memory usage is just about 8 bytes per entry.

lemire commented 4 years ago

Let us try the following...

> var x = []
> for(let i = 0; i < 100000000; i+=10) {x[i] = 0;}
> global.gc();

I again get roughly 8 bytes per entry.

lemire commented 4 years ago

You will still get the same memory usage if you go sparser...

var x = []; for(let i = 0; i < 100000000; i+=15) {x[i] = 0;}

But at some point if you go sparser, it will switch to a different mode that uses about 64 bytes per entry.

This happens roughly once only one out of every ~20 slots is touched. So if your bitset is 99.9% empty, this will kick in. I would argue that, at that point, a bitset is the wrong data structure.
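(For the curious: the mode switch can be probed directly. %HasDictionaryElements() is a V8-internal debugging intrinsic, not standard JavaScript; it reports whether an array's backing store has fallen back to a hash map. It only parses when node is started with --allow-natives-syntax, and the exact density threshold is an engine heuristic that may differ between V8 versions.)

```javascript
// Probe whether V8 stores an array as dictionary (hash map) elements.
function hasDictionaryElements(arr) {
  try {
    // eval so this file still parses when --allow-natives-syntax is absent
    return eval("%HasDictionaryElements(arr)");
  } catch (e) {
    return undefined; // intrinsic unavailable without the flag
  }
}

const denser = [];
for (let i = 0; i < 1000000; i += 15) denser[i] = 0;
const sparser = [];
for (let i = 0; i < 1000000; i += 10000) sparser[i] = 0;

// Run with: node --allow-natives-syntax probe.js
console.log(hasDictionaryElements(denser), hasDictionaryElements(sparser));
```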

lemire commented 4 years ago

So I am going to close this after adding a paragraph to the README file about memory usage. You are quite correct that TypedFastBitSet should provide better memory usage with the same good performance. Creating holes in the FastBitSet backing array only helps for very, very sparse sets, and at that point there will be other performance concerns.

Thanks for your report.

dimaqq commented 4 years ago

re: roaring, I think I managed to get node to crash with roaring-wasm and since I intend to deploy this search client-side, in browsers, I can't use roaring-node. I've settled on typedfastbitset, it's compact, predictable and works in 3 main browsers.

lemire commented 4 years ago

re: roaring, I think I managed to get node to crash with roaring-wasm

You probably want to report the crash to the maintainers.

I've settled on typedfastbitset, it's compact, predictable and works in 3 main browsers.

Yes. I like it a lot better now after discussing the matter with you.

In fact, TypedFastBitSet was the original design... but, in some cases, I had performance problems with typed arrays. They seem to have gone away with better support for typed arrays in V8.