Performance improvement by specifying initial array size

invertase / denque

The fastest javascript implementation of a double-ended queue. Used by the official Redis, MongoDB, MariaDB & MySQL libraries for Node.js and many other libraries.

https://docs.page/invertase/denque

Apache License 2.0

354 stars 33 forks source link

Performance improvement by specifying initial array size #42

Open nemosmithasf opened 2 years ago

nemosmithasf commented 2 years ago

Please correct me if I'm wrong, but wouldn't it be more performant if we can specify an initial array capacity rather than rely on the default size of 4? Especially if we are using it as a circular buffer...

I assume this would save us a few "grow array" operations?

nemosmithasf commented 2 years ago

I did some benchmarking and it seems changing the default array size significant deteriorates the performance.

That's really interesting. I couldn't figure out why from looking at the source. For my own edification, could you explain why this is? Or perhaps point to some part of the code that I can look at in closer detail to figure it out?

dnlmlr commented 2 years ago

Can you share how / what you benchmarked to come to that conclusion? I am also currently looking at possibly improving the performance of this library in certain situations. Especially the actual growth operation is currently pretty inefficient and I have seen quite significant improvements in situations where it needs to grow a lot

nemosmithasf commented 2 years ago

I just copied the code into my codebase and changed 4 to a value that is closer to what I anticipated my average queue size to be.

I then ran a benchmark on my codebase and compared having 4 and the new value as the default; and found that it was slower.

That said, my benchmark wasn't especially scientific, so don't treat it as authoritative- I'd definitely do independent testing if I were you.

dnlmlr commented 2 years ago

Only changing the new Array(4) will not work since the wrap-around is not calculated by doing index % this._list.length, but by index & this._capacityMask. This means that the capacity must always be a multiple of 2 for the wrapping to work and the _capacityMask must be also set to the matching value. The _capacityMask is always capacity - 1, which creates a binary mask that has 1 at all bit positions up to the capacity (for example cap = 16 = 0b10000 and mask = 15 = 0b01111).

For testing you could try to set the capacity to the next larger power of 2, e.g. new Array(1024) and the mask to 1 less this._capacityMask = 1023

EDIT: In this text when I say "capacity" I refer to the Array size (this._list.length / new Array(N)) and not the this._capacity variable. this._capacity is something different and not directly related to this conversation.

dnlmlr commented 2 years ago

I did a small test implementation of this with a matching benchmark and it seems to work. Tested at 1_000_000, 1_000 and 100 element insertions it is consistently faster when you know the capacity before.

1_000_000 elements: 58% faster
1_000 elements: 56% faster
100 elements: 154% faster

Platform info:
Windows_NT 10.0.19044 x64
Node.JS 18.5.0
V8 10.2.154.4-node.8
AMD Ryzen 7 2700X Eight-Core Processor          × 16

Denque (1000000) default capacity x 87.24 ops/sec ±1.67% (73 runs sampled)
Denque (1000000) preallocated capacity x 138 ops/sec ±0.58% (77 runs sampled)
Denque (1000) default capacity x 144,139 ops/sec ±2.03% (87 runs sampled)
Denque (1000) preallocated capacity x 225,468 ops/sec ±0.48% (92 runs sampled)
Denque (100) default capacity x 700,648 ops/sec ±0.36% (95 runs sampled)
Denque (100) preallocated capacity x 1,785,077 ops/sec ±0.61% (95 runs sampled)
DoubleEndedQueue (1000000) default capacity x 32.08 ops/sec ±2.08% (56 runs sampled)
DoubleEndedQueue (1000000) preallocated capacity x 30.67 ops/sec ±2.00% (55 runs sampled)

Here is the code. To run the benchmark do node benchmark/initialCapacity.js

One question about the possible implementation is how the initial capacity should be specified. In the example I made the first constructor arg be either array or number.

nemosmithasf commented 2 years ago

Only changing the new Array(4) will not work since the wrap-around is not calculated by doing index % this._list.length, but by index & this._capacityMask. This means that the capacity must always be a multiple of 2 for the wrapping to work and the _capacityMask must be also set to the matching value. The _capacityMask is always capacity - 1, which creates a binary mask that has 1 at all bit positions up to the capacity (for example cap = 16 = 0b10000 and mask = 15 = 0b01111).

Thanks for this explanation, though I'd be lying if I said I understood it all. Still, it seems like a fascinating technique, definitely something I'll look into.

One question about the possible implementation is how the initial capacity should be specified. In the example I made the first constructor arg be either array or number.

Wouldn't it made sense to just add a property to the options object at arg_1?

dnlmlr commented 2 years ago

Wouldn't it made sense to just add a property to the options object at arg_1?

Yeah that would probably be the other option. The problem I could see with that is that you need to specify both args in order to use the options object. So that means you would have to give an empty array as arg1 which makes the interface kinda klunky. Also the options argument is currently not even documented, not sure what is up with that

nemosmithasf commented 2 years ago

you need to specify both args in order to use the options object.

You can solve it by overloading, since arg_0 can never be an object.