padenot / ringbuf.js

Wait-free thread-safe single-consumer single-producer ring buffer using SharedArrayBuffer
https://ringbuf-js.netlify.app/
Mozilla Public License 2.0
201 stars 18 forks source link

Shouldn't we use TypedArray.set instead of a for loop for copying arraybuffers? #22

Open FallingSnow opened 1 year ago

FallingSnow commented 1 year ago

Set is much faster, even on small arrays.

Code in question: https://github.com/padenot/ringbuf.js/blob/016d963826a63aa715ae0e2d6ccd34f44c093c11/js/ringbuf.js#L346-L350

Benchmark: input is 10,000 F32, and subInput is 100 F32. Screenshot from 2023-08-08 15-08-53

padenot commented 1 year ago

We can in some capacity. The problem is that we need to copy part of a TypedArray to part of a TypedArray, since it's a circular buffer.

set takes two parameters, the input array, and an offset to write the data at in the destination array. In particular, it doesn't take an input offset, or an input size, or that kind of thing.

The way to do this in JavaScript is to take a DataView on the TypedArray. Unfortunately, this isn't acceptable for this data structure, because it means that objects will be created, and this causes garbage collections. The objects will be extremely short-lived, and modern JS engines have generational GCs, but it's important to have exactly no object created.

This is not enough to express all what's needed for this ring buffer:

We can make it so set(...) is used when possible.

When running the test suite (which is admittedly a bit contrived, with workloads that aren't that realistic), this fast-path can be taken in about 28% of the _copy(...) calls.

We can also add two new methods that always use set, and create DataView, and that can be used on the side of the ring buffer that doesn't need to be real-time, this might be useful.

padenot commented 1 year ago

If anybody can try this PR on a "real" workload and can report results, I'd be happy to have a look at the data.

FallingSnow commented 1 year ago

Ah that makes sense. While .subarray() is copy free it is not GC free, and I forgot about the GC requirements of this library.

I believe one way around this is to not make wrapping pushes that would require a 2 separate pushes onto the ringbuffer. Instead an artificial end pointer would be set and we would write to the beginning of the ringbuffer (given there is enough room). Then the artificial end pointer could be cleared after the read pointer passes it. This would create some "wasted" space at the end of the ringbuffer but given the 4x - 15x copy time improvement it may be worth if for some fantasy work load I can't think of right now lol.

ringbuffer-with-artificial-end-pointer

However this is most likely out of the scope of this library and I'm not sure if the complexity is worth it, but is interesting to think about.

Thank you for your "fast path" additions.