
Performance optimizations for web ReadableStream/WritableStream #134

Open jasnell opened 7 months ago

jasnell commented 7 months ago

I wanted to document a number of ideas for potential optimizations for the web streams ReadableStream/WritableStream that I likely won't have time in the near future to implement myself but would like to try to encourage other collaborators to potentially pick up. If someone does wish to pick these up, I'm more than happy to help with feedback and review (I'm just unlikely to have the time to actually help with the code part).

All ReadableStreams are backed by what the streams specification calls an UnderlyingSource. The current Node.js implementation follows the specification precisely and assumes the UnderlyingSource is always implemented as a JavaScript object. While this is fine for ReadableStreams created by user code in JavaScript, it leads to a range of performance bottlenecks when implementing a ReadableStream sourced, for instance, from a file on disk, a socket, or any other source internal to Node.js.
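
For context, this is the spec-defined shape as seen from user code: the UnderlyingSource is just an ordinary JavaScript object whose start/pull/cancel methods the stream machinery invokes (a minimal illustrative example, not Node.js internals):

const readable = new ReadableStream({
  start(controller) {
    // Called once when the stream is created.
  },
  pull(controller) {
    // Called whenever the stream wants more data queued.
    controller.enqueue(new TextEncoder().encode('hello'));
    controller.close();
  },
  cancel(reason) {
    // Called if the consumer cancels the stream.
  },
});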

For example, consider the fileHandle.readableWebStream(...) API. This creates a ReadableStream whose data is sourced from a C++ level StreamBase object, but most of the implementation exists in the form of JavaScript wrapper code in the newReadableStreamFromStreamBase(...) method in internal/webstreams/adapters.js. This adapter logic is not nearly as optimized as it could be. To understand why, imagine we had a corresponding (currently hypothetical) fileHandle.writableWebStream(...) that used the same adapter pattern to write data out to a file, and we wanted to pipe the readable to the writable:

const source = await fsPromises.open('./abc.txt');
const dest = await fsPromises.open('./xyz.txt', 'w');

// writableWebStream() is the hypothetical API described above
await source.readableWebStream().pipeTo(dest.writableWebStream());

With the current design, the data is read at the C++ level a chunk at a time, passed into JavaScript, subjected to the buffering inherent in the web streams spec, multiple promises, microtask continuations, etc., then passed a chunk at a time into the writable, which has its own buffering, promises, microtasks, etc., before being passed back out into C++ land for writing to the file. This is despite the fact that we know the data starts in C++ and ends in C++ and we have no intention of actually interacting with it at the JavaScript level. This is extremely wasteful.
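
To make the per-chunk cost concrete, the spec-defined pipe is conceptually equivalent to a JavaScript loop like the following (a simplified sketch of what pipeTo does; the real algorithm also handles backpressure, aborts, and error propagation), where every chunk pays for at least one read promise, one write promise, and their microtask continuations:

const reader = readable.getReader();
const writer = writable.getWriter();
while (true) {
  // One promise + microtask continuation just to get the chunk into JS...
  const { value, done } = await reader.read();
  if (done) break;
  // ...and another promise + microtask to hand it straight back to C++.
  await writer.write(value);
}
await writer.close();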

Ideally, as an alternative, we could detect that the source and destination are both internally created streams backed by C++ level code, move the entire pipeTo operation into C++, and have only a single promise at the JavaScript level representing the pipe operation.
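
One possible shape for that fast path, sketched with hypothetical names (kInternalSource, kInternalSink, internalPipe, and specPipeTo are placeholders, not existing Node.js internals): check whether both endpoints are internally backed and, if so, hand the entire transfer to C++ behind a single promise, otherwise fall back to the spec algorithm.

// Hypothetical sketch only; none of these identifiers exist in Node.js today.
const kInternalSource = Symbol('kInternalSource');
const kInternalSink = Symbol('kInternalSink');

async function pipeTo(readable, writable, options) {
  const source = readable[kInternalSource];
  const sink = writable[kInternalSink];
  if (source !== undefined && sink !== undefined && options === undefined) {
    // Both ends are backed by C++ StreamBase objects: do the whole transfer
    // in C++ and expose only this single promise to JavaScript.
    return internalPipe(source, sink);
  }
  // Otherwise, run the spec-defined pipe algorithm.
  return specPipeTo(readable, writable, options);
}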

Fundamentally, any ReadableStream or WritableStream that is backed by a source/destination that is inherently internal to Node.js (files, sockets, pipes, compression streams, etc) can benefit from alternative optimized implementations capable of bypassing a fair amount of the internal mechanisms defined by the webstreams spec as long as the outwardly observable behavior of the stream remains compliant with the specification. Since such streams would not be explicitly constructed by the user using the standard constructor, most of the implementation would not be observable by user code, and is therefore safe to optimize.

This does not mean eliminating the adapters.js pattern, since that is helpful for user code as a migration/interop path bridging Node.js streams with web streams. It would mean moving away from using the adapters for internally created web streams in favor of directly supporting more optimized internal implementations.

In the open source workerd runtime, we implement a model where a ReadableStream or WritableStream instance has one of two possible implementations (which we call Controllers). The standard controller implements the spec-defined behavior and is used for all streams created via the standard constructors (e.g. new ReadableStream({ ... }) and new WritableStream({ ... })). The internal controller implements an optimized path when the streams are created from an internal data source (e.g. request.body). This is just one possible implementation strategy. Our current implementation of these classes in Node.js assumes the new ReadableStream({ ... }) and new WritableStream({ ... }) model. (Note also that the workerd implementation of streams is currently entirely in C++, which ends up being far more complicated than a pure JS implementation due to the complexities of working with JS promises at the V8 C++ API level, but that's a different story.)
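
Translated into a JavaScript-flavored sketch, the surface class simply delegates to whichever controller it was created with; the names below (StandardController, InternalController, kController, fromInternalSource) are illustrative only, not actual workerd or Node.js identifiers.

// Illustrative sketch of the two-controller model; not real workerd/Node.js code.
const kController = Symbol('kController');

class ReadableStreamSketch {
  constructor(underlyingSource) {
    // Streams constructed by user code get the full spec-defined behavior.
    this[kController] = new StandardController(underlyingSource);
  }
  getReader() { return this[kController].getReader(); }
  pipeTo(dest, options) { return this[kController].pipeTo(dest, options); }

  // Internally created streams (files, sockets, ...) skip the constructor
  // and get the optimized controller instead.
  static fromInternalSource(source) {
    const stream = Object.create(ReadableStreamSketch.prototype);
    stream[kController] = new InternalController(source);
    return stream;
  }
}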

Fundamentally, I can imagine the following general optimizations for internally sourced streams:

  1. When piping from an internal readable to an internal writable, the piping of data from the readable to the writable should occur entirely at the c++ level, with only a single JavaScript promise exposed to the user (returned by the pipeTo/pipeThrough methods) representing the pipe operation.
  2. When reading from an internal readable, the majority of the spec-defined buffering currently implemented can be avoided in favor of a lazy, on-demand read from the underlying data source only when a read is requested (see the sketch after this list).
  3. When writing to an internal writable with the current adapter approach, there tends to be a double-buffering effect where the WritableStream itself has a write buffer that is drained asynchronously in addition to the underlying StreamBase's own async handling of writes. The WritableStream's buffering is largely extraneous in this case.
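
To illustrate item 2, a lazy internal source would touch the underlying handle only when a consumer actually asks for data, rather than eagerly filling the spec-defined queue. The sketch below expresses that behavior through the public constructor purely for illustration (readFromHandle is a placeholder, not a real API); the actual optimization would live inside an internal controller rather than going through the standard machinery at all:

// Hypothetical sketch; readFromHandle() stands in for a C++-backed read.
function lazyReadableFromHandle(handle) {
  return new ReadableStream({
    async pull(controller) {
      // Nothing is read from the handle until the consumer requests data.
      const chunk = await readFromHandle(handle);
      if (chunk === null) {
        controller.close();
      } else {
        controller.enqueue(chunk);
      }
    },
  }, { highWaterMark: 0 }); // highWaterMark: 0 disables eager pre-buffering
}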

What about TransformStream?

For internally sourced transforms, such as CompressionStream and DecompressionStream, similar optimizations are possible because they would not change any outwardly observable behavior. Our current implementations of these rely on adapting the Node.js compression stream implementations, which leads to a significant amount of overhead at runtime. Alternatively, we can achieve a significant performance improvement by implementing the compression transform at the C++ level and having the readable and writable for the transform be optimized internal implementations rather than adapter wrappers around the Node.js streams implementations.
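
As a concrete example, in a pipeline like the following every stage is internally sourced, so under the proposed design the compression work and both copies could in principle be driven entirely from C++ (writableWebStream() is again the hypothetical API mentioned earlier):

const input = await fsPromises.open('./abc.txt');
const output = await fsPromises.open('./abc.txt.gz', 'w');

// Today each stage round-trips through the JS adapters; with internal
// controllers the whole chain could stay at the C++ level.
await input.readableWebStream()
  .pipeThrough(new CompressionStream('gzip'))
  .pipeTo(output.writableWebStream()); // hypothetical API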

Implementing these optimizations would be a significant effort that is likely best spread out over time and done incrementally. It might make sense to set up a strategic initiative to track the work. My goal here is to see if I can nerd snipe... er, inspire... others to jump in and implement these kinds of optimizations rather than continuing to rely on the adapters. The web streams implementation will never be performance-competitive with the Node.js streams alternatives unless we get these kinds of optimizations in place.

jasnell commented 7 months ago

/cc @nodejs/streams @nodejs/whatwg-stream @nodejs/performance

jasnell commented 7 months ago

(Moved to the nodejs/performance repo from the main nodejs/node repo where it was originally posted)

jasnell commented 7 months ago

I will re-emphasize, this is not a trivial exercise.