nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
105.63k stars 28.65k forks source link

Prevent double buffering when converting Readable to ReadableStream #48636

Open hbgl opened 1 year ago

hbgl commented 1 year ago

Version

v18.16.0

Platform

Microsoft Windows NT 10.0.19045.0 x64

Subsystem

webstreams

What steps will reproduce the bug?

When creating a ReadableStream from a Readable via Readable.toWeb, the total amount of internally buffered data is twice as much as expected.

To reproduce, run this program:

// File index.mjs
import { Readable } from 'node:stream';

let i = 1;
function* generator() {
    for (;;) {
        yield i;
        i++;
    }
}

const readable = Readable.from(generator(), { objectMode: true, highWaterMark: 8 });
const stream = Readable.toWeb(readable);
const reader = stream.getReader();

await reader.read();

// Prints: Pulled 17 items.
console.log(`Pulled ${i} items`);

How often does it reproduce? Is there a required condition?

There is no special condition.

What is the expected behavior? Why is that the expected behavior?

The expected behavior is to only have a single internal buffer.

What do you see instead?

Both the Readable and the attached ReadableStream buffer data internally.

Additional information

I believe the issue is that the highWaterMark from the node Readable is copied to the ReadableStream queuing strategy. Since the two streams each have a buffer, the amount of buffered data is effectively doubled. This is the code in question:

https://github.com/nodejs/node/blob/1936160c31afc9780e4365de033789f39b7cbc0c/lib/internal/webstreams/adapters.js#L409C1-L427

My suggested fix would be to disable buffering in the ReadableStream by setting its highWaterMark to zero.

kanongil commented 1 year ago

This works as the docs say when no highWaterMark option is provided to Readable.toWeb(). If you want to carefully manage your buffer levels, you will need to specify it manually.

A better default might be the ReadableStream default of 1. This is creating a new stream, and for many cases it makes sense to buffer at least 1 entry ahead (allowing the next value to be fetched, while the current is being processed).

The value copying doesn't really work for any use-cases, and will cause unexpected surprises (like for the author of this issue).

The API is marked experimental, so I guess the default can still be changed.

hbgl commented 1 year ago

A better default might be the ReadableStream default of 1. This is creating a new stream, and for many cases it makes sense to buffer at least 1 entry ahead (allowing the next value to be fetched, while the current is being processed).

I think a case can be made for both having the default highWaterMark of the new ReadableStream be 0 or 1.

a) If you set it to 1, then the underlying buffer will be eagerly filled upon stream creation. This is the default for ReadableStreams. b) If you set it to 0, then the underlying buffer will only fill up lazily upon first read from the stream. This is the default for Readables. You can call readable.read(0) to start the buffering process early.

Either case will work fine but I it might be worth adding a note in the documentation.