nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org

stream.readable._read() description is unclear #42291

Open kaysond opened 2 years ago

kaysond commented 2 years ago

Affected URL(s)

https://nodejs.org/api/stream.html#readable_readsize

Description of the problem

I'm implementing a readable stream, and found the following part of the documentation to be unclear:

When readable._read() is called, if data is available from the resource, the implementation should begin pushing that data into the read queue using the this.push(dataChunk) method. _read() will be called again after each call to this.push(dataChunk) once the stream is ready to accept more data. _read() may continue reading from the resource and pushing data until readable.push() returns false. Only when _read() is called again after it has stopped should it resume pushing additional data into the queue.

Once the readable._read() method has been called, it will not be called again until more data is pushed through the readable.push() method. Empty data such as empty buffers and strings will not cause readable._read() to be called.

Specifically, these three quotes seem to conflict with one another, and the relationship between _read() and push() is not obvious to me:

  1. _read() will be called again after each call to this.push(dataChunk) once the stream is ready to accept more data
  2. Once the readable._read() method has been called, it will not be called again until more data is pushed through the readable.push()
  3. _read() may continue reading from the resource and pushing data until readable.push() returns false

The first seems to claim that ._read() gets called after every call of .push(), but then also claims that it's conditional on the stream consumer's readiness to accept more data. The second seems to claim that after the first _read(), it won't get called again until there is at least one .push(). The third seems to claim that ._read()'s implementation can call .push() multiple times, so presumably the first can't be true, or you'd end up with an infinite loop of sorts.


The examples under stream.readable._construct() and stream.readable.push() are also confusing in that they suggest two very different modes of operation for ._read() and .push().

The former suggests that initiating stream consumption calls the first ._read(), which in the example calls a single .push(), which then triggers the stream to call ._read() once again, rinse and repeat.

The latter seems to suggest that initiating stream consumption calls the first ._read(), which triggers continuous and asynchronous calls to .push() that only stop when it returns false. A subsequent ._read() call would then restart the .push()es.

I'm guessing that in this second example, ._read() is actually called multiple times, but this._source.readStart(); is assumed to be idempotent and can be called multiple times with no effect. If that's the case, I think it would be helpful to clarify this example, and also be a little more explicit in the ._read() description.
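To make that guess concrete, here is a minimal sketch of a source whose readStart() is idempotent, wrapped the way the push() example wraps its source. FakeSource, its starts counter, and the readCalls counter are hypothetical names invented for this illustration; the real low-level source in the documentation example is not specified, so this is only one way it could behave.

```javascript
const { Readable } = require('node:stream');

// Hypothetical low-level source: hands out one chunk per setImmediate turn.
class FakeSource {
  constructor(chunks) {
    this.chunks = chunks.slice();
    this.running = false;
    this.pending = false; // true while a setImmediate callback is queued
    this.starts = 0;      // how often reading actually (re)started
    this.ondata = null;
    this.onend = null;
  }

  readStart() {
    if (this.running) return; // idempotent: repeated calls have no effect
    this.running = true;
    this.starts += 1;
    this._schedule();
  }

  readStop() {
    this.running = false;
  }

  _schedule() {
    if (this.pending) return;
    this.pending = true;
    setImmediate(() => {
      this.pending = false;
      if (!this.running) return;
      if (this.chunks.length === 0) {
        this.running = false;
        this.onend();
        return;
      }
      this.ondata(this.chunks.shift());
      if (this.running) this._schedule();
    });
  }
}

class SourceWrapper extends Readable {
  constructor(source, options) {
    super(options);
    this.readCalls = 0;
    this._source = source;
    // Stop the source as soon as push() reports that the buffer is full.
    source.ondata = (chunk) => {
      if (!this.push(chunk)) source.readStop();
    };
    source.onend = () => this.push(null);
  }

  _read() {
    this.readCalls += 1;
    this._source.readStart(); // harmless if the source is already running
  }
}

const wrapper = new SourceWrapper(new FakeSource(['a', 'b', 'c']), { objectMode: true });
wrapper.on('data', (chunk) => console.log('data:', chunk));
wrapper.on('end', () => {
  console.log('_read() calls:', wrapper.readCalls, 'actual starts:', wrapper._source.starts);
});
```

Comparing the two counters at the end shows how many _read() calls were absorbed by the idempotent readStart(), which is the property the guess above relies on.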


Maybe something like (assuming my understanding is correct):

When readable._read() is called, if data is available from the resource, the implementation should begin pushing that data into the read queue using the this.push(dataChunk) method. The stream implementation may continue reading from the resource and pushing data until readable.push() returns false.

After the first call, readable._read() will only be called again after data is pushed through the readable.push() method, and will be called with every push(). Empty data such as empty buffers and strings will not cause readable._read() to be called again.

When readable.push() returns false, the stream consumer is not ready to accept more data, so the stream implementation should stop calling .push(). Only after _read() is called again should the stream resume pushing additional data into the queue.
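As a rough sketch of the pattern this wording describes (assuming my understanding is correct), the implementation below pushes from an in-memory array until push() returns false and only resumes on the next _read() call. ArrayReadable and its readCalls counter are hypothetical names used for illustration only.

```javascript
const { Readable } = require('node:stream');

// Hypothetical example class: the "resource" is just an in-memory array.
class ArrayReadable extends Readable {
  constructor(items, options) {
    super({ objectMode: true, ...options });
    this.items = items.slice();
    this.readCalls = 0;
  }

  _read() {
    this.readCalls += 1;
    while (this.items.length > 0) {
      // Keep pushing until push() reports that the internal buffer is full.
      if (!this.push(this.items.shift())) {
        return; // stop here and resume on the next _read() call
      }
    }
    this.push(null); // resource exhausted: signal end of stream
  }
}

const stream = new ArrayReadable(['a', 'b', 'c', 'd'], { highWaterMark: 2 });
stream.on('data', (item) => console.log('data:', item));
stream.on('end', () => console.log('_read() calls:', stream.readCalls));
```

A small highWaterMark is used so that push() returns false early and more than one _read() call is needed to drain the array.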

Thanks!

kaysond commented 2 years ago

Ok I've finally had some time to experiment and find out how exactly this works. To be honest, though I now have a pretty good understanding of how things behave, I'm even more confused about why.

It appears that if you call push() multiple times in a given _read() call, _read() does not get called again until sometime after you stop push()-ing. This holds even if _read() has already returned (for example, when pushing from an fs.read() callback). If I defer the .push() call with setTimeout(), though, I end up getting tons of _read() calls.

It's not exactly clear when and why _read() gets called again! Is it based on a timer? Number of ticks? Or is it queue based? Does the stream wait for the I/O callback queue to empty but not the timer queue?
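One way to probe this is to count _read() calls while varying how many chunks are pushed from a single asynchronous callback. The sketch below is only an experiment, not an explanation of the internals: countReadCalls is a hypothetical helper, setImmediate() stands in for an I/O callback such as fs.read(), and the chunk contents are made up.

```javascript
const { Readable } = require('node:stream');

// countReadCalls is a hypothetical helper, not part of the Node.js API.
function countReadCalls(chunksPerCallback, totalChunks) {
  return new Promise((resolve, reject) => {
    let remaining = totalChunks;
    let readCalls = 0;

    const stream = new Readable({
      objectMode: true,
      read() {
        readCalls += 1;
        // Return immediately and push later, as an I/O callback would.
        setImmediate(() => {
          if (remaining === 0) {
            this.push(null); // nothing left: end the stream
            return;
          }
          // Push several chunks back to back from the same callback.
          // The return value of push() is ignored because this sketch stays
          // far below the default objectMode highWaterMark.
          const batch = Math.min(chunksPerCallback, remaining);
          for (let i = 0; i < batch; i += 1) {
            remaining -= 1;
            this.push({ seq: totalChunks - remaining });
          }
        });
      },
    });

    stream.on('data', () => {}); // switch to flowing mode and discard the data
    stream.on('end', () => resolve(readCalls));
    stream.on('error', reject);
  });
}

countReadCalls(3, 6).then((n) => console.log('3 pushes per callback:', n));
countReadCalls(1, 6).then((n) => console.log('1 push per callback:', n));
```

Because each _read() here schedules exactly one callback, six chunks should need 3 calls when pushed three at a time and 7 calls when pushed one at a time (one extra call in each case to signal the end). That would be consistent with the observation above: back-to-back pushes from one callback seem to be followed by a single _read(), while pushes spread across separate callbacks each trigger their own.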