ReadableByteStream: should support an internal queue

domenic commented 9 years ago

The current draft does not include such a queue; there is a one-to-one map of calls to rbsReader.read() and rbsUnderlyingSource.pull(). I think this is potentially problematic, for two reasons:

Kernel buffers can become full, if I understand correctly. Thus, the code for rbsUnderlyingSource.pull() will not be a simple call to read(2) or similar. Instead, the underlying source author will need to do their own buffering. That seems bad.
We want readable byte streams to be resistant to uneven flow: e.g., if the consumer slows down for a bit, then gets much faster, then slows down for a bit, it would be better to have some chunks read ahead of time to so that we can match their pace (up to a high water mark).

This does make things more complicated though for the BYOB case, as we have to figure out what happens if the internal queue has things in it and then we switch to a BYOB reader. This was discussed at https://github.com/whatwg/streams/issues/177#issuecomment-83984473

@tyoshino, have your thoughts changed since that discussion? Or is the current draft lacking a queue just as a first-draft thing?

tyoshino commented 9 years ago

The boundary between the underlying byte source and the readable byte stream doesn't need to correspond to any specific layer boundary (e.g. kernel-user). It just depends on the philosophy the author of the spec gave to the spec. When we want to drain some data to one world (storage, kernel, process, C++) from another (another storage, userland, another process, JS heap), they're always allowed to do that in various means.

Or is the current draft lacking a queue just as a first-draft thing?

Right. Having some strategy-controlled buffer example in the spec is good. But the spec shouldn't impress the readers that they must implement the buffer in JavaScript. My suggestion is having two separate classes. One is just describing all the publicly visible requirements (including invariants, interaction between methods, properties, etc.), and the other is a strategy-controlled buffer which is convenient for push source implementor. The buffer class is good example for explaining how to interpret backpressure and controlling buffer size, data generation in general. That can be referred to when writing non-JavaScript buffering components. And of course, it's also useful as a built-in JS library.

This separation doesn't make much difference in ReadableStream, but I really want this separation for ReadableByteStream. If the source is push source, then it's convenient for the implementor of the source if the stream has a built-in buffer. In that case, the source doesn't understand BYOB style pulling, so we need to take care of the gap by copying the contents to the view provided by the consumer. But this is not expected for a source that understands BYOB. For BYOB read requests, the source just wants to get the view and fulfill the corresponding pending request with the view (possibly detached) filled with generated contents as-is. There shouldn't any automatic copy. Having a queue even for BYOB-capable source may be useful. But the queue system for BYOB-capable source and one for push source would be very different.

I try to finish prototyping the ideas explained above. WIP branch is at https://github.com/whatwg/streams/blob/bytestream/index.bs.

tyoshino commented 9 years ago

I've made much edit on the last post. Sorry but please take a look at the latest one.

domenic commented 9 years ago

My suggestion is having two separate classes.

I don't think this is the way to go. We should not have a class X whose internals are specified, based on interaction with a strategy, and class Y, whose internals are not specified but whose invariants are. Instead we should just have one class, X, which is flexible enough that any Y can be expressed by chosing a specific strategy. And, if you can come up with such a strategy, then of course you're free as an implementer to implement it without using strategies at all, as long as you behave exactly the same. But it's still just one class and one spec.

Maybe it would help if we added, somewhere prominent, a note like:

Note for implementers: this spec describes stream classes controlled by underlying source or underlying sink JavaScript objects. This allows developers the flexibility to create streams with a wide variety of behaviors, while ensuring that their internal and external invariants are all maintained. Platform-created streams of these classes should behave as if they were also created from some conceivable JavaScript underlying source or sink object, in order to maintain those invariants. However, since such underlying source and sink objects are not externally visible once a stream has been constructed, an implementation is free to use any mechanism it wants to implement the resulting behavior; they don't literally need to create such objects.

For BYOB read requests, the source just wants to get the view and fulfill the corresponding pending request with the view (possibly detached) filled with generated contents as-is.

I don't think this is necessarily true. For example if you were writing a source that was directly using recv(2), this would not be good enough, because of kernel buffer overflow. You would want a queue to take care of these cases. In particular, you would want to use the pull -> controller.respond() flow in most cases, but if the kernel buffer is getting flow you would want to use controller.enqueue().

However, I think it might be OK if we say that this is rare enough that we force authors (and implementers) to handle it on their own. That is, if we assume nobody will ever expose read(2) directly to JavaScript (see my blog post), but instead will expose something that shields them from kernel buffer overflows, then maybe it is OK to not build in a queue.

But that means there will still be uneven flow, even for getReader(), which is kind of sad.

I try to finish prototyping the ideas explained above. WIP branch is at https://github.com/whatwg/streams/blob/bytestream/index.bs.

Having typed all the above, I then went to skim your changes and I see that you actually implemented the queue! And allowing both types of readers to be used, with interop on both sides! Wow!! This is very convenient, if you are OK with it. Now I am a little confused :)

I will need to dig in to your branch more later to understand everything. (Now I know what you must feel like reviewing my big patch sets...)

tyoshino commented 9 years ago

Maybe it would help if we added, somewhere prominent, a note like:

Yes. I understand it helps. But I'm feeling that the all-in-one class approach would make things complicated.

I don't think this is necessarily true. ...

It's fine to have a queue for convenience. But

However, I think it might be OK if we say that this is rare enough ...

I was thinking as you described here.

https://github.com/whatwg/streams/blob/bytestream/index.bs

Right. I was trying to build an all-in-one class in that branch. Sorry for confusing. I was trying but seeing difficulties in designing the class without much complexity.

Now it's almost done.

Tee is not updated, yet.
"pull" semantics is the same as ReadableByteStream if the underlying byte source doesn't have "pullInto" on it. If the underlying byte source has "pullInto" on it, "pull" semantics will be different from one of ReadableByteStream.
- Once the underlying byte source receives "pullInto" (i.e. there was no view queued in [[queue]]), it must respond to it by calling controller.respond(). Calling controller.enqueue() fails.
- Otherwise, i.e. received only "pull" or there's no pending read, the underlying byte source may call controller.enqueue().
- When a new read (either BYOB or non-BYOB) is made on a stream with non-empty [[queue]], the reads would be fulfilled by consuming bytes in [[queue]].
When non-1byte-element-size TypedArray is provided to a BYOB reader, it doesn't fulfill until at least one element is filled.
If controller.close() is called with a partially filled non-1byte-element-size TypedArray left in [[pendingViews]], the stream gets errored.

domenic commented 9 years ago

Right. I was trying to build an all-in-one class in that branch. Sorry for confusing. I was trying but seeing difficulties in designing the class without much complexity.

What is your general feeling? How much more complex is it, and would it be worth it? I am not sure anymore. Maybe the extra layer could be added future-compatibly?

If controller.close() is called with a partially filled non-1byte-element-size TypedArray left in [[pendingViews]], the stream gets errored.

Hmm, why? Maybe I don't fully understand what "partially filled" means?

Doing some review of https://streams.spec.whatwg.org/branch-snapshots/bytestream/ (on the assumption we would want to go with this, even if that is not guaranteed pending more discussion above).

stream@[[sourceSupportsByob]]

Maybe we should not allow ReadableByteStreams that don't support BYOB? It would help uncomplicate things, and would prevent the situation of giving a "false contract" where you support getByobReader() but actually it is just an inefficient shim over the normal reader.

GetViewedArrayBuffer / GetByteLength / GetByteOffset

Filed https://bugs.ecmascript.org/show_bug.cgi?id=4369 for you.

this@[[filledBytesOfPendingViewsHead]] is not equal to 0, throw a RangeError exception.

Probably TypeError; there isn't a developer-supplied number that is out of range here.

Reject readRequestPromise with CreateIterResultObject(stream@[[storedError]], chunk).

IMO it is OK to throw away the chunk if the stream has errored. It is better than rejecting with a non-Error. So I would just reject with stream@[[storedError]].

PullFromReadableByteStreamInto

This would be a bit clearer if 2.c was nested under 2.b IMO.

RespondToReadableByteStreamByobReaderReadIntoRequest

I think it needs to do transfers before giving back chunk

If reader@[[readIntoRequests]] is empty, DetachReadableByteStreamReader(stream).

Why? Doesn't this lead to situations where you call .read(view) once and then all of a sudden your reader is no longer active? I am probably missing something.

tyoshino commented 9 years ago

If controller.close() is called with a partially filled non-1byte-element-size TypedArray left in [[pendingViews]], the stream gets errored.

Hmm, why? Maybe I don't fully understand what "partially filled" means?

Please suppose that the BYOB interface of a stream is given a Uint16Array with 1 element (i.e. backed with 2 byte long ArrayBuffer), where the stream is EOF-ed with only 1 byte of data. We planned to return an instance of the same ArrayBufferView variant, so, we want to return a Uint16Array. The stream has only 1 byte to return to the user, but we cannot create a Uint16Array and make it indicate that only 1 byte valid data has been saved into it (1-element Uint16Array is 2 byte long).

tyoshino commented 9 years ago

stream@[[sourceSupportsByob]]

Maybe we should not allow ReadableByteStreams that don't support BYOB? It would help uncomplicate things, and would prevent the situation of giving a "false contract" where you support getByobReader() but actually it is just an inefficient shim over the normal reader.

I guess you meant s/ReadableByteStream/building ReadableByteStream with an underlying source that doesn't support BYOB/. Maybe you're right. I'll try it out.

tyoshino commented 9 years ago

this@[[filledBytesOfPendingViewsHead]] is not equal to 0, throw a RangeError exception.

Probably TypeError; there isn't a developer-supplied number that is out of range here.

OK. Will fix.

tyoshino commented 9 years ago

Reject readRequestPromise with CreateIterResultObject(stream@[[storedError]], chunk).

IMO it is OK to throw away the chunk if the stream has errored. It is better than rejecting with a non-Error. So I would just reject with stream@[[storedError]].

Oh, sorry. I didn't come up with what to do with this, and so I left it incomplete (passing error and chunk to CreateIterResultObject) ... and forgot to revisit.

OK. Will fix.

domenic commented 9 years ago

Please suppose that the BYOB interface of a stream is given a Uint16Array with 1 element (i.e. backed with 2 byte long ArrayBuffer)

I see, that makes sense. So to be clear:

Uint16Array byteLength = 2, 1 byte filled => error (your case)
Uint16Array byteLength = 20, 2 bytes filled => not an error?
Uint16Array byteLength = 20, 3 bytes filled => error?

I cannot really tell whether or not this should be an error (or if the consumer should just deal with the fact they didn't get enough bytes) so am happy to go with your judgement.

tyoshino commented 9 years ago

If we could return the number of bytes filled together with the ArrayBufferView, we can avoid erroring it. But, maybe, we should rather ask those who want to fill a Uint16Array from unsigned int16 data sharded into multiple streams to just change the view type to Uint8Array, and change back to Uint16Array once done.

tyoshino commented 9 years ago

PullFromReadableByteStreamInto

This would be a bit clearer if 2.c was nested under 2.b IMO.

Right. It's a bug. Will fix.

RespondToReadableByteStreamByobReaderReadIntoRequest

I think it needs to do transfers before giving back chunk

Yeah, but should we automatically transfer it in library code than asking underlying source implementors to do it by themselves if necessary?

If reader@[[readIntoRequests]] is empty, DetachReadableByteStreamReader(stream).

Why? Doesn't this lead to situations where you call .read(view) once and then all of a sudden your reader is no longer active? I am probably missing something.

Sorry I cannot get what situation you're concerned with.

Both in ReleaseReadableByteStreamReader and RespondToReadableByteStreamByobReaderReadIntoRequest, this step is invoked only when the stream is in "closed" or "errored" state. read(view) and read() issued after after detaching will be taken care of by the reader object.

tyoshino commented 9 years ago

Pushed WIP reference implementation.

domenic commented 9 years ago

Yeah, but should we automatically transfer it in library code than asking underlying source implementors to do it by themselves if necessary?

I think it has to be part of the implementation, as otherwise it is possible to create observable data races.

Sorry I cannot get what situation you're concerned with.

Both in ReleaseReadableByteStreamReader and RespondToReadableByteStreamByobReaderReadIntoRequest, this step is invoked only when the stream is in "closed" or "errored" state

Oh sorry, I missed that. All good.

Pushed WIP reference implementation.

Yay!!

tyoshino commented 9 years ago

stream@[[sourceSupportsByob]]

Maybe we should not allow ReadableByteStreams that don't support BYOB? It would help uncomplicate things, and would prevent the situation of giving a "false contract" where you support getByobReader() but actually it is just an inefficient shim over the normal reader.

I guess you meant s/ReadableByteStream/building ReadableByteStream with an underlying source that doesn't support BYOB/. Maybe you're right. I'll try it out.

I remembered why I chose to have sourceSupportsByob.

I thought that it's good if we could make it able to wrap a source that doesn't understand read(view) with a ReadableByteStream to add BYOB reading functionality.

If we try to realize this without checking whether the source understands read(view) in advance:

we issue pullInto(view) to the source
the source, maybe, just ignores it and issues controller.enqueue()

To prepare for this, the stream needs to store the ArrayBufferViews inside itself and fill them on controller.enqueue() call. I wanted to avoid this queue which is unnecessary for sources that understand read(view).

If we can require the source to understand read(view) and respond to it with controller.respond(), things will be simpler, but it'll be impossible to implement a source for the ReadableByteStream only with controller.enqueue().

domenic commented 9 years ago

Ah yes, that is a reasonable thing to do.

In balance though, I think it is better to require the source to understand read(view) and use controller.respond(). You can indeed imagine cases where people want ReadableByteStreams that have a "fake BYOB" interface, that necessitates copying. But I think it is OK for that to be hard to implement, instead of easy. That is, the underlying source code would end up pretty complicated and ugly---it would need its own internal queue and it would copy from that queue into the supplied view inside read(view). You can still do it but it's not easy.

In other words, I think if someone wants to create a ReadableByteStream with fake BYOB we should make them do that by writing a complicated underlying source, instead of take on that burden ourselves by complicating the ReadableByteStream machinery.

domenic commented 9 years ago

Wait, upon re-reading I realize my argument may have been overzealous. controller.enqueue() is necessary---at least, that is the premise of this thread. But the intended usage is that you implement read(view) and then also use controller.enqueue() when your kernel buffer is overflowing (or similar). So I meant to argue against sourceSupportsByob and not against controller.enqueue().

(That said, the result of #295 might be that supporting both models is too hard and we have to remove controller.enqueue(). But in this thread we assume controller.enqueue() must stay and are discussing details like sourceSupportsByob.)

tyoshino commented 9 years ago

Thanks for the suggestion. But it seems we might be able to support both.

Please take a look at the quick progress summary at https://github.com/whatwg/streams/issues/295#issuecomment-111044231

Now, we allow an underlying byte source to be implemented with only enqueue() calls. Just respond() must not be called when there's no pending pullInto.

tyoshino commented 8 years ago

All the ideas discussed in this issue have been incorporated into the PR https://github.com/whatwg/streams/pull/418.

whatwg / streams

ReadableByteStream: should support an internal queue #353