What should we call ReadableByteStream.prototype.getBYOBReader()?

domenic commented 9 years ago

Per the lengthy discussions in #253 the new shape of ReadableByteStream will be something like:

getReader() returns a reader that, much like the "normal" ReadableStream, has automatically flowing auto-allocated chunks. So, rbs.getReader().read() will take no arguments and will auto-allocate a new Uint8Array each call.
Some other function, with straw-person name getBYOBReader(), returns a reader that requires you to feed it buffers. So, you call rbs.getBYOBReader().read(myBuffer) and it reads into myBuffer. (After doing appropriate detaching/transferring to ensure no observable data races, etc., as discussed extensively in #253 and elsewhere.)

Here "BYOB" stands for "bring your own buffer," by the way.

Ideas for names:

getBYOBReader()
getManualReader()
getBufferedReader()
getFedReader()
getManualFeedReader()
getControlledAllocationReader()
getReadIntoReader()
getBytesReader() (although the normal reader also gives you bytes)
getPullingReader() (referring to how generally no reading is done until you call read(view), i.e. data is "pulled")
getReader({ feedBuffers: true }) or similar (introduces some kinda-strange return type variance though)

Are there any precedents from other platforms we can draw on, just for vocabulary if nothing else?

yutakahirano commented 9 years ago

Naming after reader's type name looks natural to me.

domenic commented 9 years ago

Well, but do you have an idea for the reader's type name? :)

yutakahirano commented 9 years ago

[Readable]BytesStreamReader and getBytesReader, perhaps?

domenic commented 9 years ago

Yeah that kind of works, since it's about specialized behavior for byte streams. But then again, rbs.getReader().read() will return bytes too, hmm. I'll add it to the list :D

chrisdickinson commented 9 years ago

What about inverting it, such that getReader is the BYOB version, while the auto-allocating version becomes getAllocatingReader?

domenic commented 9 years ago

@chrisdickinson that's not as good, I think, because ideally getReader() on both ReadableStream and ReadableByteStream should return an auto-allocating readers with a no-argument read().

tyoshino commented 9 years ago

In offline discussion last Thursday, we categorized the readers we've been designing using a matrix.

Alloc / Pull	Auto	Manual
BYOB	-	A
Auto	B	C

(BYOB && Auto pull) doesn't make sense. So, it's marked -.

We're going to at least provide factory methods for A, B.

The "Alloc: auto" row also applies to object streams. Both of B and C make sense. I think C means that pulling one element.

For byte streams, I realized that I don't know what we should do for C well. C is manual pull. We're going to pull some bytes. How many? The user needs to call read() to pull bytes though they cannot specify the amount of data to pull. Domenic, what do you think about C? Is this kinda method to say "please read now, but decide how much to pull by yourself" to the reader?

So, if A, B and C make sense for byte streams, and B and C make sense to object streams, how should we organize the factory methods?

Byte stream
- getBYOBManualPullReader: A
- getAutoPullReader: B
- getManualPullReader: C
Object stream
- getAutoPullReader: B
- getManualPullReader: C

Gave verbose names for the first step.

domenic commented 9 years ago

Domenic, what do you think about C? Is this kinda method to say "please read now, but decide how much to pull by yourself" to the reader?

Yeah, exactly. This would probably be done intelligently on a per-stream basis. IIRC libuv has heuristics for this (since Node streams are not BYOB), @piscisaureus mentioned them to me once.

It also says, "I don't care about reusing buffers, and will let the JavaScript GC take care of freeing the memory".

I am starting to think that an options-object accepting getReader() may be better. In case we add more dimensions of variability in the future, it would avoid a combinatorial explosion in method names. Let us think what that would look like:

Byte stream
- getReader({ feedBuffers: true, flowing: true }): throws
- getReader({ flowing: true }): B
- getReader({ feedBuffers: true }): A
- getReader(): C
Object stream
- getReader({ flowing: true }): B
- getReader(): C

Or maybe we want the default to be auto-pulling, so instead of flowing: true it is manualPull: true. (Better name?) Then we have

Byte stream
- getReader({ feedBuffers: true, manualPull: true }): A
- getReader({ manualPull: true }): C
- getReader({ feedBuffers: true, manualPull: false }): throws
- getReader(): B
Object stream
- getReader({ manualPull: true }): C
- getReader(): B

The interesting thing about object streams B vs. C is that previously we kind of let this be controlled by the strategy instead of by the consumer. (Although even then I think we always would be trying to pull at least some before exerting backpressure.) We could move that all to configuration to getReader?

Wait, but how does C work for object streams that are wrapping push sources? I think the manualPull: true only makes sense for: (a) APIs that are inherently pull-based like read(2)-on-files; (b) APIs that allow us to use the kernel buffer to store data (but that only goes up to a limit before we start losing data...) like recv(2)-on-sockets. But it does not work for APIs like web sockets that are pushing data at you via events and you will lose the data if you don't react...

domenic commented 9 years ago

Upon further reflection I am not sure auto-pulling vs. manual-pulling is a property of the reader. It may be more of a property of the stream. If it is a property of the reader, you should be able to get an auto-pulling reader, then release it and get a manual-pulling one. Or vice versa. That seems strange and potentially hard to implement.

tyoshino commented 9 years ago

OK. Maybe "auto-pull" concept doesn't make sense when discussing a reader interface. What's going on behind a reader is unknown and abstracted. We basically shouldn't mention about anything behind it when naming the methods.

Pulling vs. non-pulling (give me when available) is the right way of comparison?

domenic commented 9 years ago

Tentative plan per afternoon discussions:

Use options object to allow greater future flexibility
Instead of "manualPull: false/true", use a queueSize argument, for both object and byte streams. If it is zero, then pulling is manual. (If feedBuffers is true, it must be zero.) If it is nonzero n, it communicates two things:
- The consumer would prefer that, on a best-effort basis, the stream have n bytes available to fulfill future .read() promises quickly. (This means for sources like files, the underlying source should pull enough bytes proactively to meet n.) If n is 10 MiB, then that might mean a single .read() promise gets fulfilled with a 10 MiB chunk, or the next 10 .read() promises get fulfilled with 1 MiB chunks, depending on which is more convenient for the stream implementer.
- The consumer would prefer not to consume much more memory than n. If the stream is getting close to or exceeding this threshold, then the stream should apply backpressure (or stop pulling proactively, for a case like files).
To make nonzero queueSize work, an underlying source must implement the pull()method---if that is not implemented, trying to pass queueSize will throw.
To make feedBuffer: true work, an underlying source must implement the read() method---if that is not implemented, trying to pass feedBuffers will throw.
Cases like a ReadableStream wrapping a WebSocket (i.e., wrapping an API that does not support backpressure signals) will not implement underlying source pull(), causing queueSize to never work. Instead a simple .getReader() call must be used, since we should not allow consumers to signal a desired queueSize if there is no way to control it.
Strategy will probably change. The most likely path looks to be getting rid of shouldApplyBackpressure and moving size() to the underlying source directly.
It would be beneficial to get the more precise backpressure APIs prototyped in #287, e.g. maybe enqueue() should return the size left in the queue or similar. We should work on this.
The implementation may get complicated in how we allow scenarios like
```
const reader = rbs.getReader({ queueSize: 1024 });

// later, after the queue has filled up 
reader.releaseLock();
const reader2 = rbs.getReader({ feedBuffers: true });
reader2.read(new Uint8Array(512));
```
This kind of exotic scenario will probably necessitate copying (at least if implemented in JavaScript; UAs might be able to get away with better tricks for their internal queues).

domenic commented 9 years ago

@tyoshino I am coming back to this issue several months later and it seems we kind of forgot about a lot of the ideas in https://github.com/whatwg/streams/issues/294#issuecomment-81557090. Given how our thinking has evolved, e.g. in #329 and #361, what are your thoughts on that comment?

tyoshino commented 9 years ago

Thanks for reminder, Domenic. I'd like to begin with responding to https://github.com/whatwg/streams/issues/294#issuecomment-81557090

Use options

This is still true, I think. Less methods sounds better basically. E.g. if we decide to implement various readers as I listed in https://github.com/whatwg/streams/issues/329#issuecomment-106780286, the option takes mode: "borrowing", etc.

manualPull: false/true

I'm gonna investigate if I can integrate the strategy mechanism with the up-to-date RBS (has pull/pullInto) well.

Cases like a ReadableStream wrapping a WebSocket ...

BYOB of the up-to-date RBS should just work (though involves copy).

domenic commented 9 years ago

I have spun off a lot of that stuff into https://github.com/whatwg/streams/issues/375. That leaves this thread for its original purpose, bikeshedding the name :).

So you think it should be

rbs.getReader({ mode: "byob" });
rbs.getReader({ mode: "default" }); // or... "pull"?
rbs.getReader(); // equivalent to default

The other alternative we were thinking was

rbs.getReader({ feedBuffers: true });
rbs.getReader({ feedBuffers: false });
rbs.getReader(); // equivalent to false

but I think that you are right that future expansion is better served by a string enum.

tyoshino commented 9 years ago

Ah, good point. Yeah, explicit parameters (boolean) for each feature is cleaner, but with mode string we can list only meaningful combinations and maybe robust for future extensions.

tyoshino commented 8 years ago

ReadableByteStream has been merged into ReadableStream (https://github.com/whatwg/streams/issues/379). Now ReadableStream has both getReader and getBYOBReader().

It's time to resume discussion on this issue and finalize the BYOB API surface.

domenic commented 8 years ago

I think your arguments for getReader() + getReader({ mode: "byob" }) are still pretty good. Should we move to that?

tyoshino commented 8 years ago

Reviewed the discussion in this thread. Yes, I still like the plan to add options to getReader().

Regarding the name, I think "buffer-provided" and "buffer-fed" are the candidates I'm still considering in addition to "byob".

mode: "default"
mode: "byob"
- ReadableStreamBYOBReader
mode: "buffer-fed"
- ReadableStreamBufferFedReader
mode: "buffer-provided"
- ReadableStreamBufferProvidedReader

BTW, the differentiation between "auto pull" and "manual pull" discussed around https://github.com/whatwg/streams/issues/294#issuecomment-81428113 has been well depicted in the new examples written by Domenic.

https://github.com/whatwg/streams/pull/430#discussion_r57444278 is also related.

In the latest spec, we chose to have the controllers expose queuing status via the desiredSize() getter to delegate pulling decision to the underlying source. Auto/manual pull is no longer part of the readable stream (and controller / reader) API but characteristics of underlying sources.

tyoshino commented 8 years ago

Now I feel that ReadableStreamBYOBController is not the best naming for the controller with the BYOB feature. It has facilities for BYOB reading, but can also work with the default reader. Its main and essential characteristic is that it handles bytes (expects bytes, copies and repackages bytes).

So, I propose:

rename ReadableStreamBYOBController to ReadableStreamBytesController (similar to one of the candidates in https://github.com/whatwg/streams/pull/418#issuecomment-171822826)
renaming the "byob" parameter of the ReadableStream constructor to (source|controller)?(Type|Mode)
- let the default value to be "default",
- when its value is "bytes",

domenic commented 8 years ago

Hmm. I still like keeping BYOB in those names, because BYOB is the new capability on top of the default. I think it would be OK to keep it. I think of the byob: true option to the underlying source to be something like byobCapable: true (just shorter). I agree that its essential characteristic is that it handles bytes, but it's main new capability is a BYOB API.

Among byob, buffer-fed, and buffer-provided I think byob is pretty good. Maybe it is just because we have been staring at that for a few months though. I think buffer-fed is better than buffer-provided probably.

tyoshino commented 8 years ago

Hmm. I still like keeping BYOB in those names, because BYOB is the new capability on top of the default. ...

In addition to https://github.com/whatwg/streams/issues/294#issuecomment-202748216, I'm thinking about the case in which we need to introduce the read() with ack functionality. If we introduce the feature, we would need to also add a new interface to the controller. Then, the controller will gain one more new feature in addition to BYOB. Considering such possibility, I was leaning toward giving more general name to it. Of course, we could just add a new controller. WDYT?

Among byob, buffer-fed, and buffer-provided I think byob is pretty good. Maybe it is just because we have been staring at that for a few months though. I think buffer-fed is better than buffer-provided probably.

OK. Let's go in "byob".

domenic commented 8 years ago

If we introduce the feature, we would need to also add a new interface to the controller. Then, the controller will gain one more new feature in addition to BYOB. Considering such possibility, I was leaning toward giving more general name to it. Of course, we could just add a new controller. WDYT?

Hmm, it is hard to say. Whether we introduce a new controller or not would depend on how different the internal state tracking would be.

I guess I see two options:

Go with BYOB, and maybe in the future change if we decide that the same controller needs more responsibilities, we can probably rename it since it is not very web-exposed
Avoid any possibility of future conflicts by just making the change to ReadableStreamBytesController (ReadableByteStreamController is probably better?) now.

I am OK with either.

So the total changes being contemplated here are:

Merge getBYOBReader into getReader using { mode: "byob" } as an argument
(Maybe) rename ReadableStreamBYOBController to ReadableStreamBytesController or ReadableByteStreamController

But no renaming for ReadableStreamBYOBReader or ReadableStreamBYOBRequest or for the underlying source option (byob: true). Right?

tyoshino commented 8 years ago

Thanks.

... depend on how different the internal state tracking would be.

Yeah. I can't come up with estimation of the complexity it would add for now.

... it since it is not very web-exposed ...

I see.

Merge getBYOBReader into getReader using { mode: "byob" } as an argument

Yes!

(Maybe) rename ReadableStreamBYOBController to ReadableStreamBytesController or ReadableByteStreamController But no renaming for ReadableStreamBYOBReader or ReadableStreamBYOBRequest or for the underlying source option (byob: true). Right?

I thought that if we rename ReadableStreamBYOBController, the underlying source option should also be renamed as it's an option telling what type of controller it needs or what set of reading operations the stream should expose.

But yeah, we could just add a new parameter say "bufferLeasing: true" in addition to byob.

So, ok.

domenic commented 8 years ago

I thought that if we rename ReadableStreamBYOBController, the underlying source option should also be renamed as it's an option telling what type of controller it needs or what set of reading operations the stream should expose.

Hmm, I see. I think the real question is: do we expect a 1:1:1 underlying source types <-> controller types <-> reader types. I guess we should not necessarily expect that. Right now we have 1:1 underlying source types <-> controller types, and that will probably stay forever. But the correspondence with reader types is already not 1:1 since you can get both default and byob readers from (what is currently called) a ReadableStreamBYOBController.

So, OK. We should reflect this asymmetry in the API. New plan:

Merge getBYOBReader into getReader using { mode: "byob" } as an argument
Rename ReadableStreamBYOBController to ReadableByteStreamController
Change the byob: true switch to type: "bytes"

(Maybe there is a better word than type, but I thought it was important to have different terms, so we could say that a given type of controller/stream/underlying source can have different modes for reading from it.)

tyoshino commented 8 years ago

Yeah, in the last comment of mine, I thought we could have the ReadableStream constructor parameters each corresponds to enabled getReader() mode, but I prefer your idea. Yes, 1:1 correspondence between underlying source types and controller types should be stable for long time.

Please review the PR.

whatwg / streams

What should we call ReadableByteStream.prototype.getBYOBReader()? #294