annevk closed this issue 9 years ago.
I assume this was a convenience so that code didn't have to manually wrap the buffer in a view on each call. Just picking Uint8 does seem a bit opinionated.
It does not seem that limiting, though, given you can get Uint8Array.buffer to convert to another view.
I guess I'd be happy to see ArrayBuffer passed, but I haven't felt strongly enough to raise an objection over it.
I think I sort of get the buffer reuse https://gist.github.com/domenic/dab9e8da245cb4fe6027 use case now. The view's buffer that's passed in gets detached, reused, and then reattached after getting the new bytes. That probably makes sense and is why adding one layer of abstraction here helps.
It would be nice if the reader could specify what type of view they want to use, though. Is that possible or is it really locked to Uint8Array?
It depends on the input view, as I understand it, which seems okay?
> It depends on the input view, as I understand it, which seems okay?
For getByobReader().read() it does, but for getReader().read() just always gives you a Uint8Array I think.
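A sketch of that distinction (hedged: this uses today's getReader({ mode: 'byob' }) spelling rather than the getByobReader() name discussed here, and assumes a runtime with WHATWG byte streams, e.g. Node 18+ or a browser):

```javascript
// A trivial byte stream that enqueues four bytes and closes.
function makeByteStream() {
  return new ReadableStream({
    type: 'bytes',
    start(controller) {
      controller.enqueue(new Uint8Array([1, 2, 3, 4]));
      controller.close();
    },
  });
}

async function demo() {
  // Default reader: chunks arrive as Uint8Array regardless of the caller.
  const { value: chunk } = await makeByteStream().getReader().read();

  // BYOB reader: the result's type matches the view the caller passed in.
  const reader = makeByteStream().getReader({ mode: 'byob' });
  const { value: view } = await reader.read(new Float32Array(1));

  return [chunk.constructor.name, view.constructor.name];
}
```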
> For getByobReader().read() it does, but for getReader().read() just always gives you a Uint8Array I think.
Right, this. But I see no problem with adding an option to getReader, at least in theory... details to be worked out...
I'll update https://github.com/whatwg/streams/issues/300 with this point.
> Right, this. But I see no problem with adding an option to getReader, at least in theory... details to be worked out...
It seems simpler to me to have getReader().read() give you an ArrayBuffer and then you get views if you opt into that with the read(view) approach.
Hmm, well, I don't feel strongly. We discussed this only briefly in https://github.com/whatwg/streams/issues/295#issuecomment-83548397... I think the consistency is nice on balance.
Note that for BYOB we have to either require a view (don't accept ArrayBuffer) or if we do accept an ArrayBuffer pick a default for the returned view.
Yeah, making BYOB require a view seems reasonable to me.
I don't see how .read() returning a Uint8Array is consistent with .read(view), though. The .read(view) spec shouldn't explicitly mention Uint8Array at all, but rather the view passed in. So we're making up a type to be consistent with.
To me the consistency comes from ".read() returns the most general type possible without losing information".
Kind of a bikeshed, but it does seem like this could be quite an annoyance for anyone not using Uint8Array.
The only other argument is one I briefly alluded to in https://github.com/whatwg/streams/issues/295#issuecomment-83556911 which is that you could imagine an implementation that returns multiple views onto the same ArrayBuffer with subsequent calls to read(). E.g. it pre-allocates a large buffer, and then does some I/O that fills only a small subset of it, so it gives back a view onto that subset. Then the next I/O fills another subset, and so it gives back a view there. This only really works for single-threaded scenarios, and probably even then causes observable data races. But I thought I'd expand on it a bit.
I hope we can get @tyoshino and @yutakahirano to weigh in. If they agree then we'll need to patch Chrome ASAP.
> The only other argument is one I briefly alluded to in whatwg/streams#295 (comment) which is that you could imagine an implementation that returns multiple views onto the same ArrayBuffer with subsequent calls to read(). E.g. it pre-allocates a large buffer, and then does some I/O that fills only a small subset of it, so it gives back a view onto that subset. Then the next I/O fills another subset, and so it gives back a view there. This only really works for single-threaded scenarios, and probably even then causes observable data races.
Does this really work given that Uint8Array.buffer allows content to access the underlying ArrayBuffer?
Personally, I expect to implement .read() for a binary stream to return a buffer sized to all the bytes currently available. I'm not sure why we would want to slice it up into smaller chunks arbitrarily.
Here is a contrived example:
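(The original example was posted as a gist that isn't preserved in this thread; the following is a hedged reconstruction of the idea under discussion — a stream whose underlying source enqueues two distinct views sharing one pre-allocated ArrayBuffer.)

```javascript
// Two chunks that are distinct view objects but share one backing buffer.
const backing = new ArrayBuffer(8);

const sharedViewStream = new ReadableStream({
  start(controller) {
    controller.enqueue(new Uint8Array(backing, 0, 4)); // first half
    controller.enqueue(new Uint8Array(backing, 4, 4)); // second half
    controller.close();
  },
});

async function readTwo() {
  const reader = sharedViewStream.getReader();
  const { value: view1 } = await reader.read();
  const { value: view2 } = await reader.read();
  // Distinct views, same underlying buffer — observable by the consumer.
  return view1.buffer === view2.buffer;
}
```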
Indeed you could see that view1.buffer === view2.buffer. But that's implementable anyway. I could do that with a normal ReadableStream. Heck, I don't even have to use array buffers: const x = {}; c.enqueue({ chunkNumber: 1, backingObject: x }); c.enqueue({ chunkNumber: 2, backingObject: x }).
I can't come up with a non-contrived scenario where this is very sensible, though. You can try to make up stuff, like maybe it requires computation and you only want to do 1 MiB of computation at a time (but you have 10 MiB of memory at a time). But meh.
I think slicing like this only makes sense if it aligns with the consumer of the data. The .read(view) approach seems like the way to do that.
Or we could add a .read(numBytes) that tries to give you exactly that sized buffer. It could also wait to resolve until that many bytes are available.
So the mental model of buffer reuse is that you have an underlying buffer that represents 10 MiB. You keep passing the ArrayBuffer representing the underlying buffer around to indicate you want to reuse that same 10 MiB. And you have a view upon that ArrayBuffer that tells you what part of the ArrayBuffer you can inspect.
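That reuse loop can be sketched roughly as follows (hedged: written against today's getReader({ mode: 'byob' }) spelling; the drain helper and 1 KiB size are illustrative, not from the thread):

```javascript
// One 1 KiB allocation is recycled across every read; fresh
// ArrayBuffer/view *objects* come back each time, but they point at the
// same recycled memory.
async function drain(stream) {
  const reader = stream.getReader({ mode: 'byob' });
  let view = new Uint8Array(1024); // the one real allocation
  let total = 0;
  for (;;) {
    // Passing `view` detaches its buffer so the stream can fill it...
    const { done, value } = await reader.read(view);
    if (done) break;
    total += value.byteLength;
    // ...and `value` is a new view telling us which part we may inspect.
    // Rebuild a full-length view over the same buffer for the next read.
    view = new Uint8Array(value.buffer);
  }
  return total;
}
```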
Both the ArrayBuffer and view instances are created fresh each read(), right?
Wouldn't another possible model be that read() always reuses the buffer? It keeps a pointer to the ArrayBuffer it returns, which is the size of whatever was the maximum it could give. When read() is invoked again that ArrayBuffer gets detached. The streams implementation could preserve the same underlying buffer and just hand out slices through the ArrayBuffer class. But I guess that's not flexible enough.
I don't have a strong opinion, but I want a strong consensus so that we will not reverse the decision again...
@wanderview https://github.com/yutakahirano/fetch-with-streams/issues/35#issuecomment-92462246: Even without .read(numBytes), it could make sense to have the ability to return views into the same big buffer if the underlying source generates chunks with large memory regions while the stream wants to return somewhat more fine-grained chunks, so that the reader can exert back-pressure more precisely. This is also a kind of contrived example. If we really want to provide this kind of flexibility over back-pressure exertion, we should have .read(numBytes) as you said, accepting the number of bytes to consume from the user explicitly (I also discussed this a little here: https://github.com/whatwg/streams/issues/320#issuecomment-91606599).
So, I think if we add .read(numBytes), it should use ArrayBufferView, but for .read(), which is common to byte and non-byte streams, I think it's OK to use ArrayBuffer.
@annevk
> When read() is invoked again that ArrayBuffer gets detached
As you said, this is not so flexible. With this requirement, we need to ask users not to touch the ArrayBufferViews returned for previous read() calls once they issue a new read() call.
@tyoshino in the alternative model I put forward (that's just exploration, for what it's worth) read() would always return an ArrayBuffer. And whenever you invoke read() again that would neuter it (to allow for reuse of the underlying buffer the stream controls). Developers would have to transfer bytes and add views.
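A toy illustration of this hypothetical model (not a real API; the class and method names are made up, and structuredClone's transfer list is used here only as a portable way to detach an ArrayBuffer):

```javascript
// read() returns the whole ArrayBuffer, and the *next* read() neuters the
// previously returned one so the underlying memory can be reused.
class ReusingByteSource {
  constructor() {
    this.buffer = new ArrayBuffer(1024); // the stream-controlled memory
    this.lastReturned = null;
  }

  read() {
    if (this.lastReturned !== null) {
      // Detach the ArrayBuffer handed out last time; after this, the
      // caller's old reference reports byteLength 0.
      this.buffer = structuredClone(this.lastReturned, {
        transfer: [this.lastReturned],
      });
      this.lastReturned = null;
    }
    // ...imagine I/O filling some prefix of this.buffer here...
    this.lastReturned = this.buffer;
    return this.lastReturned;
  }
}
```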
Yeah. OK. I just meant that the following restriction is very different from what we have now.
> And whenever you invoke read() again that would neuter it
To be clear, I'm fine with either ArrayBuffer or ArrayBufferView.
To be clear, I'm not advocating for .read(numBytes) right now. That was a tangent. Sorry!
If .read() returned ArrayBufferView, how would the calling code know what type of view it actually is to use it? Or would they have to use chunk.buffer, chunk.byteOffset, and chunk.length to create their own view?
Sorry. By https://github.com/yutakahirano/fetch-with-streams/issues/35#issuecomment-92796493, I didn't mean that it should be unknown what kind of ArrayBufferView is read.
So, either of
> I don't have a strong opinion, but I want a strong consensus so that we will not reverse the decision again...
Yes, this...
> If .read() returned ArrayBufferView, how would the calling code know what type of view it actually is to use it? Or would they have to use chunk.buffer, chunk.byteOffset, and chunk.length to create their own view?
I don't really understand this question. For "how would the calling code know what type of view it actually is to use it" I think just looking at view.constructor works. For "would they have to use chunk.buffer, chunk.byteOffset, and chunk.length to create their own view" I think they can just do new Float32Array(anyOtherView).
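One subtlety worth spelling out (an editorial note, not from the thread): constructing a typed array *from another typed array* converts element-by-element, while constructing it over the other view's buffer reinterprets the raw bytes.

```javascript
const bytes = new Uint8Array([0, 0, 128, 63]); // 0x3f800000, little-endian 1.0

// Element-wise conversion: four floats with values 0, 0, 128, 63.
const converted = new Float32Array(bytes);

// Byte reinterpretation: one float built from those same four bytes
// (equal to 1 on a little-endian platform).
const reinterpreted = new Float32Array(bytes.buffer, bytes.byteOffset, 1);
```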
Ok, I did not realize the typed array constructors could convert another typed array that easily. That's the same as what they would need to do with an ArrayBuffer.
So I think Uint8Array as a default is a bit odd, but it's not really a problem. I personally don't think it's worth triggering a fire drill to try to patch something in the Blink implementation.
@annevk what do you think?
@annevk that's an interesting model. A couple of disadvantages I can think of:
I'm convinced by the existing model and the Uint8Array default, given the arguments provided. Using ArrayBuffer as output elsewhere (Fetch and the Push API come to mind) still seems like a good idea as long as there are no buffer-reuse concerns.
The Encoding Standard using Uint8Array is odd in that respect, but it does set a precedent for using that as the default view for bytes.
I'll leave it to @wanderview to sign off and close this.
Still looks good with Uint8Array as default to me! I don't have perms to close issues in this repo, though.
In https://github.com/yutakahirano/fetch-with-streams/pull/34#issuecomment-92327392 I raised the question whether we should expose bytes as a buffer rather than a view. I believe that for the Encoding Standard we ended up concluding that we'd rather have exposed it as a buffer than a view as that would have been more consistent with other APIs, such as XMLHttpRequest and WebSocket.
Now streams are pretty close to the metal. Is there any reason they need a view?
@inexorabletash @wanderview