View versus buffer - Githubissues

annevk commented 9 years ago

In https://github.com/yutakahirano/fetch-with-streams/pull/34#issuecomment-92327392 I raised the question whether we should expose bytes as a buffer rather than a view. I believe that for the Encoding Standard we ended up concluding that we'd rather have exposed it as a buffer than a view as that would have been more consistent with other APIs, such as XMLHttpRequest and WebSocket.

Now streams are pretty close to the metal. Is there any reason they need a view?

@inexorabletash @wanderview

wanderview commented 9 years ago

I assume this was a convenience so that code didn't have to manually wrap the buffer in a view on each call. Just picking Uint8 does seem a bit opinionated.

It does not seem that limiting, though, given you can get Uint8Array.buffer to convert to another view.

I guess I'd be happy to see ArrayBuffer passed, but I haven't felt strongly enough to raise an objection over it.

yutakahirano commented 9 years ago

https://github.com/whatwg/streams/issues/295

annevk commented 9 years ago

I think I sort of get the buffer reuse https://gist.github.com/domenic/dab9e8da245cb4fe6027 use case now. The view's buffer that's passed in gets detached, reused, and then reattached after getting the new bytes. That probably makes sense and is why adding one layer of abstraction here helps.

wanderview commented 9 years ago

It would be nice if the reader could specify what type of view they want to use, though. Is that possible or is it really locked to Uint8Array?

annevk commented 9 years ago

It depends on the input view as I understand it which seems okay?

wanderview commented 9 years ago

It depends on the input view as I understand it which seems okay?

For getByobReader().read() it does, but for getReader().read() just always gives you a Uint8Array I think.

domenic commented 9 years ago

For getByobReader().read() it does, but for getReader().read() just always gives you a Uint8Array I think.

Right, this. But I see no problem with adding an option to getReader, at least in theory... details to be worked out...

domenic commented 9 years ago

I'll update https://github.com/whatwg/streams/issues/300 with this point

wanderview commented 9 years ago

Right, this. But I see no problem with adding an option to getReader, at least in theory... details to be worked out...

It seems simpler to me to have getReader().read() give you an ArrayBuffer and then you get views if you opt into that with the read(view) approach.

domenic commented 9 years ago

Hmm well I don't feel strongly. We discussed this only briefly in https://github.com/whatwg/streams/issues/295#issuecomment-83548397... I think the consistency is nice in balance.

Note that for BYOB we have to either require a view (don't accept ArrayBuffer) or if we do accept an ArrayBuffer pick a default for the returned view.

wanderview commented 9 years ago

Yea, making BYOB require a view seems reasonable to me.

I don't see how .read() returning a Uint8Array is consistent with .read(view), though. The .read(view) spec shouldn't explicitly mention Uint8Array at all, but rather the view passed in. So we're making up a type to be consistent with.

To me the consistency comes from ".read() returns the most general type possible without losing information".

Kind of a bikeshed, but it does seem like this could be quite an annoyance for anyone not using Uint8Array.

domenic commented 9 years ago

The only other argument is one I briefly alluded to in https://github.com/whatwg/streams/issues/295#issuecomment-83556911 which is that you could imagine an implementation that returns multiple views onto the same ArrayBuffer with subsequent calls to read(). E.g. it pre-allocates a large buffer, and then does some I/O that fills only a small subset of it, so it gives back a view onto that subset. Then the next I/O fills another subset, and so it gives back a view there. This only really works for single-threaded scenarios, and probably even then causes observable data races. But I thought I'd expand on it a bit.

I hope we can get @tyoshino and @yutakahirano to weigh in. If they agree then we'll need to patch Chrome ASAP.

wanderview commented 9 years ago

The only other argument is one I briefly alluded to in whatwg/streams#295 (comment) which is that you could imagine an implementation that returns multiple views onto the same ArrayBuffer with subsequent calls to read(). E.g. it pre-allocates a large buffer, and then does some I/O that fills only a small subset of it, so it gives back a view onto that subset. Then the next I/O fills another subset, and so it gives back a view there. This only really works for single-threaded scenarios, and probably even then causes observable data races.

Does this really work given that Uint8Array.buffer allows content to access the underlying ArrayBuffer?

Personally, I expect to implement .read() for a binary stream to return a buffer sized to all the bytes currently available. I'm not sure why we would want to slice it up into smaller chunks arbitrarily.

domenic commented 9 years ago

Here is a contrived example:

Underlying source is implemented in JS and on the same thread
Underlying source pre-allocates a 10 MiB buffer
For each call to read(), the underlying source decides to dole out 1 MiB of the buffer via a view onto it.

Indeed you could see that view1.buffer === view2.buffer. But that's implementable anyway. I could do that with a normal ReadableStream. Heck I don't even have to use array buffers: const x = {}; c.enqueue({ chunkNumber: 1, backingObject: x }); c.enqueue({ chunkNumber: 2, backingObject: x }).

I can't come up with a non-contrived scenario where this is very sensible, though. You can try to make up stuff, like maybe it requires computation and you only want to do 1 MiB of computation at a time (but you have 10 MiB of memory at a time). But meh.

wanderview commented 9 years ago

I think slicing like this only makes sense if it aligns with the consumer of the data. That .read(view) seems like the way to do that.

Or we could add a .read(numBytes) that tries to give you exactly that sized buffer. It could also wait to resolve until that many bytes is available.

annevk commented 9 years ago

So the mental model of buffer reuse is that you have an underlying buffer that represents 10 MiB. You keep passing the ArrayBuffer representing the underlying buffer around to indicate you want to reuse that same 10 MiB. And you have a view upon that ArrayBuffer that tells you what part of the ArrayBuffer you can inspect.

Both the ArrayBuffer and view instances are created fresh each read(), right?

Wouldn't another possible model be that read() always reuses the buffer? It keeps a pointer to the ArrayBuffer it returns, which is the size of whatever was the maximum it could give. When read() is invoked again that ArrayBuffer gets detached. The streams implementation could preserve the same underlying buffer and just hand out slices through the ArrayBuffer class. But I guess that's not flexible enough.

yutakahirano commented 9 years ago

I don't have a strong opinion, but I want a strong consensus so that we will not reverse the decision again...

tyoshino commented 9 years ago

@wanderview https://github.com/yutakahirano/fetch-with-streams/issues/35#issuecomment-92462246: Even without .read(numBytes), it could make sense to have an ability to return views to the same big buffer if the underlying source generates chunks with large memory regions while the stream want to return a little more fine-grained chunks so that the reader can exert back-pressure more precisely. This is also kinda contrived example. If we really want to provide this kind of flexibility over back pressure exertion, we should have .read(numBytes) as you said to accept the number of bytes to consume from the user explicitly (I also discussed this a little here https://github.com/whatwg/streams/issues/320#issuecomment-91606599).

So, I think if we add .read(numBytes), it should use ArrayBufferView, but for .read() which is common for byte and non-byte stream, I think it's ok to use the ArrayBuffer.

@annevk

When read() is invoked again that ArrayBuffer gets detached

As you said, this is not so flexible. With this requirement, we need to ask the user not to touch the ArrayBufferViews returned for previous read() calls once it issues a new read() call.

annevk commented 9 years ago

@tyoshino in the alternative model I put forward (that's just exploration for what it's worth) read() would always return an ArrayBuffer. And whenever you invoke read() again that would neuter it (to allow for reuse of the underlying buffer the stream controls). Developers would have to transfer bytes and add views.

tyoshino commented 9 years ago

Yeah. OK. I just meant that the following restriction is very different from what we have now.

And whenever you invoke read() again that would neuter it

tyoshino commented 9 years ago

To be clear, I'm fine with either ArrayBuffer or ArrayBufferView.

wanderview commented 9 years ago

To be clear, I'm not advocating for .read(numBytes) right now. That was a tangent. Sorry!

If .read() returned ArrayBufferView, how would the calling code know what type of view it actually is to use it? Or would they have to use chunk.buffer, chunk.byteOffset, and chunk.length to create their own view?

tyoshino commented 9 years ago

Sorry. By https://github.com/yutakahirano/fetch-with-streams/issues/35#issuecomment-92796493, I didn't mean that it should be unknown what kind of ArrayBufferView is read.

So, either of

specify as it returns a Uint8Array, or
leave it up to each API spec that exposes a ReadableByteStream to address your concern at https://github.com/yutakahirano/fetch-with-streams/issues/35#issuecomment-92436012

domenic commented 9 years ago

I don't have a strong opinion, but I want a strong consensus so that we will not reverse the decision again...

Yes, this...

If .read() returned ArrayBufferView, how would the calling code know what type of view it actually is to use it? Or would they have to use chunk.buffer, chunk.byteOffset, and chunk.length to create their own view?

I don't really understand this question. For "how would the calling code know what type of view it actually is to use it" I think just looking at view.constructor works. For "would they have to use chunk.buffer, chunk.byteOffset, and chunk.length to create their own view" I think they can just do new Float32Array(anyOtherView).

wanderview commented 9 years ago

Ok, I did not realize the typed array constructors could convert another typed array that easily. That's the same as what they would need to do with an ArrayBuffer.

So I think Uint8Array as a default is a bit odd, but its not really a problem. I personally don't think its worth triggering a fire drill to try to patch something in the blink implementation.

@annevk what do you think?

domenic commented 9 years ago

@annevk that's an interesting model. A couple disadvantages I can think of:

Requires fixing ahead of time the maximum buffer size (since the same underlying buffer is reused), so does not allow for the flexibility of saying "my destination needs 10 MiB right now" if the maximum buffer size is 5 MiB. (It might be possible to work around this if implementers are able to back the same ArrayBuffer by multiple backing buffers, but that seems a bit sketchy.)
Doesn't allow more complicated patterns where the app might be using more than one chunk at once (trading memory for throughput). For example the ping-pong pattern in https://gist.github.com/domenic/dab9e8da245cb4fe6027#file-ping-pong-js (or extrapolation of it to pools of sizes above 2).

annevk commented 9 years ago

I'm convinced about the existing model and with the Uint8Array default given the arguments provided. Using ArrayBuffer as output elsewhere (Fetch and Push API come to mind) still seems like a good idea as long as there are no reusing buffer concerns.

The Encoding Standard using Uint8Array is odd in that respect but does set a precedent for using that as the default view for bytes.

I'll leave it to @wanderview to sign off and close this.

wanderview commented 9 years ago

Still looks good with Uint8Array as default to me! I don't have perms to close issues in this repo, though.

yutakahirano / fetch-with-streams

View versus buffer #35