ricea opened this issue 5 years ago
A more generic method might be stream.toArray(), which returns a promise for an array of chunks. (If https://github.com/tc39/proposal-iterator-helpers lands, this would basically become an alias for stream.values().toArray(), although potentially easier to optimize since it's more direct.)
If we had that, then we'd just need a way of turning an array of Uint8Arrays into a single Uint8Array. Surprisingly I can't find any easy way to do this, especially not performantly. So, maybe that is not the right path.
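For reference, the most direct user-land approach today is a two-pass copy; the function name here is illustrative, not a proposed API:

```javascript
// Sketch: concatenate an array of Uint8Arrays by allocating the result
// once and copying each chunk into place. "concatChunks" is just an
// illustrative name, not a proposed API.
function concatChunks(chunks) {
  let total = 0;
  for (const chunk of chunks) total += chunk.byteLength;
  const result = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    result.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return result;
}
```

This works, but it copies every byte after all chunks are already in memory, which is part of why a built-in feels warranted.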
Probably also want to consider an equivalent API where the buffer is already allocated (similar to encodeInto(), BYOB).
> Probably also want to consider an equivalent API where the buffer is already allocated (similar to encodeInto(), BYOB).
I like the idea of a method that fills up an entire ArrayBuffer (or ArrayBufferView). I think it would work better as a reader method, for example ReadableStreamBYOBReader.readFully(view) (taken from Java's DataInput.readFully()). Optionally, we could also add a shorthand method on ReadableStream, similar to how ReadableStream.cancel() is a shorthand for ReadableStreamDefaultReader.cancel(). The readInto() function in this example seems like a good starting point.
It would be different from ReadableStream.arrayBuffer() and/or ReadableStream.toArray() though. Those two methods would always read the entire contents of the stream, whereas a readFully() method may leave some bytes on the stream if the given buffer is not large enough to fit the entire contents.
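A user-land sketch of those readFully() semantics, written against BYOB reader behaviour (the function name and shape are hypothetical, following the Java analogy):

```javascript
// Hypothetical readFully(): keep reading into the given view until it is
// full or the stream ends; return a view over the bytes actually read.
// Any bytes that don't fit stay queued on the stream.
async function readFully(reader, view) {
  let buffer = view.buffer; // read() transfers the buffer each round
  const base = view.byteOffset;
  const length = view.byteLength;
  let filled = 0;
  while (filled < length) {
    const { value, done } = await reader.read(
      new Uint8Array(buffer, base + filled, length - filled)
    );
    if (value) {
      buffer = value.buffer; // the (possibly new) backing buffer
      filled += value.byteLength;
    }
    if (done) break;
  }
  return new Uint8Array(buffer, base, filled);
}
```

A built-in could avoid the per-round view reconstruction that the transfer semantics force on user-land code.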
How about:
- array() - Array of chunks
- blob(BlobPropertyBag) - Blob. Rejects and cancels stream if any chunks are not BlobPart.
- arrayBuffer({ into }) - Rejects and cancels stream if any chunks are not buffer sources, or if into is provided and isn't big enough to hold the data. Resolves with an arraybuffer, which will be a slice of into if it's provided.

Maybe we can look at text() later, but (await stream.array()).join('') doesn't seem too bad.
The new Response(blob).body hack was the reason why we got blob.arrayBuffer(), blob.text() & blob.stream(). You can use the same hack here ... new Response(stream).blob()

Could you imagine what fetch would look like instead if we had done response.body.blob()? Then we would probably not have response.blob() in the first place.
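For illustration, the Response hack in user-land today (the helper names are ours, not an API proposal):

```javascript
// Build a ReadableStream of UTF-8 bytes from string parts.
// streamFrom/drainAsText are illustrative helper names, not proposals.
function streamFrom(...parts) {
  return new ReadableStream({
    start(controller) {
      for (const part of parts) controller.enqueue(new TextEncoder().encode(part));
      controller.close();
    },
  });
}

// Drain a stream by wrapping it in a Response and using the Body mixin.
// The same trick works with .blob(), .arrayBuffer(), and .json().
async function drainAsText(stream) {
  return new Response(stream).text();
}
```

Usage: `await drainAsText(streamFrom("hello ", "world"))` resolves to `"hello world"`.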
It would be nice to have all of array/blob/text/arrayBuffer - hopefully node streams would follow suit, then you could use express.js to do stuff like

app.post('route', async (req, res) => {
  await req.arrayBuffer()
})
> Rejects and cancels stream if any chunks are not BlobPart.
Not sure what you mean by this
Blob constructor can take any kind of object you throw at it. The default fallback behavior is to cast anything it can't handle into a string... if you do new Blob([{}]) then it will dump "[object Object]" as the content. Maybe we should do the same thing.
> Could you imagine what fetch would look like instead if we had done response.body.blob()? Then we would probably not have response.blob() in the first place.
That would have delayed shipping fetch and therefore service workers for many years. Also, the methods on response are still useful, because:

- Blob can be filled in from the request/response content-type

> Blob constructor can take any kind of object you throw at it. The default fallback behavior is to cast anything it can't handle into a string... if you do new Blob([{}]) then it will dump "[object Object]" as the content. Maybe we should do the same thing.
Yeah, agreed. My intention was: Chunks can be anything that can appear in the sequence used in the blob constructor. As you say, almost everything is acceptable, with a few exceptions like detached buffers.
FWIW, +1 to a readFully method on BYOB readers. Maybe a new issue for it?
An interesting use of .blob() that I hadn't thought of is to efficiently handle streams that would use excessive amounts of memory to load completely, and so would ideally be stored on disk. The platform doesn't provide a primitive to stream data into a Blob, but it appears implementations do have this as part of their implementation of the blob() method on Response.
See https://stackoverflow.com/questions/67362823/progress-for-a-fetch-blob-javascript/67364010#67364010 for an example we could make much cleaner.
Looks like Node has just implemented this for arrayBuffer, blob, json, and text, but the API is a bit different. https://nodejs.org/api/webstreams.html#webstreams_utility_consumers
Any chance of this ever landing? It's amazing to me that there's no "stream.collect" equivalent.
Just figured that if this could be part of something like iterator helpers, since streams are iterable, then this should be essentially free. They would then also work for node streams and for any kind of iterator that could yield any form of ArrayBufferView.
> just figured if this could be part of something like iterator helpers
As mentioned above, iterator helpers have a .toArray method, and async iterator helpers will have the same. The helper will let you do await stream.values().toArray(), which is nicer than what you currently do - but getting all the bytes into a single buffer after is still going to be awkward, and even with a helper like this you'd still have a choice between either copying every chunk after reading all of them or giving up locality. This use case really wants a more specific "put all the bytes from a readable byte stream into a single buffer" helper.
And iterator helpers are intended as fully general utilities, so they're probably not the right place to have a method specifically for producers which yield chunks of bytes. That's a pretty uncommon thing to do with iterators anyway; the ReadableStream machinery is much better suited to handle such producers efficiently.
I believe that if native streaming primitives had a better API they would be used more, which should not be a controversial opinion. Even if we are collectively 'stuck' on this issue, letting it rot will not move this forward. Streaming is a great concept, and it is becoming more and more relevant in the browser. Do we not think that this should be given another inspection?
We would like to implement this in Bun, along with ReadableStream.prototype.text, ReadableStream.prototype.blob, and ReadableStream.prototype.json.

We also would be interested in implementing ReadableStream.prototype.array and ReadableStream.prototype.formData, but feel less strongly about those. FormData could accept an optional Content-Type argument: if provided, parse as a multipart form, otherwise parse as a URL-encoded form, rejecting the Promise if invalid data is passed (and canceling the stream).
Can I suggest instead having a .bytes() method which gives you a Uint8Array? See https://github.com/w3ctag/design-principles/issues/463, https://github.com/whatwg/fetch/issues/1724#issuecomment-1814626681, etc. My proposal for base64 support in JS also operates on Uint8Arrays.
I'm happy with this approach also (in contrast to what I suggest in #1238).
A key challenge I have with having these methods directly on ReadableStream is that they are not useful with ReadableStreams that are not byte-oriented. Putting these on a Reader instead makes it a bit nicer to limit them to just byte-oriented streams, similar to how ReadableStreamBYOBReader does. If we do add these directly to ReadableStream.prototype then they should only work on byte-oriented streams.
ReadableStream.prototype.array() really isn't necessary given that Array.fromAsync(...) (https://github.com/tc39/proposal-array-from-async) is coming.

const chunks = await Array.fromAsync(readable);
ReadableStream.prototype.text() is a bit tricky if we can't specify the text encoding. With response.text() we can use the content-type to identify the encoding, and at that point there's a bit of overlap with TextDecoderStream that we'd need to reconcile.
Honestly, the more I think about it, the more I'd really prefer the ...fromAsync(...) pattern for multiple things...
const u8 = await Uint8Array.fromAsync(iterable); // iterable must provide BufferSources
const str = await String.fromAsync(iterable); // iterable must provide stringifiable chunks
const val = await JSON.fromAsync(iterable); // iterable must provide stringifiable chunks
// etc
For text encoding, the pattern is pretty natural:
const str = await String.fromAsync(readable.pipeThrough(new TextDecoderStream("...")));
> If we do add these directly to ReadableStream.prototype then they should only work on byte-oriented streams.
If the streams are passing strings, that should be fine too for json and text, right? Particularly if they're internally buffering rather than returning the next chunk of valid data.
> Honestly, the more I think about it, the more I'd really prefer the ...fromAsync(...) pattern for multiple things...
From an implementer perspective, it sounds nice.

From a JS developer perspective, I have trouble imagining developers prefer this over .text(), .bytes(), .json().

My gut here is that most JS developers would prefer the idea of both, but probably 98% of all usage would be one of .text(), .json(), .arrayBuffer(), .bytes() rather than the Constructor.fromAsync(iterable) variant.
People are used to fetch-like .text() instead of fromAsync, and fromAsync is more complicated to explain.

This chain of reasoning is more complicated:

- it's a ReadableStream
- therefore it's an async iterable
- therefore to get the value as a String you call String.fromAsync

Compared to:

- call .text() on the ReadableStream
> if we can't specify the text encoding
Isn't there precedent for defaulting to UTF-8? Or is that Windows-1252?
> For text encoding, the pattern is pretty natural:
>
> const str = await String.fromAsync(readable.pipeThrough(new TextDecoderStream("...")));
I think this pattern composes well, but it would only feel natural to developers who understand streams well, and to spec authors. It would feel complicated for less experienced developers.
I would guess that something like Uint8Array.fromAsync would be unlikely to be optimized well: that is, it would probably end up actually allocating and reifying each chunk of bytes as a JS object. It's possible to optimize that away under some circumstances, but it would require special cases for each producer/consumer pair, and there are a lot of fiddly details which would need to get implemented. Whereas something like ReadableStream.prototype.bytes() could trivially avoid those intermediate allocations.
There is precedent for enforcing UTF-8 for text() and I'd object to doing anything else here. Folks can write their own wrapper if they can't be bothered to fix the legacy content.
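Such a wrapper is straightforward in user-land; a sketch, assuming an environment where ReadableStream is async-iterable (e.g. Node or Firefox) and that the caller picks the encoding label:

```javascript
// User-land wrapper: decode a byte stream with a caller-chosen legacy
// encoding instead of the UTF-8 a built-in text() would enforce.
// "textWithEncoding" is an illustrative name, not a proposal.
async function textWithEncoding(stream, label) {
  let result = "";
  for await (const piece of stream.pipeThrough(new TextDecoderStream(label))) {
    result += piece;
  }
  return result;
}
```

For example, `textWithEncoding(stream, "windows-1252")` for legacy content.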
It looks like we have a plan here, and we just need someone to do the work. I don't have time to commit to this in the near future, but if someone has time to draft a PR, I'd be happy to look at it.
I believe we have rough consensus on the following issues:

- Chunks should be of the same types as accepted by the Blob constructor. Casting to USVString will result in a lot of [object Object] in the output, but people have been living with that with Blob and it isn't the end of the world. TextDecoderStream already exists for people who need more encoding support.
- text(), arrayBuffer() and blob() seem uncontroversial. json() is easy to do if we spec it as parsing at the end like fetch does, but maybe there isn't much demand? Other methods seem to require further discussion.
- I think it's worth considering doing bytes() (Uint8Array) instead of arrayBuffer() as @bakkot suggested. We should probably do bytes() on the other objects that offer such methods.
I like bytes() but I thought it might be controversial.
bytes() seems nice. I think it's unusual for developers to want to create an ArrayBuffer; they're almost always going to either pass it somewhere else or create a Uint8Array wrapping it. In JSC, IIRC, creating an ArrayBuffer ends up being more expensive since it becomes a WastefulTypedArray internally. As a sidenote, are there plans for BodyMixin / Response / Request & Blob to get a bytes()?
No concrete plans, but PRs will be accepted.
All SGTM!
> As a sidenote, are there plans for BodyMixin / Response / Request & Blob to get a bytes()?
I have https://github.com/whatwg/fetch/issues/1732 but there's no expressed implementer interest yet.
> I believe we have rough consensus on the following issues:
>
> - Chunks should be of the same types as accepted by the Blob constructor
Maybe I'm missing something but I don't see this part in this thread, has there been discussion about this somewhere else? new Response(stream).arrayBuffer() throws if the chunk is not a Uint8Array, so this is a bit surprising to me.
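For anyone wanting to verify, a small check of today's behaviour (the helper name is ours; the expectation is that Body consumers reject on non-Uint8Array chunks, per the fetch spec):

```javascript
// Check what the Response hack does with a given chunk type:
// Body consumers are expected to reject for non-Uint8Array chunks.
async function tryArrayBuffer(chunk) {
  const stream = new ReadableStream({
    start(controller) {
      controller.enqueue(chunk);
      controller.close();
    },
  });
  try {
    await new Response(stream).arrayBuffer();
    return "ok";
  } catch {
    return "rejected";
  }
}
```

So `tryArrayBuffer(new Uint8Array([1]))` resolves to "ok", while a string chunk is expected to yield "rejected".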
> Maybe I'm missing something but I don't see this part in this thread, has there been discussion about this somewhere else? new Response(stream).arrayBuffer() throws if the chunk is not Uint8Array, so this is a bit surprising to me.
This was based on @jimmywarting and @jakearchibald's comments: https://github.com/whatwg/streams/issues/1019#issuecomment-587358358

However, on re-reading I realised that @jakearchibald was only saying this should work for the blob() method. This comes for free if you create a Blob by converting the ReadableStream to an array and then calling the Blob(array) constructor, but I would like to stream data into the Blob if at all possible.
Anyway, it looks like we don't have consensus on this item, so let's discuss it a bit more.
First question: should all of text(), blob() and bytes() accept the same chunk types?

Arguments for:

- It avoids having to write await (await response.body.blob()).text() just to accept the largest possible set of chunk types.

Arguments against:

Anything else?
> First question: should all of text(), blob() and bytes() accept the same chunk types?
I vote yes.
My second question: do we really want to accept everything possible in non-byte ReadableStream instead of simply requiring ArrayBufferView as ReadableByteStreamControllerEnqueue does? If yes, should we also accept everything possible in the byte stream controller?
Is there an equivalent for getting the endings option if we allow strings?
If I had a vote I'd say "yes, but only buffer sources". Allowing Blobs and strings in a constructor is not the same as allowing them in a stream.
I've made a first attempt at drafting some spec around this in #1311 ... it intentionally limits to just text, Uint8Array, and Blob, ignoring json() for now. Still likely a number of rough edges, but it's a start.
> blob(), text() and bytes() accept an AbortSignal. It cancels the ReadableStream if abort is signalled. An alternative would be to leave data in the stream. Personally I prefer the cancellation approach since it is more likely to be consistent between implementations, but if anyone feels differently please speak up.

Yeah, definitely worth talking about that point. I had considered making it so that it would only cancel the read loop and not the underlying stream. A preventCancel option like with pipeTo/pipeThrough might be worthwhile?
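For concreteness, a user-land sketch of the cancel-on-abort semantics being discussed (textOf is an illustrative name; the proposed built-in would hang the option off text() itself):

```javascript
// Sketch of the abort semantics under discussion: aborting the signal
// cancels the underlying stream and rejects, rather than leaving data
// queued on the stream.
async function textOf(stream, { signal } = {}) {
  const reader = stream.getReader();
  const onAbort = () => reader.cancel(signal.reason);
  signal?.addEventListener("abort", onAbort, { once: true });
  try {
    const decoder = new TextDecoder();
    let result = "";
    for (;;) {
      const { value, done } = await reader.read();
      if (done) break;
      result += decoder.decode(value, { stream: true });
    }
    if (signal?.aborted) {
      // cancel() resolves the pending read with done, so surface the abort.
      throw signal.reason ?? new DOMException("Aborted", "AbortError");
    }
    return result + decoder.decode();
  } finally {
    signal?.removeEventListener("abort", onAbort);
    reader.releaseLock();
  }
}
```

A preventCancel-style option would simply skip the reader.cancel() call and stop the read loop instead, leaving the remaining data on the stream.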
It would be good to have a convenience method to concatenate the contents of a ReadableStream into a (promise for a) Uint8Array.
It is unclear what should happen if the stream contains chunks that are not ArrayBuffer or ArrayBufferViews.
Other similar methods we might consider are .blob() and .text().