annevk opened 3 years ago
I assume we cannot implement convert to an ECMAScript value for byte sequence, since it's not an IDL type? So read all bytes must return a Promise for a Uint8Array instead, and we have to first convert the byte sequence to a Uint8Array. We must also change step 4 of consume body to convert the Uint8Array back into a byte sequence before passing it to package data.
Alternatively, we can make read all bytes not return a Promise, and instead take successSteps and errorSteps arguments. Then we aren't limited by Web IDL conversions, and we can pass a byte sequence directly to successSteps.
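For concreteness, a rough JavaScript sketch of that shape (illustrative only, not spec text; the names readAllBytes, successSteps, and errorSteps just mirror the idea, and the body leans on reader.read() purely to stay self-contained, whereas the real algorithm would pull chunks without creating an author-observable promise):

```js
// Illustrative only: the proposed callback shape for "read all bytes".
// successSteps receives the concatenated bytes directly; errorSteps receives
// the failure reason. No promise is handed back to the caller.
function readAllBytes(reader, successSteps, errorSteps) {
  const chunks = [];
  const pump = () =>
    reader.read().then(({ value, done }) => {
      if (done) {
        // Concatenate all collected chunks into a single Uint8Array.
        const length = chunks.reduce((n, c) => n + c.length, 0);
        const bytes = new Uint8Array(length);
        let offset = 0;
        for (const c of chunks) {
          bytes.set(c, offset);
          offset += c.length;
        }
        successSteps(bytes);
      } else {
        chunks.push(value);
        pump();
      }
    }, errorSteps);
  pump();
}
```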
I think resolving with a Uint8Array ends up being problematic because of Object.prototype.then; see also https://github.com/web-platform-tests/wpt/blob/master/fetch/api/response/response-stream-with-broken-then.any.js.
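A minimal standalone demo of why that's problematic: once Object.prototype.then is defined, any ordinary object used to resolve a promise is treated as a thenable, so author code decides what the promise actually fulfills with.

```js
// Any object that inherits from Object.prototype becomes a "thenable"
// the moment someone defines Object.prototype.then.
Object.prototype.then = function (resolve) {
  resolve("hijacked"); // author code now picks the fulfillment value
};

Promise.resolve(new Uint8Array([1, 2, 3])).then(value => {
  console.log(value); // logs "hijacked", not the Uint8Array
});

delete Object.prototype.then;
```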
So unless I'm missing something, I think we have to either:
I think we've been pretending that resolving a Promise with a non-JavaScript type is a meaningful operation, and mostly getting away with it, but now we've run out of luck. I'm sorry for the trouble it caused.
My guess is that having proper spec language for resolving Promises with Infra types would be hard, because the ES spec language is really complex and would be hard to integrate with.
This leaves us with callback hell. My feeling is that using Promises internally within specs turned out to be a bad idea, and so callback hell is inevitable. Maybe an alternative would be some kind of "lite promise" in Infra which we use purely for spec purposes?
(this might just be echoing @ricea above)
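To make the "lite promise" idea a bit more concrete, here is one possible shape, purely a hypothetical sketch (nothing like this exists in Infra today): it stores the settled value verbatim and runs reactions without ever looking up a then property on that value.

```js
// Hypothetical sketch only; no such construct exists in Infra.
class LitePromise {
  #state = "pending";
  #value;
  #reactions = [];

  settle(state, value) {
    // state is "fulfilled" or "rejected"; value is stored verbatim,
    // with no thenable unwrapping and no property lookups on it.
    if (this.#state !== "pending") return;
    this.#state = state;
    this.#value = value;
    for (const r of this.#reactions) r[state](this.#value);
    this.#reactions = [];
  }

  react(onFulfilled, onRejected) {
    if (this.#state === "pending") {
      this.#reactions.push({ fulfilled: onFulfilled, rejected: onRejected });
    } else if (this.#state === "fulfilled") {
      onFulfilled(this.#value);
    } else {
      onRejected(this.#value);
    }
  }
}
```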
I think this bug is even more general: I've got a prototype implementation of streams that uses the promise semantics, and even without Fetch / Byte streams complications, it's still vulnerable to Object.prototype.then, because of the WebIDL resolution semantics. Essentially, the spec as written today shouldn't pass the simplified test case or response-stream-with-broken-then.any.js. Chrome passes because they are still implementing the forAuthorCode bool workaround. I haven't checked WebKit yet.
I guess at least this very particular issue is not so bad, because we just convert "read all bytes" into callback (or lite-promise?) style. That's a pretty small transformation.
The more general issue could be problematic, though. In particular we use JS promises internally for startAlgorithm, pullAlgorithm, etc., which might be subpar.
So I just want to clarify the state of the world here, so I'm sure.
The original issue (https://github.com/whatwg/streams/issues/933) was that the internals of pipeTo were visible. However, as Domenic hints above, we use promise resolution throughout the spec; is the position generally that visibility of the internals is bad, or just for pipeTo? The impression I get from the writing around forAuthorCode was that it was only internal resolutions that were of concern (hence I'm not sure if we need to worry about startAlgorithm etc.).
I got around to testing my simplified test case above in Chrome, and realized that we do in fact observe the Object.prototype.then change, so I had originally misunderstood the intended scope.
Could we potentially just revert https://github.com/whatwg/streams/pull/1045?
In general visibility into the internals is "bad", but only for pipeTo (and maybe tee?) is it fatal. Because we really want to ensure that user agents can optimize those behind the scenes, doing something totally different than the exact spec steps but observably equivalent.
I'm very hesitant to revert #1045 because I think it improved the spec readability and architecture significantly, and we've built a lot on top of it.
My inclination is to do the local fix to these read-all-bytes/consume-body algorithms to use callbacks instead of promises, and then over time contemplate whether we can reduce the exposure of internals (like startAlgorithm etc.) via other targeted, low-priority fixes.
But I might not be understanding the issue. In particular the OP talks about undefined behavior, but I mean, it's kind of obvious what the intention is, so I don't understand why this is a big problem for Firefox's implementation instead of just being an editorial spec issue that we need to fix up. I'm sure I'm missing something there.
I apologize for the game of telephone that's been going on (I'll partially blame timezones here).
So, here's my take on this. I'm implementing ReadableStreams in Firefox using DOM technologies and WebIDL by following the current specification as best I can.
The test case response-stream-with-broken-then.any.js tries to ensure that if Object.prototype.then is set, it is unobservable via Response.text().
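Roughly the scenario that test is probing, sketched as standalone JavaScript rather than the actual WPT code:

```js
(async () => {
  Object.prototype.then = function (resolve) {
    // If stream internals leak, this runs during the body read loop and can
    // short-circuit it with a bogus "done" result.
    resolve({ done: true, value: undefined });
  };
  try {
    const text = await new Response("hello").text();
    console.assert(text === "hello", "patched then must not be observable");
  } finally {
    delete Object.prototype.then;
  }
})();
```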
In our prototype implementation of this, what happens is we end up calling ReadableStreamDefaultReaderRead as part of the stream processing, which in turn calls ReadableStreamDefaultController.[[PullSteps]], then the readRequest's chunk steps, which resolves the promise with «[ "value" → chunk, "done" → false ]».
Now, as I understand it, the type of this value should be a ReadableStreamDefaultReadResult. However, the IDL semantics of resolving a promise with an IDL type first convert it to an object, which then resolves the promise. At this point, in our implementation, the original value is lost, and replaced by the value returned by Object.prototype.then.
Now one place I'm not entirely comfortable is that I've essentially used the same structure of implementation that our previous JS implementation used when it comes to integrating with system provided streams. I have found it challenging to map the Fetch spec to our implementation, but have something largely working... except for this. Ultimately what I have implemented is treating our native stream system as if it were a NativeUnderlyingSource, as if such a type existed. This is quite nice because it means that the majority of our implementation integrates with ReadableStreams very pleasantly.
My read on this is still that it's a specification issue, but it's complex because of the layering and I'm definitely not an expert on how fetch is supposed to work. As I said earlier, I have a hard time mapping our implementation to the spec.
> In our prototype implementation of this, what happens is we end up calling ReadableStreamDefaultReaderRead as part of the stream processing, which in turn calls ReadableStreamDefaultController.[[PullSteps]], then the readRequest's chunk steps, which resolves the promise with «[ "value" → chunk, "done" → false ]».
The last step you mention (resolving a promise) should not happen if you're using ReadableStreamDefaultReaderRead. That's only part of ReadableStreamDefaultReader.read().
ReadableStreamDefaultReader.read() should return a Promise for a ReadableStreamDefaultReadResult, and since such a read result is converted by Web IDL to a JavaScript Object, it's unavoidable that Object.prototype.then is accessed. This is expected behavior.
However, other specifications (such as Fetch) shouldn't interact with this API; they should use read a chunk instead. That algorithm does not return a promise; instead, the caller must provide a read request which will receive the result. Since there are no promises involved, Object.prototype.then is never accessed.
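A pseudo-JavaScript sketch of that shape (read requests are spec-internal, not an author-facing API, and readAChunk below is a stand-in for the spec's read a chunk algorithm): the result flows through the read request's steps, so no promise, and therefore no then lookup, is ever involved.

```js
// Spec-internal shape, sketched in JS. readAChunk is passed in as a parameter
// purely to keep the sketch self-contained.
function collectAllChunks(reader, readAChunk, successSteps, errorSteps) {
  const chunks = [];
  const readRequest = {
    chunkSteps(chunk) {
      chunks.push(chunk);
      readAChunk(reader, readRequest); // keep pulling until close or error
    },
    closeSteps() {
      successSteps(chunks); // hand the collected chunks straight to the caller
    },
    errorSteps(e) {
      errorSteps(e);
    },
  };
  readAChunk(reader, readRequest);
}
```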
For read all bytes, the problem is that we resolve a (supposedly internal) promise with a byte sequence. A byte sequence is not a Web IDL type or an ECMAScript value, hence the undefined behavior mentioned in the OP. But even if we "fix" it by resolving with a Uint8Array instead, we still end up with an object (a Uint8Array) that could have a modified .then method (either through Uint8Array.prototype.then or Object.prototype.then). Thus, the resolution value of this internal promise can be manipulated by author code, which is not good.
One way to solve this is by resolving with an object that cannot possibly have a modified .then method. For example, this would work (and that's what we used to do with forAuthorCode):
```js
// Null-prototype wrapper: Object.prototype (and any "then" on it) is not on its chain.
const result = Object.create(null);
Object.defineProperty(result, 'bytes', { value: new Uint8Array(bytes) });
resolve(result); // resolving with this object cannot invoke an author-defined "then"
```
Fetch could then read the bytes in its fulfillment handler from result.bytes, and be confident that it hasn't been tampered with by author code.

However, this isn't very pretty. It's weird that an internal algorithm would need to construct a wrapper JavaScript object first just to resolve a promise. That's why we suggested replacing the promise with a callback, or introducing a "special" type of internal promises that are separate from ECMAScript Promises.
> The last step you mention (resolving a promise) should not happen if you're using ReadableStreamDefaultReaderRead. That's only part of ReadableStreamDefaultReader.read().
A-Hah! You spotted (what is almost certainly) the bug in my code. (The confusion comes from the ambiguous naming in the JS implementation of streams just below!)
I'm past end-of-day here, but will look at fixing this tomorrow and make heads or tails of the rest.
Glad I could help! 😁
Anyway, while we're on the topic of resolving internal promises with possibly author-manipulated objects or non-IDL values, it looks like we have a few more problematic cases:

- "Get a promise for waiting for all" resolves with a Promise<sequence<T>>. When a sequence is converted to an ECMAScript value, we construct an Array. So... what stops anyone from defining Array.prototype.then? 😛 (Quick demo below.) It sounds like "get a promise for waiting for all" is not safe for internal use, and specs should use "wait for all" instead. Thanks to @noamr, Fetch is no longer doing this: https://github.com/whatwg/fetch/commit/464326e8eb6a602122c030cd40042480a3c0e265.
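Quick standalone demo of that one (not spec text):

```js
// A sequence<T> converts to an Array, and arrays consult an inherited "then"
// when used to resolve a promise, exactly like any other ordinary object.
Array.prototype.then = function (resolve) {
  resolve("not your sequence");
};

Promise.resolve([1, 2, 3]).then(v => {
  console.log(v); // "not your sequence", not [1, 2, 3]
});

delete Array.prototype.then;
```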
Both Streams and Fetch (which builds upon Streams) resolve a promise with a byte sequence:
Unfortunately this gives UB in step 2 of https://webidl.spec.whatwg.org/#resolve.
(Discovered through https://bugzilla.mozilla.org/show_bug.cgi?id=1736622.)