whatwg / webidl

Web IDL Standard
https://webidl.spec.whatwg.org/
Other
410 stars 164 forks source link

Add `async iterable<T>` type to WebIDL #1397

Open lucacasonato opened 7 months ago

lucacasonato commented 7 months ago

This commit lifts the async iterator processing logic from the ReadableStreamFromIterable operation in the Streams spec, to a new WebIDL type async iterable<T>.

This will clean up the streams spec, and enable a change in the Fetch spec to directly allow taking async iterable as request/response bodies, without users having to pass them to ReadableStream.from first.

(See WHATWG Working Mode: Changes for more details.)


Preview | Diff

saschanaz commented 7 months ago

Has there been discussion somewhere publicly? πŸ‘€

lucacasonato commented 7 months ago

@saschanaz No, not of any specifics. The background here is this issue: https://github.com/whatwg/fetch/issues/1291#issuecomment-2009010502. @annevk mentioned we'd need to figure out how to make this work IDL wise. This is my attempt at that (it enables updating the BodyInit overload in Fetch from:

typedef (Blob or BufferSource or FormData or URLSearchParams or USVString) XMLHttpRequestBodyInit;
typedef (ReadableStream or XMLHttpRequestBodyInit) BodyInit;

to

typedef (Blob or BufferSource or FormData or URLSearchParams or USVString) XMLHttpRequestBodyInit;
typedef (ReadableStream or async iterable<BufferSource> or XMLHttpRequestBodyInit) BodyInit;
saschanaz commented 7 months ago

Hmm, I see. BTW please include the full PR template: https://github.com/whatwg/webidl/blob/main/PULL_REQUEST_TEMPLATE.md

annevk commented 7 months ago

I wonder how this ends up working in practice. Say you have Blob or async iterable<T> or DOMString. If you pass in a Blob with @@asyncIterable support, how is it handled? I guess @@asyncIterable comes later, probably before toString?

cc @petervanderbeken @natechapin

petervanderbeken commented 7 months ago

What is being proposed exactly, to what WebIDL type would it be converted? A reference to an Iterator Record?

lucacasonato commented 7 months ago

I wonder how this ends up working in practice. Say you have Blob or async iterable or DOMString. If you pass in a Blob with @@asyncIterable support, how is it handled? I guess @@asyncIterable comes later, probably before toString?

@annevk I have specified this behaviour in an update to the overload resolution algorithm in this PR already. The gist is: an async iterable<T> has the same behaviour for distinct types as sequence<T> (which one could think of as the eager synchronous version of async iterator<T>).

I have updated the graphic in the spec that explains this too:

image

In practice:

  1. async iterator<T> can be distinguished from all types but object and sequence<T>
  2. As with sequence<T>, a matching interface is considered more important than a platform object that implements the iterable, or async iterable protocol. So specifically for the BodyInit with Blob case: nothing changes.
  3. As with sequence<T>, primitives like strings, even if they have @@iterator on their prototype, are not considered for async iterator<T> overloads, and are also not valid async iterator<T> values. async iterator<T> requires values are Type(V) == Object, in addition to having either @@iterator or @@asyncIterator.

What is being proposed exactly, to what WebIDL type would it be converted? A reference to an Iterator Record?

@petervanderbeken I am proposing the addition of an async iterable<T> type, which mirrors sequence<T> in many ways, but unlike it is:

If you imagine a matrix of JS stream primitives, with asynchronicity on one axis, and eagerness on the other, this fills out the square opposite of sequence<T>.

sync async
eager JS: Array<T>; IDL: sequence<T> JS: Promise<Array<T \| Promise<T>>>; async sequence<T> (doesn't exist)
lazy JS: { @@iterator: Iterator<T> }; IDL: iterable<T> (doesn't exist) JS: { @@asyncIterator: AsyncIterator<T> }; IDL: async iterable<T> (this PR)

The use case for this type are Web APIs that take "streams" of values (of unknown length), that they then process in some way. A real Web API that does this right now is ReadableStream.from(). It takes objects that are iterables as a Web IDL any type, and performs manual processing of this iterable. For one API, this is fine, but there are other proposed APIs that would also take async iterables as arguments:

Introduction of this async iterable<T> simplifies all of these, because the entirety of the iterable iteration logic is now pulled into Web IDL, where all callers can share it.

saschanaz commented 7 months ago

What is being proposed exactly, to what WebIDL type would it be converted? A reference to an Iterator Record?

Per https://whatpr.org/webidl/1397/402a886...14f1bdb.html#js-async-iterable the result indeed seem to be iterator record. (Adding because I don't think the comment above actually answers this.)

annevk commented 7 months ago

This design sounds good to me. It seems the Build is currently failing though. Could you write a corresponding PR for Streams to show it in action so we have some assurance it can be adopted?

cc @MattiasBuelens

MattiasBuelens commented 7 months ago

3. As with sequence<T>, primitives like strings, even if they have @@iterator on their prototype, are not considered for async iterator<T> overloads, and are also not valid async iterator<T> values. async iterator<T> requires values are Type(V) == Object, in addition to having either @@iterator or @@asyncIterator.

This would technically be a breaking change for ReadableStream.from(), since we currently do accept strings (see this test).

That said, I wrote that test more for the sake of completeness, and I don't expect much actual usage in the wild. We can consider making ReadableStream.from() throw on strings, or we could change its signature to accept async iterable<any> or string instead.

lucacasonato commented 7 months ago

@MattiasBuelens I was not aware of this behaviour in Streams. I think we should stick with the behaviour where async iterator<T> must be an object in this PR (mirroring sequence<T>). Overload resolution becomes tricky and rather confusing otherwise.

Let's discuss whether to accept or error on DOMString arguments in the ReadableStream.from() PR.

lucacasonato commented 7 months ago

@annevk I had already opened a draft PR to the streams spec, showing the integration yesterday. Here is a link: https://github.com/whatwg/streams/pull/1310.

lucacasonato commented 7 months ago

Do we have Web IDL tests in WPT? I don't think so, right? I guess no tests specifically for this are relevant.

Considering this is mostly an editorial refactor, do we even need "implementer interest"? There is nothing to implement until an upstream spec uses this. I'll ask for support anyway though:

Also @jasnell, any thoughts?

lucacasonato commented 7 months ago

Got some feedback from @saschanaz elsewhere that the constraint that async iterable<T> is only valid in JS->IDL positions (rather than also in IDL->JS positions) should be more prominently highlighted. I'll address this tomorrow

domenic commented 7 months ago

Do we have Web IDL tests in WPT? I don't think so, right?

We do, but they're weird since they test things via APIs that use those features. https://github.com/web-platform-tests/wpt/tree/master/webidl/ecmascript-binding

I guess no tests specifically for this are relevant.

I think linking to the tests for streams using this feature would be good. Especially if you can verify that everything you introduce in this PR is covered by some aspect of the streams tests. (Which is probably the case since the streams tests are quite thorough, IIRC.)

Considering this is mostly an editorial refactor, do we even need "implementer interest"? There is nothing to implement until an upstream spec uses this. I'll ask for support anyway though:

Implementer support for Web IDL stuff like this is tricky and requires a judgement call. I think we should CC the bindings teams and give them a chance to weigh in, so @japhet and @yuki3 for Chromium. For example if they were concerned about how to implement this in the bindings layer, or if they were concerned that introducing this feature would be bad because then too many specs might use it and actually it should be rare.

However, since this doesn't require any immediate changes to any implementations, I think we shouldn't block on requiring such support, if we don't hear back.

domenic commented 7 months ago

Regarding strings, I think we should follow the behavior of async iterator helpers's AsyncIterator.from(); /cc @bakkot. I cannot tell what that behavior is, however. Iterator.from() definitely accepts strings. But AynscIterator.from() calls GetIteratorFlattenable with a second parameter whose value is async, which is neither accept-strings nor reject-strings.

bakkot commented 7 months ago

AsyncIterator.from will accept strings because Iterator.from does. But do note that it's specifically a coercion method. For something like ReadableStream.from, which is also a coercion method, you may wish to follow that precedent, but I'm not sure you'd want to do so for other async-iterable-taking things.

The way AsyncIterator.from works is to look up Symbol.asyncIterator, and if that fails then fall back to Symbol.iterator with the result wrapped the same way for await does (modulo some quibbles not relevant here), and if that fails to use the .next method directly (which only iterator-taking things should do, which is to say, nothing except AsyncIterator.from and AsyncIterator.prototype.flatMap).

And since it will not specifically reject strings, and strings have a Symbol.iterator, it will accept them, the same way for await does.

(Sorry the spec for AsyncIterator.from is out of date; I'm working through other parts of the semantics and it doesn't really make sense to get the spec updated until those are sorted. But this question, at least, is settled.)

domenic commented 7 months ago

Thanks for the quick response. I'm unsure whether we should think of new Response(asyncIterable) as a "coercion method", but I lean toward yes... and those are our two use cases so far, so, I'm unsure where that leaves us for the general Web IDL type.

bakkot commented 7 months ago

Personally I would specify this by making async iterable<T> reject strings, and then adding or USVString to those places which should accept strings, just to ensure that future users of the type have actually decided to take strings on purpose instead of it happening by accident.

Plus if you take a string explicitly then you can use it directly instead of having to look up and invoke String.prototype[Symbol.iterator], which is going to be... slower.

lucacasonato commented 7 months ago

Thanks for the comments @domenic and @bakkot. I agree strongly with @bakkot that in the Web IDL type, we should only allow objects implementing the async iterable protocol for the async iterable<T> type. Anything else would be a significant hassle with a high chance for unexpected behaviour when plumbed into the overload / union resolution algorithm. Also there is already precedent in Web IDL that iterable taking types do not support iterating string (sequence<T> does not support iterating strings, but does iterate objects implementing the iterable protocol).

For the streams specific discussion, lets move it right here: https://github.com/whatwg/streams/pull/1310#issuecomment-2022484496

lucacasonato commented 6 months ago

Folks, could you take another look? I have clarified the allowed positions for async iterables (namely not in attributes), and have added guidance that: a) async iterable IDL values can not be constructed in IDL, and always must come from language bindings b) that spec authors should generally return an interface with an async iterable declaration, not an async iterable IDL value

@annevk has confirmed implementer interest for WebKit on Matrix - is anyone else in favor, or opposed?

jasnell commented 6 months ago

This LGTM

lucacasonato commented 4 months ago

Gently pushing for reviews again :)

Still need positions from Chromium (@domenic) and Mozilla (@saschanaz).

bakkot commented 4 months ago

TC39 is considering rejecting strings in most (new) iterable-taking positions in general (and of course async iterable taking positions as well). It'll be discussed at the plenary at the end of the month, and I'm guessing it'll be adopted.

annevk commented 4 months ago

@lucacasonato coupling this with the Streams PR I'm happy to consider Chromium and WebKit as interested and nobody opposed. This PR needs rebasing, the checklist in OP needs a bit more work, and maybe @MattiasBuelens can do a final pass after that. With that I think we're good to go.

saschanaz commented 4 months ago

I'd like to ask @petervanderbeken for the position as he will be the implementer.

A parallel question for distinguishability: currently it says it's distinguishable with interfaces and dictionaries, but what happens if the interface has async iterable<T>;? sequence also says it's distinguishable, so I think I just lack some knowledge here πŸ˜›

lucacasonato commented 4 months ago

@saschanaz The interface can be distinguished from other objects because it is branded. A union of interface and async iterable<T> will always attempt to match the interface first. This is also why dictionary and interface are distinguishable.

Union resolution goes like this in this case: `MyInterface | MyDictionary | async iterable (assuming V has been checked to be an object):

  1. Check if V is an instance of the interface. it's an interface
  2. Check if V has [Symbol.asyncIterable]. it's an async iterable
  3. it's a dictionary
MattiasBuelens commented 3 months ago

@lucacasonato I fixed the remaining Bikeshed issues in this branch. Could you pull those changes into your branch? πŸ™‚

lucacasonato commented 3 months ago

@domenic What tests should I write for this? I don't think we have specific webidl tests in WPT, right? Only tests for features that use the webidl types.

annevk commented 3 months ago

Just to make sure I understand correctly, this is blocked on tests and this is best tested through the Streams change https://github.com/whatwg/streams/pull/1310 which is currently still in draft. So once the Streams change is ready and has test coverage (which should include coverage for the IDL bits) this is all good to go?

lucacasonato commented 2 months ago

So during an experiment we ran for new Response(async iterable<Uint8Array>), we discovered another fun edge case to think about: when having a union of string | async iterable<Uint8Array>, what should a boxed string resolve as? In the current spec text it resolves as a async iterable<DOMString>, which would cause an error during iteration. This happens because the boxed string is both an object, and has a [Symbol.iterator] method.

We have two options here:

  1. handle this case specially in WebIDL by banning boxed string as a value for async iterable, like we do with primitives (kinda weird, but means this doesn't become a weird edge case only in fetch and is instead uniform across the platform)
  2. handle this as a weird edge case in fetch, and in the future in any other string | async iterator WebIDL unions.

The big question is whether, assuming ReadableStream.from("asd") throws, ReadableStream.from(new String("asd")) should throw, or create a stream of string chunks.

Any thoughts?

lucacasonato commented 2 months ago

Copying over a thought from @annevk:

Treating string objects and values in the same way whenever possible seems preferable. Which argues for 1 as otherwise each API would have to deal with this anew.

bakkot commented 2 months ago

In TC39 we went the other direction, and said that String objects are just another kind of object. Having the check conceptually be "is a primitive" is the simplest mental model.

If the user provides a primitive wrapper Object such as a String Object, however, it should be treated like any other Object.

annevk commented 2 months ago

That also seems reasonable, but the problem is that any API currently accepting a string would need to juggle a lot of complexity to safely adopt async iterable<T>.

Perhaps there's a third solution whereby string | async iterable<T> cases involving a String object are resolved by going down the string path? But if there's no string in the type union we'd treat a String object as an Object for the purposes of async iterable<T>. (Of course, this has the opposite problem for when you start out with async iterable<T> and later overload with string, but that seems far less likely.)

lucacasonato commented 2 months ago

I like the option where we just handle this in the union.