yutakahirano / fetch-with-streams

Fetch API integrated with Streams.

Preparation for merge #25

Closed yutakahirano closed 8 years ago

yutakahirano commented 9 years ago

NOTE: the draft is not yet ready

The draft should be updated after the Streams interface change is done, but after that I would like to merge it into the fetch spec.

@annevk, is there anything I can do to prepare for the merge? Would it be better to provide an HTML-style pull request rather than the current Markdown-style document?

Thanks,

yutakahirano commented 9 years ago

@domenic @tyoshino FYI

annevk commented 9 years ago

I think this is fine. I can make the edits once all is agreed and then I'd really like a review of that change 😊

yutakahirano commented 9 years ago

Thank you.

annevk commented 9 years ago

So what about the open issues against this repository? Could you close those that are not considered an issue for merging? That would make things a bit clearer.

yutakahirano commented 9 years ago

Thanks, I closed some and labelled some with "not included in the first release".

Others should be resolved by the merge.

yutakahirano commented 9 years ago

@wanderview @annevk @domenic @tyoshino Although it will need some polishing after the Streams spec is updated, I think the concept is solid and it's ready for review as an "initial release", i.e. adding Response.body.

Any feedback is welcome.

Thanks!

annevk commented 9 years ago

You don't want to wait for #14? I would also love to see some commitment from other implementers.

annevk commented 9 years ago

@yutakahirano I hear from @wanderview that he has yet to address @sicking's concerns and is therefore not willing to commit to implementing yet. I have also heard nothing from either Apple or Microsoft. Given that, I think I should hold off on integrating this until we have two implementers on board.

wanderview commented 9 years ago

Yes, I had actually hoped to implement fetch body streams in gecko in Q2, but got a lot of pushback. I think we need to reach more internal consensus before moving forward. Sorry for the delays here.

yutakahirano commented 9 years ago

I see, thanks.

yutakahirano commented 9 years ago

@KenjiBaheux FYI

wanderview commented 9 years ago

Also, the current proposal is based on ReadableByteStream which is not in the main streams API spec yet. It seems that might need to be corrected one way or another before merging to the fetch spec.

wanderview commented 9 years ago

I'm sorry for the delay on reviewing this. I haven't had time until now. I'm going to be looking at streams more closely this week.

Here are my thoughts regarding the current readme doc in this repo:

domenic commented 9 years ago

Answering what I can...

As I mentioned in a previous comment, ReadableByteStream is not in the streams spec yet. It would be nice to at least link to the prototype.

I tried to clarify this a bit with #29.

Why doesn't Request have a body stream in its webidl? Or rather, shouldn't the ReadableStream body attribute be defined on the Body mix-in?

I think the idea is to gradually introduce streams bit by bit. I'm not sure if there are any specific issues with putting it on request though that blocked that from being in "v1"...

Why not just require it to remain open if there are observable handlers in use?

This was somewhat explored in #15. I think the eventual conclusion was that it was too hard to make anything based on e.g. whether promises have handlers work. Instead we ended up with the semantic that you only get auto-cancel-on-GC if you do high-level things like piping. If you actually get a reader, you've taken the fate of the stream into your own hands, and so need to be responsible for calling cancel() yourself.

Also, what does this mean for infinite streams or streams stuck on back-pressure? Does the UA just keep them open forever?

Yeah, I think that's the idea. It's the same as if you allocate a large array buffer and then don't clean it up. You've manually said "I want this potentially-large resource to stick around." So it does.
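The ownership model described above can be sketched as follows (a hypothetical illustration, not code from the proposal): getting a reader locks the stream, and from that point the caller is responsible for calling cancel() itself.

```javascript
// Sketch of "reader takes ownership": nothing is auto-cancelled for you
// once you hold a reader. The stream here is a stand-in for a response
// body; fetch() itself is not exercised.
const stream = new ReadableStream({
  start(controller) {
    controller.enqueue("chunk-1");
    controller.enqueue("chunk-2");
    // Underlying source stays open; no close() here.
  }
});

const reader = stream.getReader(); // locks the stream to this reader

async function readOneThenCancel() {
  const { value } = await reader.read();
  // We took the fate of the stream into our own hands by getting a
  // reader, so we must release the underlying resource explicitly:
  await reader.cancel("no longer interested");
  return value;
}
```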

wanderview commented 9 years ago

This was somewhat explored in #15. I think the eventual conclusion was that it was too hard to make anything based on e.g. whether promises have handlers work. Instead we ended up with the semantic that you only get auto-cancel-on-GC if you do high-level things like piping. If you actually get a reader, you've taken the fate of the stream into your own hands, and so need to be responsible for calling cancel() yourself.

It seems to me we could at least allow termination on GC if the getters for these promises have never been executed. We know for certain in that case there are no observable handlers.

Also, @annevk pointed me to the XHR spec section on GC abort:

https://xhr.spec.whatwg.org/#garbage-collection

This seems easier for me to understand as an implementer. Instead of saying "GC the object but don't abort", it just says under what conditions the XHR cannot be GC'd. It then specifies that GC aborts the XHR.

It would be nice if we did the same here. If the ready or close promises have ever been accessed and the reader object is locked, then the stream must not be GC'd. If the stream is GC'd while still open, then the stream aborts.

Yeah, I think that's the idea. It's the same as if you allocate a large array buffer and then don't clean it up. You've manually said "I want this potentially-large resource to stick around." So it does.

Can you clarify this? I can't find anything that says ArrayBuffer objects don't GC normally.

wanderview commented 9 years ago

I think the idea is to gradually introduce streams bit by bit. I'm not sure if there are any specific issues with putting it on request though that blocked that from being in "v1"...

Is it really that much effort to implement? This proposal already writes the .text(), etc convenience routines in terms of consuming a body stream. It would seem trivial to expose the fixed stream currently described as .body.

The thing that seems hard is streaming into the Request body. I would expect that to be a ReadableStream passed into the Request Constructor, though.

wanderview commented 9 years ago

The thing that seems hard is streaming into the Request body. I would expect that to be a ReadableStream passed into the Request Constructor, though.

Or rather, a ReadableStream added to the BodyInit union. We should be able to stream into synthesized Response bodies as well.
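As a rough illustration of streaming into a synthesized Response body — this reflects the shape the platform eventually settled on, not anything specified in this draft:

```javascript
// A ReadableStream used as the body of a synthesized Response.
const encoder = new TextEncoder();
const bodyStream = new ReadableStream({
  start(controller) {
    controller.enqueue(encoder.encode("hello "));
    controller.enqueue(encoder.encode("world"));
    controller.close();
  }
});

const synthesized = new Response(bodyStream);
// The Body mix-in convenience routines drain the stream, e.g.:
const textPromise = synthesized.text();
```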

domenic commented 9 years ago

It seems to me we could at least allow termination on GC if the getters for these promises have never been executed. We know for certain in that case there are no observable handlers.

This seems like a much harder programming model for authors to grasp. Just getting a property now has the side effect of preventing cancel? Providing a clear-cut delineation, where getting a reader means the fate of the stream is now in your hands, seems nicer to me...

This seems easier for me to understand as an implementer. Instead of saying "GC the object but don't abort", it just says under what conditions the XHR cannot be GC'd. It then specs GC to abort the XHR.

I guess this is observably equivalent. However I would imagine implementations will probably GC the stream or the XHR anyway, according to normal garbage-collection semantics, instead of adding exceptions for those objects to their garbage collector. In other words, there's no inherent reason that the memory lifecycle of an object should be tied to whether the socket connection gets closed.

In fact, with streams there's a good reason to GC-but-not-cancel anyway, since the internal queue might be holding some amount of data, and it's a shame to keep that in memory when it's not necessary to do so. So in such a case it would be good to reclaim that memory, even if you can't reclaim the socket.

Can you clarify this? I can't find anything that says ArrayBuffer objects don't GC normally.

I meant simply that if you hold on to an ArrayBuffer, then you have taken responsibility for freeing it (by nulling out all references to it in your program and letting GC kick in).

Is it really that much effort to implement?

Yeah, not sure... maybe @yutakahirano just hasn't gotten around to speccing it yet?

Or rather, a ReadableStream added to the BodyInit union. We should be able to stream into synthesized Response bodies as well.

I think the plan of record in previous discussions (and revisions of this repo) was something like

  const request = new Request({
    method: "POST",
    bodyWriter(writableStream) {
      writableStream.write(c);
      otherReadableStream.pipeTo(writableStream);
    }
  });

wanderview commented 9 years ago

This seems like a much harder programming model for authors to grasp. Just getting a property now has the side effect of preventing cancel? Providing a clear-cut delineation, where getting a reader means the fate of the stream is now in your hands, seems nicer to me...

Do you expect devs to design around this "feature" of performing an unobservable fetch? That's not how XHR works (where it aborts if you don't have a handler registered).

It also seems APIs that require explicit resource release (like createObjectURL/revokeObjectURL) are pretty leak-prone. I would be less concerned if the streams were guaranteed to run to completion, but the fact that they can pause until read means it's going to be very easy to leak.

Providing a safety-net for the case where code calls getReader() but doesn't get to setting up their handlers (because of error, etc) just seems prudent.

I guess this is observably equivalent. However I would imagine implementations will probably GC the stream or the XHR anyway, according to normal garbage-collection semantics, instead of adding exceptions for those objects to their garbage collector. In other words, there's no inherent reason that the memory lifecycle of an object should be tied to whether the socket connection gets closed.

As an implementer I think of "what do I need to ref to keep this alive". The current language does not give me anything to hold alive. I can of course create some behind-the-scenes object, but it just makes the spec more abstract to understand.

In fact, with streams there's a good reason to GC-but-not-cancel anyway, since the internal queue might be holding some amount of data, and it's a shame to keep that in memory when it's not necessary to do so. So in such a case it would be good to reclaim that memory, even if you can't reclaim the socket.

I'm really confused now. You say we cannot determine if handlers are registered or not, but you want us to GC the object those handlers are possibly registered on? I don't see how that can work.

If I can GC the thing that is going to fulfill the promise handlers, I see no reason to keep the channel open just to dump bytes into /dev/null.

I've probably just got myself turned about now, though.

I think the plan of record in previous discussions (and revisions of this repo) was something like

  const request = new Request({
    method: "POST",
    bodyWriter(writableStream) {
      writableStream.write(c);
      otherReadableStream.pipeTo(writableStream);
    }
  });

I'd like to hear more about that. The Request/Response object is essentially a pipe through to the .body read stream. Why not just let a ReadableStream get passed in which is then set as the .body stream? That requires zero bytes copied, no back pressure logic, no copy loop, etc.

I thought we only wanted to use WritableStreams for true data sinks.

domenic commented 9 years ago

As an implementer I think of "what do I need to ref to keep this alive"

Did you read through #15? I don't think this is about keeping the object alive. It's about whether or not you close the underlying socket connection.

I'm really confused now. You say we cannot determine if handlers are registered or not, but you want us to GC the object those handlers are possibly registered on? I don't see how that can work.

We went through this a bit in #15 too. It's pretty easy to create a situation where there are no references to the stream/reader, but the closed promise still has a handler registered on it, and a reference to the closed promise is held by the implementation. In that case you could still GC the stream, but you shouldn't close the connection because then you would need to fulfill the closed promise, triggering its handlers and making GC observable.
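The situation described can be sketched like this (an illustrative example, not from #15): script drops every reference to the stream and reader, but a handler on the reader's closed promise survives, held by the implementation.

```javascript
// If the UA closed the connection when the stream got GC'd, the closed
// promise would settle, this handler would fire, and garbage collection
// would become observable to script.
let gcBecameObservable = false;
{
  const stream = new ReadableStream({
    start() { /* connection stays open; never closes or errors */ }
  });
  const reader = stream.getReader();
  reader.closed.then(() => { gcBecameObservable = true; });
  // `stream` and `reader` fall out of scope here; only the closed
  // promise and its handler remain reachable.
}
```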

I thought we only wanted to use WritableStreams for true data sinks.

Why do you think an upload is not a true data sink? Underlying it is a socket you can call send(2) on.

wanderview commented 9 years ago

I had a few more minutes, but no need to respond today. :-)

Why do you think an upload is not a true data sink? Underlying it is a socket you can call send(2) on.

Because a Request object does not control where the data goes. A Request might be passed to fetch(), to cache.put(), or consumed directly by script. You can't call send(2) here because you don't know the Request is being passed to fetch.

Instead, I think the sink must be defined by the code consuming the .body stream. The fetch sink would write straight to a socket. The cache sink would write straight to a file (or more likely some IPC thing).

It just seems the WritableStream sink is a property of the thing consuming the Request, not the Request itself.
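The "sink belongs to the consumer" idea might look like this generic drain loop, where fetch(), cache.put(), or script each supply their own sink. The `sink` shape here (write()/close()) is purely illustrative; nothing in this sketch is spec'd.

```javascript
// Generic consumer loop: read the Request's .body stream and hand each
// chunk to whatever sink this particular consumer owns — a socket for
// fetch(), a file or IPC channel for cache.put(), and so on.
async function drainBodyTo(bodyStream, sink) {
  const reader = bodyStream.getReader();
  try {
    for (;;) {
      const { value, done } = await reader.read();
      if (done) break;
      await sink.write(value);
    }
    await sink.close();
  } finally {
    reader.releaseLock();
  }
}
```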

wanderview commented 9 years ago

We went through this a bit in #15 too. It's pretty easy to create a situation where there are no references to the stream/reader, but the closed promise still has a handler registered on it, and a reference to the closed promise is held by the implementation. In that case you could still GC the stream, but you shouldn't close the connection because then you would need to fulfill the closed promise, triggering its handlers and making GC observable.

From our IRC conversation it seems you see a socket implementation object holding a ref to the promise directly. I think this is really hard to get from the spec, though, because all socket operations like posting data, closing, etc. require modifying other attributes on the stream as well. It's not clear at all that we're able to operate on the promise from this undefined socket object.

If we want this behavior I think it needs to be clarified in the spec. As it stands right now, all resolution of the closePromise has to go through the streams object itself first. That makes it hard to GC the object, but not the promise.

wanderview commented 9 years ago

The more I think about the GC discussion, the more it seems to me we're down in the weeds of implementation optimizations. Perhaps we should just loosen the language a bit to satisfy both our optimization goals?

I would be happy if we added text to the effect of:

The UA may terminate a fetch that has a locked reader if and only if the following two conditions are true:

  • The fetch has been suspended.
  • The UA can guarantee the fetch termination cannot be observed through the stream object or one of its promises.

The suspended state is in there both to satisfy my concern about streams that never complete and also to deal with streams being written to the http cache as a side effect.

wanderview commented 9 years ago

Actually, should we allow GC termination of a fetch in the case where the reader is not locked, but the data is possibly being written to the http cache? It seems the current proposal would terminate in this case and truncate the cache data.

domenic commented 9 years ago

Because a Request object does not control where the data goes.

I'll spin off another thread to discuss this.

From our IRC conversation it seems you see a socket implementation object holding a ref to the promise directly. I think this is really hard to get from the spec, though, because all socket operations like posting data, closing, etc require modifying other attributes on the stream as well. Its not clear at all that we're able to operate on the promise from this undefined socket object.

I agree, upon re-reading what I wrote in that thread and also upon thinking about how e.g. https://github.com/whatwg/streams/pull/311 will affect things. After that PR gets merged I expect that specs will operate by doing e.g. CloseReadableStream(s), which indeed requires retaining a reference to s. So, apologies for going down a confusing route...

The more I think about the GC discussion, the more it seems to me we're down in the weeds of implementation optimizations. Perhaps we should just loosen the language a bit to satisfy both our optimization goals?

Yes, that might work well, as long as we can get interoperability out of it.

The fetch has been suspended.

This seems bad. How do you know the stream never completes? Especially in the backpressure case, author code could start reading again at any time, alleviating the backpressure.

I guess, in what kind of code do you envision this ever coming up?

the stream object

probably the reader object is the more important one

yutakahirano commented 9 years ago

Thank you very much for the review, @wanderview!

As I mentioned in a previous comment, ReadableByteStream is not in the streams spec yet. It would be nice to at least link to the prototype.

Merged @domenic's note.

Request ConsumeStream step 4.3 says:

Unset locked flag.

Does this effectively mean that you can drain the Request with .text(), and afterwards the bodyUsed flag is false? It seems the used concept is tied to the lock state. The passed state also affects bodyUsed, but I don't think it's set for the Body mix-in convenience routines. Can you explain why you have separate lock and passed flags? Do other specs like Cache and SW's FetchEvent need to set this passed flag as well?

It is based on a long discussion and @tyoshino's note in it.

Anyone reading a body stream should, in theory, get a reader and read from it. I added the locked flag as a workaround to keep consistency with that behavior. In the future, I will ask such users to "set the passed flag, get a reader from the stream, and use it".

Why doesn't Request have a body stream in its webidl? Or rather, shouldn't the ReadableStream body attribute be defined on the Body mix-in?

The current fetch algorithm has events notifying response download progress (process response body, process response end-of-file) and I use them to read the response body. But I cannot use the same technique for uploading: maybe I need to modify the algorithm, or use Writable[Byte]Stream instead. Hence I decided to postpone it.

I think it would be better to define the Uint8Array chunk type where you define the stream concept for the object. Placing the chunk type down in ConsumeStream, etc makes it a bit hard to find. Can you expand on what you mean here?

Filed #31.

  // The user agent must not terminate the fetch because there is an
  // active reader while it is not reachable by the garbage collector.
  fetch("http://www.example.com/").then(res => res.body.getReader())

I can kind of understand the rationale, due to the close handler allowing GC to be observed, but in this case there is no close handler set. Why not just require it to remain open if there are observable handlers in use?

Also, what does this mean for infinite streams or streams stuck on back-pressure? Does the UA just keep them open forever? I don't think we can rely on network timeout because the stream could be coming from an http cache file.

I think we have to have some way to GC these resources.

I reopened #15. Let's discuss there.

annevk commented 9 years ago

@yutakahirano in https://fetch.spec.whatwg.org/#concept-http-network-fetch there's process request body and then in https://fetch.spec.whatwg.org/#concept-fetch there's process request end-of-file (run once we have a response). Can those not be used for uploading?

yutakahirano commented 9 years ago

@yutakahirano in https://fetch.spec.whatwg.org/#concept-http-network-fetch there's process request body and then in https://fetch.spec.whatwg.org/#concept-fetch there's process request end-of-file (run once we have a response). Can those not be used for uploading?

They are upload progress notifications, right? If we have a Writable[Byte]Stream associated with the request body, we can push data upon seeing the notification. That's what I wrote as "using WritableStream". On the other hand, if we want to read from request.body, which is a Readable[Byte]Stream, the fetch algorithm should call read() explicitly.

Does it make sense?

annevk commented 9 years ago

I see. So the problematic cases are:

var req = new Request(...)
req.body.getReader()

which does not go through Fetch and invoking req.body.getReader() in a service worker FetchEvent instance which would go through Fetch.

yutakahirano commented 9 years ago

Sorry, I was a bit confused, and would like to correct the previous comment.

As we don't have the entire request body when we call fetch(),

My understanding is that the second one is not a process request end-of-file task. The existing one is a notification for a fetch algorithm user, but what I want is a notification to the browser from a user.

annevk commented 9 years ago

Isn't that just invoking close() on the writable stream?

yutakahirano commented 9 years ago

Isn't that just invoking close() on the writable stream?

From the user side, yes, calling close() is the signal for end-of-file. We need to translate it for the browser.

In other words, https://github.com/yutakahirano/fetch-with-streams#fetch-method provides the underlying source for downloading, and I plan to provide the underlying sink for uploading. The underlying sink needs to have a close operation, so the corresponding operation is needed at fetch algorithm level.
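A sketch of what such an underlying sink might look like — `sendRequestBodyBytes` and `signalRequestEndOfFile` are hypothetical hooks into the fetch algorithm, not real spec operations:

```javascript
// Hypothetical upload sink: write() pushes request-body bytes toward the
// network, and close() is the user-side signal that must be translated
// into an end-of-file notification at the fetch-algorithm level.
function makeUploadSink(sendRequestBodyBytes, signalRequestEndOfFile) {
  return new WritableStream({
    write(chunk) {
      sendRequestBodyBytes(chunk);
    },
    close() {
      signalRequestEndOfFile();
    }
  });
}
```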

wanderview commented 9 years ago

The fetch has been suspended.

This seems bad. How do you know the stream never completes? Especially in the backpressure case, author code could start reading again at any time, alleviating the backpressure.

Sorry, I was trying to describe the state where the stream and reader are no longer observable and the fetch is suspended. If the script can't reach the stream object any more then it can't read and unblock the stream, right?

domenic commented 9 years ago

Sorry, I was trying to describe the state where the stream and reader are no longer observable and the fetch is suspended. If the script can't reach the stream object any more then it can't read and unblock the stream, right?

Ah OK, agreed. Not sure how that is different from

The UA can guarantee the fetch termination cannot be observed through the stream object [or the reader object] or [the reader object's closed promise].

though.

wanderview commented 9 years ago

@domenic, because if the stream has not been suspended then:

annevk commented 8 years ago

@yutakahirano it seems we resolved most v1 issues now one way or another. Do you want to write a PR or should I have a go at making changes to the Fetch specification? The current text you have seems to need some changes still, so I guess it's up to you whether you want to continue driving this or whether you'd like me to help out.

yutakahirano commented 8 years ago

@annevk Thank you for the discussion held in SF. I will fix the draft to reflect the conclusion.

domenic commented 8 years ago

@yutakahirano eventually we want to modify the Fetch spec itself though, right? So this draft won't be as relevant. I think @annevk's question was whether you want to do that yourself with a PR, or whether he should work on it directly with your review.

yutakahirano commented 8 years ago

@domenic Thanks.

I would like to create PRs for the fetch spec.

domenic commented 8 years ago

@yutakahirano any updates on fetch-spec PRs?

yutakahirano commented 8 years ago

Sorry for the delay. I'm addressing #52. After finishing it I will update the whole draft again (to reflect recent fetch spec changes in order to make it easy to create PRs) and create PRs.

yutakahirano commented 8 years ago

All v1-blocking issues were addressed. I will reflect the recent fetch spec update to the draft, and then propose PRs.

yutakahirano commented 8 years ago

The initial merge is almost done; the last PR is under review. I planned to replace every occurrence of "push" for request / response bodies with "enqueue" as written in the draft, but I now think it is unnecessary, as the word is defined in the Encoding spec.

If you find any problems, please let me know.

annevk commented 8 years ago

Perhaps we should change the wording in Encoding and Fetch to use enqueue rather than push? Or we could leave that for the longer term rewrite.

domenic commented 8 years ago

I think it would be nice. But it could be done with a follow-up.

yutakahirano commented 8 years ago

push / enqueue will be addressed as part of #62.