Excess copying in current spec?

domenic commented 7 years ago

I am moving this concern from https://github.com/w3c/web-platform-tests/pull/3595#issuecomment-247406882 where @youennf wrote some helpful tests that illustrated the problem. Consider:

var response = new Response(new ReadableStream({ start(controller) {
  controller.enqueue(initialBuffer);
  controller.close();
}}));
var stream0 = response.body;
var clone = response.clone();
var stream1 = response.body;
var stream2 = clone.body;

Per spec there will be two copies made of initialBuffer: one in response.body, and one in clone.body.

Stated another way, there will be three distinct body streams (stream0, stream1, and stream2). stream0 gets teed to create stream1 and stream2, but the teeing is done with shouldClone = true, which creates two copies. See cloning a body.

I think we should revise things in one of two ways:

No copying at all. The same initialBuffer gets shared by both stream1 and stream2. This would mean just changing it to shouldClone = false.
Only one copy. This would mean introducing a new mode to ReadableStreamTee which only clones in branch 2. We would expect this to be only used by specs that are performing a "clone a stream" operation, and not used by specs or user code that are doing a "tee a stream" operation.

No copying at all is a bit dangerous, as it means that if you pass branch A to somewhere that transfers or otherwise detaches its chunks, branch B will get screwed. Right now as specced service worker performs that kind of detaching, presuambly because it wants to be able to do efficient transfering. So no copying would break things unless we also updated service worker.

Stating the last paragraph another way: given a scenario like cloning a response, sending branch A to the service worker and branch B to the cache, there must be at least one copy. (And preferably exactly one for efficiency.) As specced currently, "sending branch A to the service worker" does a transfer, not a copy. So we have two alternatives:

No copying when cloning, and switch service worker from transfer to copy
Copying when cloning, and service worker keeps using transfer

Given the choice between these two I lean toward one copy when cloning.

If we can get sign-off from, say, @youennf, @yutakahirano, @wanderview, and @annevk, I'd be happy to send PRs to streams and to fetch to fix this.

youennf commented 7 years ago

Adding a new clone operation would make sense to me. Ideally, we would like to create one new stream for the cloned body, without creating one for the being-cloned body. It seems rather unexpected that response.body object is not the same after cloning response and that the initial body object be no longer usable.

Not sure though how that could be fitted with current tee infrastructure. Also, if there is no other forecasted user of shouldClone=true teeing than fetch, should this mode be removed?

yutakahirano commented 7 years ago

No copying when cloning, and switch service worker from transfer to copy

Copying when cloning, and service worker keeps using transfer

Given the choice between these two I lean toward one copy when cloning.

The latter works for me, too. I was worried about breaking the symmetry (i.e. modifying the original buffer only affects one branch), but in terms of clone() they are not symmetric.

annevk commented 7 years ago

We also need this for uploads, right? They are defined in terms of a tee at the moment. That would mean each time you upload something, theoretically you get 3 streams and then +2 streams for each additional redirect? I always thought teeing was how streams called cloning, but if that's not the case I agree that we should do something else here.

I think I also assumed when reviewing the specification changes that streams1 was "free" somehow, but it obviously isn't.

yutakahirano commented 7 years ago

Regarding the teeing for uploading, I think its performance is not important. The teeing is not visible for users so implementors can omit teeing if possible. Chrome currently doesn't use any streams related operations on uploading. In terms of a Request made from a ReadableStream, we won't tee.

domenic commented 7 years ago

Not sure though how that could be fitted with current tee infrastructure.

I don't remember exactly how we arrived at the conclusion that request.body needs to change before and after, but I think it was very important for reasons unrelated to just reusing the teeing infrastructure. We should try to dig those reasons up and then probably put them in the spec as a note for future reference.

Also, if there is no other forecasted user of shouldClone=true teeing than fetch, should this mode be removed?

Yeah, that's probably reasonable.

domenic commented 7 years ago

Fixed by https://github.com/whatwg/streams/commit/0b4ab85f40626a70e22242cc60d4b745d3f65d73. Thanks all!

annevk commented 7 years ago

I think I missed something. https://github.com/yutakahirano/fetch-with-streams/issues/67#issuecomment-251969842 does not seem addressed or do we now know the reason? Also, OP suggested changes in Fetch, are those no longer needed?

domenic commented 7 years ago

See https://github.com/whatwg/streams/pull/528#issuecomment-252954609 for the reason. And yeah, no changes needed in fetch because the streams change sneakily changed the meaning of passing "true" for the second argument.

youennf commented 7 years ago

Cloning the internal queue should not be difficult. What seems to be important is to keep the pair source/controller together, at least for pure JS streams. That is probably less of an issue for fetch native streams.

I guess there may be ways to clone a stream by cloning the internal queue and moving the controller of the being-cloned stream to the stream that would be the source of both being-cloned and cloning streams, thus reusing part of tee infrastructure.

That would be more natural for fetch. The changed-body-after-clone is still not looking very appealing to me.

domenic commented 7 years ago

Right. You could rearchitect streams so that each stream holds a list of all the places it's been cloned to, so that enqueuing in the stream structured-clones-and-enqueues in each of the clones in the list. But this seems much more cumbersome than just sticking with the no-cloning model where reading is destructive. That fact is at the heart of how readable streams work.

IMO the real awkwardness comes in request having a clone() method instead of a tee() method. But we decided on that long ago as an important convenience. I think having .body change identity is a small price to pay to bridge the large conceptual gap between those two situations.

annevk commented 7 years ago

We could add tee()? Although would the only difference be that body does not change identity or would there be other benefits too?

youennf commented 7 years ago

I was not proposing such changes. I was envisioning something closer to:

Create a new stream
Move the controller slot from the being-cloned stream to the new stream
Tee the new stream
Move the controller of the tee1 stream to the being-cloned stream
Return the tee2 stream

youennf commented 7 years ago

I did a quick experiment and above approach seems to work. This may have side effects (if source keeps a reference to the stream it is attached, to store things in it e.g.), but not worse to me than the current case of disappearing-after-being-cloned bodies.

domenic commented 7 years ago

@youennf I want to say sorry for not responding to your idea. It is really interesting and I want to think about it more in-depth soon. My first reaction is that it's really bad and strange; controllers were never designed to be transplanted like that, and it breaks my model of streams where reading is always destructive and the only way you can clone is to read. But that's just a first reaction and I want to spend some time analyzing it more fairly; maybe it's just really innovative and I need to wrap my head around it. I'll try to make time for that next week.

youennf commented 7 years ago

Sure, I understand this is departing a bit from the current model. Introducing a new operation like that may break stuff (now or in the future) so we need to be careful. If that proves to be feasible though, I think it would make the API a bit better.

yutakahirano / fetch-with-streams

Excess copying in current spec? #67