yutakahirano / fetch-with-streams

Fetch API integrated with Streams.

Creating service worker responses from multiple readable streams, efficiently #59

Closed domenic closed 6 years ago

domenic commented 8 years ago

In a private thread, @tyoshino, @yutakahirano, @jakearchibald and I were working through how we would enable the use case of composite streams. An example would be a service worker that streams the header from cache, the body from a fetch, and the footer from cache. We wanted to bring the discussion into the open before it got too far along.

The current thinking on this design is that you would do new Response(body) where body is an author-created ReadableStream that you manually compose out of the other streams. @jakearchibald has an example of how to do this.
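For context, here is a minimal sketch of that kind of manual composition in author code (this is not Jake's actual gist code, and `composeStreams` is an invented helper name): pull from each source stream in order and funnel every chunk into one author-created ReadableStream.

```javascript
// Hypothetical helper (not from Jake's gist): concatenate several
// ReadableStreams into one by reading each source to completion, in order.
function composeStreams(sources) {
  let reader = null;
  let index = 0;

  return new ReadableStream({
    async pull(controller) {
      while (index < sources.length) {
        reader = reader || sources[index].getReader();
        const { value, done } = await reader.read();
        if (!done) {
          controller.enqueue(value);
          return; // delivered one chunk; wait for the next pull
        }
        // This source is exhausted; advance to the next one.
        reader = null;
        index++;
      }
      controller.close();
    }
  });
}
```

The service worker would then respond with `new Response(composeStreams(bodyParts), { headers: ... })` — with every chunk passing through JS on the service worker thread, which is exactly the inefficiency discussed next.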

However, this kind of manual shuffling of bytes through the service worker thread is not optimal. Ideally we would be able to do this stream composition off the service worker thread without involving JS. One design for that would be something like what Takeshi proposed, which I added as a comment on Jake's gist.

A very similar, but not identical, topic has previously been discussed: #30. There we were talking not about service worker-side response creation, but about "client side" request creation. It is a long thread, but @wanderview convinced us not to use the writable stream design. His main argument is that Request objects are "time-disconnected", i.e., you can create them at any time, so it is not feasible to supply a bodyWriter writable stream that represents the upload, since at Request creation time the upload hasn't started. @yutakahirano sums this up by saying "Just exposing a writable stream via a revealer function doesn't automatically provide a direct connection between a user and the final sink."

And this criticism is still true, even for the service worker-side response. I can construct Response objects all day long. It is only when they get connected to the main thread via event.respondWith that there is some connection to the final sink. So, @tyoshino's example (i.e. my gist comment) is not really conceptually correct.

For this thread I would like us to investigate a few potential questions:

- Is a writable stream our best bet for composite responses? Or are there alternate possibilities for solving this composite stream use case?
- What would a good design look like for exposing the "final sink" for the service worker use case? It can't be on the Response object, so where would it live?

wanderview commented 8 years ago

Just to be clear, are you saying you don't want to just use a pipe for @tyoshino's example? Something like:

self.onfetch = event => {
  // caches.match() and fetch() both return promises for responses, so
  // resolve each one before pulling out its body stream.
  const bodyParts = [
    caches.match('/header'),
    fetch('/body'),
    caches.match('/footer')
  ].map(p => p.then(r => r.body));

  const pipe = makeAPipeSomehow(); // some identity transform, TBD
  sequentialStreamWrite(bodyParts, pipe.writer);

  event.respondWith(new Response(
    pipe.reader,
    {
      headers: {'Content-Type': 'text/html'}
    }
  ));
};

function sequentialStreamWrite(sources, sink) {
  pipeNext(0);

  function pipeNext(i) {
    const source = sources[i];

    if (source) {
      source
        .then(body => body.pipeTo(sink, { preventClose: true }))
        .then(() => pipeNext(i + 1));
    } else {
      sink.close();
    }
  }
}

I guess because it doesn't flow the "final sink" events back the way you want?

Personally, I think we need a progress event system separate from a writable stream anyway. I would prefer if we just did that.

domenic commented 8 years ago

My initial thoughts:

Is a writable stream our best bet for composite responses? Or are there alternate possibilities for solving this composite stream use case?

I think writable stream is the way to go. You could imagine a design that solves this problem in a one-off way, like ReadableStream.composite(...otherReadableStreams). But this is the kind of anti-EWM magic that always gets us in trouble. We are doing things that authors cannot do, solving only very specific use cases; when they need something more customizable, things will start falling down. E.g., what if you don't know the order ahead of time? What if you don't know how many readable streams there will be ahead of time? (These are both solvable, but I am trying to illustrate the general principle that there are a lot of unanticipated and probably unanticipatable possibilities.)
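To make the flexibility argument concrete: with plain streams, the set and order of sources can come from any async iterable, decided as you go — something a fixed `ReadableStream.composite(...)` signature could not express. A hedged sketch (`writeAll` is a made-up helper; it assumes an identity pipe exposing a writable end):

```javascript
// Made-up helper: pipe an arbitrary (possibly lazily decided) sequence of
// ReadableStreams into one writable end, closing it when the sequence ends.
async function writeAll(sourceIterable, writable) {
  for await (const source of sourceIterable) {
    // preventClose keeps the destination open between sources.
    await source.pipeTo(writable, { preventClose: true });
  }
  await writable.getWriter().close();
}
```

Because the sources are produced by the iterable, author code can decide the next stream only after seeing how the previous one went — no special one-off API needed.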

The writable stream also just seems like the more primitive thing, that more closely represents what's going on. Especially for the upload case.

What would a good design look like for exposing the "final sink" for the service worker use case? It can't be on the Response object, as @wanderview points out. But then where would it live?

It seems to me that what we're trying to expose is some kind of writable destination corresponding to the current service worker fetching cycle. So, maybe there is something like a writable event.responseBody? But then how do you get in the headers? Hmm.

Maybe a different design is to say that the "time-disconnected" Response object is just a convenient package of data you can use to feed to the "real" respondWith. So you have something like e.realRespondWith(responseInit, (bodyWritableStream) => ...), and then e.respondWith(response) is sugar for this.realRespondWith(response, (bodyWritableStream) => response.body.pipeTo(bodyWritableStream)).
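To make the proposed sugar concrete, here is a toy model. Nothing in it is a real API — neither `realRespondWith` nor this event class exists; an identity TransformStream stands in for the engine-created sink:

```javascript
// Toy model only: realRespondWith is a hypothetical API. It sketches the
// proposal that respondWith(response) is sugar for
// realRespondWith(responseInit, writeBody).
class ToyFetchEvent {
  realRespondWith(responseInit, writeBody) {
    // In a real engine this sink would be the connection to the page;
    // here an identity TransformStream stands in for it.
    const { readable, writable } = new TransformStream();
    this.finalBody = readable;
    this.finalInit = responseInit;
    writeBody(writable);
  }

  respondWith(response) {
    // The sugar: feed the time-disconnected Response's body into the
    // real sink once the connection exists.
    this.realRespondWith(response, writable => response.body.pipeTo(writable));
  }
}
```

The key point the toy captures is that the writable destination is only created at respondWith time, when there actually is a connection to the final sink.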

This seems like it might also extend to fetch uploads (fetch vs. "real fetch"), but I don't want to discuss that too much until I see some reactions to the rest of my point of view.

domenic commented 8 years ago

Just to be clear, are you saying you don't want to just use a pipe for @tyoshino's example? Something like:

Oh, this is pretty nice. It seems pretty optimizable too. Maybe I wrote all this up for nothing :).

@tyoshino, @yutakahirano, what do you think? I guess it means that these "pipe" instances (i.e. identity transform streams) would have to be implemented specially, so that the engine knows to skip them. That seems OK though.

jakearchibald commented 8 years ago

what if you don't know the order ahead of time

This isn't an edge case either. I may push a header from the cache, then see that the body request failed, and push a different stream.
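A hedged sketch of that scenario — every name here is invented; `tryFetchBody` and `fallbackBody` stand in for the network fetch and the cache lookup:

```javascript
// Invented names throughout: the point is only that the next source is
// chosen after observing whether the preferred one failed.
async function chooseBody(tryFetchBody, fallbackBody, writable) {
  let body;
  try {
    body = await tryFetchBody();   // e.g. (await fetch('/body')).body
  } catch (err) {
    body = await fallbackBody();   // e.g. a stream from the cache
  }
  await body.pipeTo(writable);
}
```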

event.responseBody

This could be a null transform stream, meaning you could do new Response(event.responseBody). But I haven't really understood why the revealing constructor pattern doesn't work here, so I'm probably talking nonsense.

jakearchibald commented 8 years ago

"Just exposing a writable stream via a revealer function doesn't automatically provide a direct connection between a user and the final sink."

Could the revealer function be called when there is a direct connection?

domenic commented 8 years ago

@jakearchibald

But I haven't really understood why the revealing constructor pattern doesn't work here, so probably talking nonsense.

Here's the basic problem:

self.onfetch = event => {
  const responses = [
    new Response(bodyWriter1 => ...),
    new Response(bodyWriter2 => ...),
    new Response(bodyWriter3 => ...)
  ];

  event.respondWith(responses[Math.floor(Math.random() * 3)]);
};

Which of these bodyWriters represents the actual sink for data sent to the page?

Could the revealer function be called when there is a direct connection?

In #30 we discussed very similar problems. One proposal was indeed to defer calling the body writing function until respondWith time or upload time. But that was deemed undesired for a variety of reasons... I should probably go re-read the thread to re-remember...

jakearchibald commented 8 years ago

Cheers! I should go and read #30 and stop derailing.

wanderview commented 8 years ago

Could the revealer function be called when there is a direct connection?

In #30 we discussed very similar problems. One proposal was indeed to defer calling the body writing function until respondWith time or upload time. But that was deemed undesired for a variety of reasons... I should probably go re-read the thread to re-remember...

I believe it was just insanely complex and getting hard for anyone to understand.

Personally, I don't see the advantage of the revealer function over the pipe.

I guess it means that these "pipe" instances (i.e. identity transform streams) would have to be implemented specially, so that the engine knows to skip them.

I think you could only do this for zero buffer pipes. In theory content code should be able to create a pipe with a specific buffer size.
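For what it's worth, in the Streams API as it eventually shipped, "a pipe with a specific buffer size" is expressible through queuing strategies on an identity TransformStream (the sizes below are arbitrary illustrations):

```javascript
// In the shipped Streams API, an identity pipe's buffering is controlled
// by queuing strategies passed to the TransformStream constructor.
const pipe = new TransformStream(
  undefined, // no transformer: identity transform
  new CountQueuingStrategy({ highWaterMark: 16 }), // writable-side buffer
  new CountQueuingStrategy({ highWaterMark: 0 })   // readable side buffers nothing
);
```

This is consistent with @wanderview's point: only the zero-buffer configuration is a pure pass-through that an engine could trivially skip.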

yutakahirano commented 8 years ago

@tyoshino, @yutakahirano, what do you think?

Looks good, but IIRC pipe optimizability was less clear than we expected when we discussed #30. Is it clear now?

domenic commented 8 years ago

Looks good, but IIRC pipe optimizability was unclearer than we expected when we discussed #30. Is it clear now?

I think it is not 100% clear, but it should be doable. All the pieces are in place: pipeTo using non-public APIs, plus locking, should ensure that none of the three streams involved (the readable, the transform's writable side, and the transform's readable side) has any way to interfere with or observe the inner workings of the pipe.

The only real open question is how we signal the identity transform case. I think it would work to allow new TransformStream(). For example, you could implement TransformStream with a delegating pimpl pattern so that when it sees no arguments, it delegates to a straight pass-through pimpl, whereas if it sees non-default arguments, it delegates to a pimpl that uses those functions. Alternately we could have a dedicated new Pipe() or TransformStream.identity() or similar, but I don't think it would be necessary.
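For the record, the argument-less form is what eventually shipped: `new TransformStream()` with no transformer behaves as an identity pipe, exactly the "pipe" role in @wanderview's example above.

```javascript
// new TransformStream() with no arguments acts as an identity transform:
// whatever is written to .writable comes out of .readable unchanged.
const { readable, writable } = new TransformStream();

const writer = writable.getWriter();
writer.write('hello'); // resolves once the chunk is consumed downstream
writer.close();
```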

tyoshino commented 8 years ago

First, to be clear, I just raised a concern about optimization in the private thread before this issue. I remember #30, and I agreed that we would start with ReadableStream-based Request construction. I'm fine with that for Response construction as well.

I was going to investigate whether the approach Ben described in https://github.com/yutakahirano/fetch-with-streams/issues/59#issuecomment-147770695 would work in https://github.com/whatwg/streams/issues/359, but I haven't started yet.

tyoshino commented 8 years ago

Sorry for providing a stale example (one still using the revealing constructor).

jakearchibald commented 8 years ago

Here's how it'd look using a transform stream / pipe: https://gist.github.com/jakearchibald/18562306e6cbbf975009

Much cleaner!