yutakahirano / fetch-with-streams

Fetch API integrated with Streams.
66 stars 8 forks source link

Uploading: how to handle redirects #66

Closed domenic closed 7 years ago

domenic commented 7 years ago

I see two options:

In practice I think disallowing upload redirects is not that bad. Redirects for uploads seems pretty weird to me... We could also do that as an initial version and then if there is developer need, loosen the restriction later?

annevk commented 7 years ago

It's not just redirects. It's also authentication. You'd have to set window to null too. There's a note around the place where we tee streams that implementations could optimize for the no-window/no redirects scenario.

annevk commented 7 years ago

How complicated will teeing be by the way?

(It's annoying that if you want to reduce memory usage you have to pass redirect:"error" and window:null. Ideally we have something simpler for that. But perhaps requiring those to be passed when you pass in a stream is okay for a v1.)

domenic commented 7 years ago

Can you explain the window thing? I am not familiar with that part of fetch and what it impacts.

I think teeing is probably not that complicated from a spec perspective. But it might be complicated from an implementation perspective, or at least, complicated to optimize. @yutakahirano had more thoughts on this.

And yeah, I mainly wanted to see how people felt about making v1 a bit more annoying. If we can figure out a way to avoid that (e.g. teeing) that's even better.

annevk commented 7 years ago

It impacts HTTP authentication dialogs. A user agent can redo a request when given a 401/407 response. Setting window to null means there's no place to display the dialog and therefore the request cannot be redone.

annevk commented 7 years ago

The logic is near the end of https://fetch.spec.whatwg.org/#concept-http-fetch and first step of https://fetch.spec.whatwg.org/#concept-http-network-or-cache-fetch.

yutakahirano commented 7 years ago

Sorry for the late response.

  • Do as the spec does now, and tee the stream so we can handle redirects. A bit complicated...
  • Only allow uploads if the redirect mode is "error".
    • Automatically set it to "error" if they supply a stream, or require them to opt-in?
    • Should we allow "manual" also?

If it's possible to specify when the stream can be send-able without teeing, I would prefer the second option, at least for now. If users want to send a request with RS without such limitation, we can consider the other option at that time.

annevk commented 7 years ago

Do you think we can? E.g., what about 401/407?

yutakahirano commented 7 years ago

Hmm, looks impossible.

Talked with @tyoshino.

Can we propose a new HTTP status code (say 102) which means there is no more redirect and authentication (and more?)? IIUC 100 is not enough. Do you think it's a good idea? How hard is it to propose such a status code to IETF?

annevk commented 7 years ago

@mnot @reschke I believe you have experience with registering status codes. 😊

The idea would be, if I understand correctly, that we don't stream the request body until the server guarantees with a 102 that the eventual response won't be a redirect or authentication dialog. If it doesn't send a 102 we'll just not transmit the body. If it does send a 102 and then doesn't comply it's a network error.

We might also offer the more expensive tee-all-the-things option, but the above would be much better as a default.

domenic commented 7 years ago

I think I would prefer to tee all the things, or disallow authentication. Causing a roundtrip tax on all from-JavaScript uploads just because a small minority of cases might need authentication seems like a bad tradeoff.

annevk commented 7 years ago

If the server transmits early there would not be much of a tax, if at all. And teeing is a pretty big tax.

wanderview commented 7 years ago

Sorry if I missed this, but how does the server respond with anything before the upload body is sent?

Edit: Is this possible with chunked encoding?

yutakahirano commented 7 years ago

The UA can stop teeing after it sees 102 even with the "tee-all-things option".

yutakahirano commented 7 years ago

@wanderview I imagined something like 100 Continue. The server can send some information before sending the final response.

reschke commented 7 years ago

It's not that complicated to define a new status code (but 102 is taken, btw, see http://www.iana.org/assignments/http-status-codes/http-status-codes.xhtml)

However, I'm both skeptical about whether a code is needed, and the likeliness of a server to get this right.

Is there a summary of what problem this would solve? How likely are large uploads that happen before any other request that would trigger redirects or auth?

Finally, how would the new status code be different from 100 (Continue)?

domenic commented 7 years ago

Having thought about this more over the last few days, I maintain that it's not really worth working on this feature if it requires server-side opt in. All the developers we've heard from want to use this uploading functionality on preexisting servers, in most cases outside their control.

So while we could consider minting a new status code (or reusing 100) as some kind of optimization for those that control both sides of the pipeline, I think the initial version of this feature really can't make that a requirement. So either it needs to tee or it needs to fail on redirects and on authenticated endpoints.

yutakahirano commented 7 years ago

@reschke Thank you, then let's use 199 in this thread. The problem is that we need to copy the request body to resend it in case redirect / authentication is required until the response arrives. It leads to a memory bloat. IIUC 100 is not enough - the server can serve 307 after sending 100.

Even when the server doesn't trigger redirects, the client cannot know that until it receives the final response. If the server wants to see the request body to decide the response code (200 or 404, for example), the client needs to remember the entire body.

annevk commented 7 years ago

All the developers we've heard from want to use this uploading functionality on preexisting servers, in most cases outside their control.

This is not true. All the folks wanting to use this to replace their WebSocket stacks have significant control over both ends of the stack and have certainly made noise about wanting this.

Furthermore, we have tried to make the default API for fetch() to be the one that consumes the fewest resources. It would be nice if the least complicated API for uploading was also the most efficient one.

We could still ship with a less efficient model as a first step, but not until we know what the more efficient model would look like so we can preserve the aforementioned aspects of the API.

I tried to think about the 199 model a bit and I'm not entirely sure it helps. Ultimately the client needs to make a decision here about memory and latency.

I think a better model would be that the client requires 199-like semantics or explicitly decides to tee.

If the client requires 199-like semantics, the client can preserve memory usage and start sending the body directly (reducing latency; disregard my earlier comment on this). If the server ends up sending a 401/407/redirect, then we result in a network error. The client should have figured out somehow with the server beforehand that the 199-like semantics can be required. 407 (aka a porxy) makes this tough, but so be it.

API-wise I think it would make the most sense if the explicit teeing instruction was bound to the stream that is passed to fetch() in some way, but we could also add a flag to the fetch() API.

I'm not entirely sure how it should interact with redirect mode. I'm somewhat leaning towards ignoring it in the default non-teeing case and not requiring it to be set to "error" or "manual".

tyoshino commented 7 years ago

Let me summarize proposals as usual.

I'd define a new term here, "resource status", that is statuses that don't require any User-Agent level action such as authentication handling, redirect handling, etc. and therefore are delivered to the application layer. In https://github.com/yutakahirano/fetch-with-streams/issues/66#issuecomment-241350041, Yutaka said 100 is not good. That's because non resource status may follow 100. Only resource status follows 199.

UA could start sending body

I understand Julian's comment (https://github.com/yutakahirano/fetch-with-streams/issues/66#issuecomment-241209933) but not sure if it's reliable that we issue preflight for completing auth given load balancer, etc. So, I'm feeling that we should have (a-2) (and (a-3)), not only (a-1).

(a-2) and (b) requires the server to decide what status code to return before receiving all the body data. If it's impossible, the server could first send 200 and deliver the actual status code later in application level by encoding the status into the body. This might be related to the HTTP trailer API discussion.

(a-3) and (c) requires a new status code and server's opt-in. Solves https://github.com/yutakahirano/fetch-with-streams/issues/66#issuecomment-241350041. The client needs to declare capability of 199 ("Expect: 199"?).

(b) and (c) makes latency and memory consumption of fetch() competitive to WebSocket protocol/API. (a-1) gives us better latency than WebSocket.

Initially I was feeling that the default behavior for streams should be (2) (and (3)) for consistency with other data types. But I don't have strong opinion now.

So, the options would be

199 support can be added later to enable (a-3) and (c). The client would then start sending "Expect: 199".

reschke commented 7 years ago

FWIW, we removed "Expect" from HTTP in RFC 7231 except for the already defined 100-continue.

Also, for the server to be able to send a new 1xx code, the client does not need to opt-in. If sending the 1xx breaks the client, the client needs to be fixed.

annevk commented 7 years ago

@reschke for the "send-body-when-ready" option you need client opt-in of some kind (otherwise the server does not know it needs to transmit an early reply), but maybe we could define a Fetch-specific header for that, since either a new 1xx or an early status code from the server would do.

reschke commented 7 years ago

What keeps the server from always sending the new 1xx code?

Anyway, this design with client opting asking for the code, and server opting in seems to be identical to the one that was tried before with 100 Continue and which didn't work well in practice -- mainly because the server stacks didn't allow the applications to properly do the 1xx handling (there was a long servlet API spec discussion about that).

annevk commented 7 years ago

The server always sending it would be redundant with the case where the client transmits without teeing or the server can transmit a response status early.

Anyway, it seems we have enough of an idea without a status code so we could consider that again later.

(Pointers to previous discussions welcome.)

yutakahirano commented 7 years ago

What do you think about decoupling the discussion?

  1. Add a means to create a Request object with ReadableStream to the fetch spec. The request body will be teed until the response status is available ((a-2) in https://github.com/yutakahirano/fetch-with-streams/issues/66#issuecomment-241393392).
  2. Talk about other options ("send-body-when-ready" (b), "send-body-without-tee" (a-1) in https://github.com/yutakahirano/fetch-with-streams/issues/66#issuecomment-241393392) and how to give such options.
  3. Talk about 199 ((a-3) and (c) in https://github.com/yutakahirano/fetch-with-streams/issues/66#issuecomment-241393392).

They can be done in parallel.

annevk commented 7 years ago

I think we should consider "send-body-without-tee" and "" (aka "send-body-with-tee") at the same time, as the former should have the simpler API due to it consuming less resources and therefore being preferable. We don't necessarily have to ship the simpler API first, but we should have it figured out. Everything else can wait.

yutakahirano commented 7 years ago

Hmm, OK. I think it's good to choose "send-body-with-tee" as the default option, because it's the easiest one to use of three proposed options (Note: I changed my mind from https://github.com/yutakahirano/fetch-with-streams/issues/66#issuecomment-239467945 assuming we will be able to rely on some optimization such as 199 in the future).

annevk commented 7 years ago

I think if we do teeing by default we're not being consistent. Elsewhere we require clone() to be invoked to get duplication and an increase in memory usage. I think we should have something similar here. Or at least force it as a choice somehow.

domenic commented 7 years ago

I think if we do teeing by default we're not being consistent.

Hmm, can you explain? When we stream uploads that are not created by page authors, we tee, and get the duplication and increase in memory usage. If we're shooting for consistency, I think being consistent with this platform behavior is the most important factor.

As @yutakahirano says, that is the easiest to use. That is presumably why we automatically tee in those cases. I don't see why it should be different for author-created streams.

annevk commented 7 years ago

Whenever we had the option so far, we opted for better performance. I think we should continue to aim for that.

Could be interesting to do some telemetry to see how often teeing by default is wasted/used.

giveme6months commented 7 years ago

Have to agree

yutakahirano commented 7 years ago

Sorry for the late response. Thank you, then I am updating https://github.com/yutakahirano/fetch-with-streams/issues/66#issuecomment-242300376:

  1. Add a means to create a Request object with ReadableStream to the fetch spec. The fetch will be terminated when redirect or authentication are required ((a-1) in https://github.com/yutakahirano/fetch-with-streams/issues/66#issuecomment-241393392).
  2. Talk about other options ("" (a-2) (but let's say "send-body-with-tee"), "send-body-when-ready" (b) in https://github.com/yutakahirano/fetch-with-streams/issues/66#issuecomment-241393392) and how to give such options.
  3. Talk about 199 ((a-3) and (c) in https://github.com/yutakahirano/fetch-with-streams/issues/66#issuecomment-241393392).
domenic commented 7 years ago

Notes for F2F discussion:

Possible solutions:

jakearchibald commented 7 years ago

F2F:

annevk commented 7 years ago

Also, we decided not to provide opt-in for automatic handling of redirects / authentication right now, and not do a new status code. We can do those as needed later on.

annevk commented 7 years ago

Also, we decided to allow redirects for 303, since that drops the body and changes the method to GET reliably. (Doing the same for 301/302 when the method is POST could also be done, but seems somewhat less clean.)

yutakahirano commented 7 years ago

So the conclusion is that the first release should be (a-1) in https://github.com/yutakahirano/fetch-with-streams/issues/66#issuecomment-241393392, right?

annevk commented 7 years ago

@yutakahirano yes (with 303 as an exception since it has simple semantics).

yutakahirano commented 7 years ago

Thanks.

jakearchibald commented 7 years ago

@annevk

Doing the same for 301/302 when the method is POST could also be done, but seems somewhat less clean

Isn't the rule just "if the body is needed a second time, it fails"?

annevk commented 7 years ago

The problem is that what browsers do for 301/302 is somewhat against the spirit of HTTP. So further enshrining it here does not seem great.

annevk commented 7 years ago

This can now be closed.