python-hyper / hyper

HTTP/2 for Python.
http://hyper.rtfd.org/en/latest/
MIT License
1.05k stars 192 forks source link

Come up with an API for 1XX responses. #71

Open Lukasa opened 10 years ago

Lukasa commented 10 years ago

This is not something httplib supports, so we have nowhere to take our lead from. This means we can do this right (or at least right-ish).

So what's our idealised API? /cc @sigmavirus24 for his brains, @shazow as the likely primary consumer of the API.

sigmavirus24 commented 10 years ago

I don't know the draft well enough. Let me go read that and then I'll come back with ideas.

shazow commented 10 years ago

Yea same... I'd love to see some potential code examples of what such an API might look like, and give feedback on that. :)

sigmavirus24 commented 10 years ago

I don't see any references to 100-continue in the draft linked in #68. Will it work the same way as it does in HTTP/1.1 where it is an interim (non-authoritative) response? If that's the case, I'm not sure we need a special API but I also don't think we should swallow/ignore it like httplib. I'll also work on the assumption that in HTTP/2 the client doesn't actually have to wait to receive the 100 Continue status ([like it's described in HTTP/1.1).

I wonder if an improvement would involve stealing urllib3's Timeout class so the user could specify the amount of time they want to wait for the server to respond with 100 Continue after which hyper then just proceeds to send the request. (Obviously if the server responds with 100 Continue before the timeout triggers, then we wouldn't wait the entire timeout.) At the same time, I'm not sure if there's a good way to integrate this with the current API or if there's a better way to implement this without inspecting headers set by the user.

The advantage to this way of doing this way is that the API is kept simple and flexible. Meanwhile, I'm unsure how to represent to the user that it received a 100 Continue. At the same time, if we handle everything correctly I'm not sure the user needs to know.

Lukasa commented 10 years ago

Yeah, 1XX is not special in HTTP/2, it behaves exactly as it does in HTTP/1.1. This is therefore an opportunity for us to work out how we'd like provisional responses to work.

First, note that 100 is not the only provisional response. The 100 Continue flow is an interesting one, but we may well want to expose the possibility of acting on other responses.

One API option is to have the user handle the 100 Continue flow themselves. That would lead to an API a bit like this:

# Build the request but don't send any body data.
c = HTTP20Connection('google.com:443')
c.putrequest('POST', '/some_endpoint')
c.putheader('Content-Type', 'application/octet-stream')
c.putheader('Content-Length', len(big_data))
c.endheaders()

# Wait for a provisional response for 5 seconds.
resp = c.getresponse(timeout=5)

if resp is None or resp.status == 100:
    c.send(big_data)
else:
    # Handle final status code, probably an error.

This method has the advantage of allowing for other 1XX status codes, but forces the user to manage the entire flow. I don't know that that's a problem for me. We'd need to rework HTTP20Connection.request to work better here, or maybe have it handle the 1XX flow itself. Not sure. Thoughts?

sigmavirus24 commented 10 years ago

Not to be a pain, but I assume you also meant to include c.putheader('Expect', '100-continue').

I think this design would work fine for anyone using hyper directly. I guess urllib3 would do some header inspection at this point? (e.g., if headers.get('Expect', '') == '100-continue': # Wait for 100 response.)

Lukasa commented 10 years ago

I did, yeah. =P

I think that's correct. Hyper's job is really to make it possible to handle provisional responses, whereas urllib3's job may be to abstract them away. Does that seem reasonable, @shazow?

(A side note: we can also handle trailers, which are headers that come after a chunked body has been received. Right now they're just transparently merged with the original headers, but it might be better to provide an alternate way to receive them. Any preferences?)

shazow commented 10 years ago

Yea sounds like a decent low-level interface, we can always add more abstractions on top of it.

shazow commented 10 years ago

(A side note: we can also handle trailers, which are headers that come after a chunked body has been received. Right now they're just transparently merged with the original headers, but it might be better to provide an alternate way to receive them. Any preferences?)

Maybe combined headers live in .headers, and then you have .headers_pre and .headers_post or something?

sigmavirus24 commented 10 years ago

Maybe combined headers live in .headers, and then you have .headers_pre and .headers_post or something?

I was thinking along the same lines, but was thinking of only having .headers and .trailers where the difference would be the equivalent to .headers_pre. Seem reasonable?

shazow commented 10 years ago

So they would not be combined? I guess that can work. On Aug 11, 2014 11:30 AM, "Ian Cordasco" notifications@github.com wrote:

Maybe combined headers live in .headers, and then you have .headers_pre and .headers_post or something?

I was thinking along the same lines, but was thinking of only having .headers and .trailers where the difference would be the equivalent to .headers_pre. Seem reasonable?

— Reply to this email directly or view it on GitHub https://github.com/Lukasa/hyper/issues/71#issuecomment-51820727.

sigmavirus24 commented 10 years ago

Sorry that wasn't clear. .headers would be as you described ("regular" headers + trailers). .trailers would be the trailers that Hyper received. To get the "regular" headers, you would exclude anything in .trailers from what you find in .headers. I would expect it would not be too terribly difficult to figure that out but y'all should correct me if I'm wrong.

shazow commented 10 years ago

How does excluding work if there are duplicates? I'd keep them separate. Maybe have a combined one in addition, either way. On Aug 11, 2014 12:52 PM, "Ian Cordasco" notifications@github.com wrote:

Sorry that wasn't clear. .headers would be as you described ("regular" headers + trailers). .trailers would be the trailers that Hyper received. To get the "regular" headers, you would exclude anything in .trailers from what you find in .headers. I would expect it would not be too terribly difficult to figure that out but y'all should correct me if I'm wrong.

— Reply to this email directly or view it on GitHub https://github.com/Lukasa/hyper/issues/71#issuecomment-51831271.

Lukasa commented 10 years ago

I think the trailers and headers are logically a single header block. This means you should only be able to duplicate headers that can be represented as comma-separated lists (and the Set-Cookie header).

With that said, I'd be leaning towards having headers and trailers separate, and then have a simple property that combines them.

shazow commented 10 years ago

There will always be that one edge case where having the original format is meaningful. Be careful mangling the returned data too much, lest your code turns into httplib. :p On Aug 11, 2014 1:50 PM, "Cory Benfield" notifications@github.com wrote:

I think the trailers and headers are logically a single header block. This means you should only be able to duplicate headers that can be represented as comma-separated lists (and the Set-Cookie header).

With that said, I'd be leaning towards having headers and trailers separate, and then have a simple property that combines them.

— Reply to this email directly or view it on GitHub https://github.com/Lukasa/hyper/issues/71#issuecomment-51838725.

sigmavirus24 commented 10 years ago

So .headers, .trailers, and .all_headers (or .combined_headers)?

Lukasa commented 10 years ago

Maybe.

My issue there is that for non-expert users .headers is an attractive nuisance that may be missing useful information available in .all_headers. I'm not sure of a good way to handle this API without being really awkward.

For that matter, it seems like trailers are treated conceptually as part of the headers for a request. That's weird: they aren't headers in any sense.

Here's another question: if you request the equivalent of .all_headers before you've read any of the body, in HTTP/2 we'll need to consume the entire request body before we know for sure that there aren't any to come. That makes this a really magnificent footgun.

This entire discussion makes me want to back away from ever showing the combined set, and to simply have .headers and .trailers and then provide utility functions for combining the two.

sigmavirus24 commented 10 years ago

I think that API seems best.

shazow commented 10 years ago

+1, have higher level libs merge them.

On Tue, Aug 12, 2014 at 8:30 AM, Ian Cordasco notifications@github.com wrote:

I think that API seems best.

— Reply to this email directly or view it on GitHub https://github.com/Lukasa/hyper/issues/71#issuecomment-51930920.

Lukasa commented 9 years ago

Further thought on this makes me wonder whether the Expect: 100 Continue flow should actually be special-cased by hyper. Something like:

c = HTTPConnection('somefileuploadsite.com')
c.request('POST', '/myfile', body=massive_file, expect_continue=True)
r = c.get_response()
assert r.status_code == 200

Under the covers, hyper did the 'wait for 100 response' logic and sent the body onwards.

The advantage of this API is that the user doesn't need to worry about the exact semantics of the 100 response. All other provisional responses will be returned as normal, and the user will (as normal) be allowed/expected to retrieve the socket object if they need it.

sigmavirus24 commented 9 years ago

So, here's the fun thing. The spec (last I checked) said to wait for a 100 Continue or a reasonable amount of time. What if expect_continue allowed for a float that was essentially "After x seconds, send the file anyway"

Lukasa commented 9 years ago

Ooh, that's a good idea. It'll also allow for true which will choose a reasonable time (something like 1 second), but can take the float.