twitchtv / twirp

A simple RPC framework with protobuf service definitions
https://twitchtv.github.io/twirp/docs/intro.html
Apache License 2.0
7.19k stars 326 forks source link

Ensure response body is fully read on context errors #388

Closed nsafai closed 10 months ago

nsafai commented 1 year ago

Issue #, if available: https://github.com/twitchtv/twirp/issues/384

Description of changes: As mentioned in the GH issue above, the Golang docs specify that:

If the Body is not both read to EOF and closed, the Client's underlying RoundTripper (typically Transport) may not be able to re-use a persistent TCP connection to the server for a subsequent "keep-alive" request.

The original poster QueenCitySmitty mentions that

Both doProtobufRequest and doJSONRequest only defer closing the body, not also reading to EOF. This has the potential to cause issues with TCP.

marioizquierdo makes a good suggestion to do this only for context errors as it is:

[...] safer to implement, and will not cause a second read on the response body for regular successful requests.

rhysh commented 1 year ago

Can you say a bit more about when this would make a difference? It seems to me that once the Context value used for a Request expires or is canceled, the http.Transport will try to clean up any remaining state as soon as possible. For HTTP/1.1, that means it's going to close the TCP connection, thereby preventing its reuse. I'm not as familiar with this part of Go's HTTP/2 support, but I'd guess that the underlying TCP connection would be reusable with or without this change.

Is there an integration test that confirms the description of the problem, and that the proposed change solves the problem?

nsafai commented 1 year ago

Can you say a bit more about when this would make a difference? It seems to me that once the Context value used for a Request expires or is canceled, the http.Transport will try to clean up any remaining state as soon as possible. For HTTP/1.1, that means it's going to close the TCP connection, thereby preventing its reuse. I'm not as familiar with this part of Go's HTTP/2 support, but I'd guess that the underlying TCP connection would be reusable with or without this change.

We are correcting an oversight in the twirp implementation to always read response bodies to EOF for cases where we experience context errors - this better aligns the twirp implementation with the go docs:

// If the returned error is nil, the Response will contain a non-nil
// Body which the user is expected to close. If the Body is not both
// read to EOF and closed, the Client's underlying RoundTripper
// (typically Transport) may not be able to re-use a persistent TCP
// connection to the server for a subsequent "keep-alive" request.

Is there an integration test that confirms the description of the problem, and that the proposed change solves the problem?

AFAICT, there are no benchmark or integration tests in this package. Testing these changes is the difficult and time consuming part, and it's what we're working on right now.

rhysh commented 1 year ago

Thanks. I see that the code isn't meeting the letter of the net/http doc's requirements for connection reuse. I see that the current patch addresses only one of the ways the code doesn't meet those requirements. I don't see how it would matter in practice.

On the one hand, I have doubts that this is a real problem; it looks like the window (between no error from http.Client.Do, followed by error from ctx.Err) would "usually" be "quite small". And again, how would consuming the Body make a difference in whether the http.Transport is able to reuse the connection? It seems that either the Body is already sufficiently consumed for the Transport to have removed the association between the Request+Response.Body and the TCP connection (so failing to consume the whole body wouldn't make a difference in the reusability of the TCP connection), or the association is still there and so the Transport is also in the process of detecting that the Context has been canceled/expired and will (for HTTP/1.1) discard the TCP connection anyway.

On the other hand, if it's really important for us to consume the entire body on every code path (when http.Client.Do doesn't return an error), then maybe it's time to do that in a way that's more obviously complete (and which will remain complete as the code continues to evolve), rather than addressing these one at a time. The body is also not consumed when the response's status code is a redirect.

If the time has come to bring the caller of http.Client.Do into compliance with that method's docs, I'd advocate for making that alignment obvious and permanent rather than adding to the patchwork. If the goal is a targeted change to address a specific operational issue, then a demonstration / tests would help to drive that point home.

github-actions[bot] commented 10 months ago

This PR is stale because it has been open for 60 days with no activity. Remove stale label / comment or this will be closed in 5 days