icing / mod_h2

HTTP/2 module for Apache httpd
https://icing.github.io/mod_h2/
Apache License 2.0
256 stars 41 forks source link

v2.0.10 causes curl "Error in the HTTP2 framing layer" while running git's test suite #243

Open peff opened 1 year ago

peff commented 1 year ago

Git's test suite added support a few months ago for running its http tests using http2 via apache. Since upgrading to mod_http v2.0.10 (via updating my Debian package of apache2 to v2.4.55), one test consistently fails. The issue is that on a larger fetch, libcurl fails after the protocol upgrade with:

error: RPC failed; HTTP 101 curl 16 Error in the HTTP2 framing layer

The last request is a POST. I've attached two traces from the git client side which show the actual sequence of requests. One with mod_http 2.0.9 that succeeds, and one with 2.0.10 which fails. On the server side, there's no error in the log at all. If I crank LogLevel to debug, there's lots of output but nothing that looks like an obvious problem to me. Here's an error.log for the failing case.

Bisecting within the mod_h2 repository, the problem starts at 16ffed9692b250ddc38ed1f7718cab85f8e1b597. It's possible the problem is on the Git or libcurl side, but given that this commit is just meant to be refactoring, it makes me suspicious that there's some unintended bug.

I'm sorry that I don't have a more minimal reproduction recipe, but here's what I used to bisect:

git clone https://github.com/git/git
cd git

# the knobs here aren't strictly necessary, but disable some dependencies; however you can
# get it built is fine, as long as it is built with libcurl support
make NO_EXPAT=Nope NO_TCLTK=Nope NO_PYTHON=Nope NO_GETTEXT=Nope

# this is the failing test; "-i" asks it to stop as soon as it sees the failure
cd t
./t5559-http-fetch-smart-http2.sh -i

# the stderr from the failing client, including the trace, is captured in this file
cat "trash directory.t5559-http-fetch-smart-http2/err"

There's also some discussion on the Git mailing list, but I've tried above to summarize everything useful.

I'm happy to investigate more on my end if you have suggestions.

icing commented 1 year ago

@peff: interesting. There are two issues in play here. The two logs start to deviate after the initial GET has been upgraded to HTTP/2.:

== Info: Connection 0 seems to be dead

where curl thinks the upgraded HTTP/2 connection is no longer open and tries another one. Since there are no time stamps in the logs, I cannot see what possibly causes this. A clean server log, possible with LogLevel http2:trace2 could shine a light on this.

After that, curl opens another connection - as it should - but there it only does POST requests. The first, with a Content-Length: 4 is not upgraded to HTTP/2 as Apache does not support that for requests with a body.

The second one carries no Content-Length but has a Transfer-Encoding: chunked. mod_http2 should not upgrade that one either, but does. That leads to the confusion about which protocol is actually spoken on the wire when the body bytes come in.

In mod_h2 v2.0.12 I fixed this and those requests will also not be upgraded. This might lead to your tests passing, but note that all requests there will be in HTTP/1.1. At least until one request without a body is made.

HTTP/1.1 Upgrade is a fickle thing. The more reliable way is to either use https or use HTTP/2 directly on a http: connection. For that you'd need to configure H2Direct on on the Apache server that should allow this. And for curl you'd need to use --http2-prior-knowledge. That might be more reliable.

peff commented 1 year ago

@icing Thanks both for taking the time to dig through the logs, and for your explanations. I did notice the "Connection 0 seems to be dead", but being fairly clueless about HTTP/2 myself, didn't realize the implications of causing an upgrade on the later POST.

I tried your v2.0.12, and it indeed fixes the test failure. Should we consider it a curl bug that it asks for upgrade on a request with a body? If so, I can report it there.

I'm attaching an error.log with http2:trace2 enabled. I tried to pare down the tests to just the failing invocation, so hopefully that should be less tedious to look through. I notice that the server side issues a GOAWAY with reason='timeout' (even though hardly any time has passed), and then there's a "Software caused connection abort" message. I won't pretend to know exactly what's going on, but it seems likely to be related.

Thanks also for your advice on H2Direct. For Git's tests, we don't care that much about making sure HTTP/2 is used for every request. We mostly treat libcurl as a black box that can do what it wants at the HTTP level, but we ran into some corner cases around tracing, which is what prompted the tests.

It is possible that we'd want to expose some way for users to set CURL_HTTP_VERSION_2_PRIOR_KNOWLEDGE, but I think in the real world everybody is using https anyway. It's only the tests that don't (we do have some code to set it up, but it's not enabled by default because of the expense and portability headaches). I did confirm that enabling it also fixes the test failure.