Interesting issue.
AFAIU, we really can't make any new requirements on servers that only support 1.0.
Julian is correct; in this specification we can't retrofit new requirements onto HTTP/1.0 intermediaries.
However, note that the RFC-to-be says:
- If a message is received with both a Transfer-Encoding and a Content-Length header field, the Transfer-Encoding overrides the Content-Length. Such a message might indicate an attempt to perform request smuggling (Section 11.2) or response splitting (Section 11.1) and ought to be handled as an error.
... so the server in your scenario is already encouraged to handle that as an attack.
Is that sufficient? Arguably we could increase that 'ought' to a SHOULD or even a MUST.
@mattiasgrenfeldt ?
Sorry for the late response.
In the 6 proxies and 6 servers we have looked for Request Smuggling in, only one server rejects requests with both Content-Length and Transfer-Encoding. So even though the current spec warns of an attack, that advice is generally not followed.
Changing the spec to reject requests with both headers would be very good for stopping Request Smuggling issues.
But we can only speak from the Request Smuggling point of view. There might be other reasons to not make that change.
(Asta Olofsson & Mattias Grenfeldt)
So the spec already warns about that case. Do you really believe that another change in the spec would affect those implementations that do not follow the current spec?
Maybe it would be better to open bug reports and see how the developers answer?
Here is the current paragraph:
If a message is received with both a Transfer-Encoding and a Content-Length header field, the Transfer-Encoding overrides the Content-Length. Such a message might indicate an attempt to perform request smuggling (Section 9.5) or response splitting (Section 9.4) and ought to be handled as an error. A sender MUST remove the received Content-Length field prior to forwarding such a message downstream.
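For illustration only, here is a minimal sketch in Python of what the quoted paragraph asks a recipient (and a forwarding sender) to do, assuming the header fields have already been parsed into (name, value) pairs with lowercase names; none of this is spec text:

def framing_for(headers):
    # "headers" is assumed to be a list of (name, value) pairs; not a full parser.
    has_te = any(name == "transfer-encoding" for name, _ in headers)
    has_cl = any(name == "content-length" for name, _ in headers)
    if has_te and has_cl:
        # "... might indicate an attempt to perform request smuggling ...
        #  and ought to be handled as an error"
        raise ValueError("both Transfer-Encoding and Content-Length present")
    if has_te:
        return "chunked"          # Transfer-Encoding governs the framing
    if has_cl:
        return "content-length"
    return "no-body"

def strip_content_length(headers):
    # "A sender MUST remove the received Content-Length field prior to
    #  forwarding such a message downstream."
    return [(n, v) for n, v in headers if n != "content-length"]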
When we read the current spec, we interpreted it as being optional to reject messages with both headers: partly because the spec says that one header should be prioritized over the other (information that isn't needed if requests with both headers should be rejected), and partly because 'ought' is not one of the keywords (SHOULD, MUST, etc.) defined at the beginning. But rereading the paragraph now, we realize that this actually is a requirement.
Considering that it is rather unclear right now that this is a hard requirement, rather than an optional requirement, we believe that adding a stricter keyword would make the situation more clear for implementors.
With the current formulation of the paragraph we think that server authors would respond with: "We chose to accept messages with both headers to be in line with the Robustness Principle".
Or, maybe they didn't read the spec.
That's why I suggested actually opening bug reports and seeing how the authors reply.
Many authors do not read the spec closely, indeed. And if an issue is raised on their projects, they might change the behavior.
But there are also projects which do read the spec closely and have nevertheless chosen to accept requests with both headers, such as Apache HTTP Server (httpd), Nginx, and HAProxy. What we wrote before about how authors will probably ignore the issue and quote the Robustness Principle was really meant about these projects.
Looking again at the paragraph from the spec, the third sentence also suggests that it is ok to receive requests with both headers to begin with:
A sender MUST remove the received Content-Length field prior to forwarding such a message downstream.
So if the correct interpretation of this paragraph is meant to be "do not accept requests with both headers" then it could definitely be made more clear.
Regarding opening issues in projects, we don't have the time to do that, sorry. :( But as you say, it would be interesting to see how the authors respond.
As a side note, our goal with opening this issue was to bring this strange situation regarding Transfer-Encoding in 1.0 requests to your attention. We don't have any idea for how this should be solved in a good way.
I've pinged the list to get some implementer input: https://www.w3.org/mid/09EACA5C-E560-4C76-A596-8F7C3D890766@mnot.net
I just checked, and in haproxy we do pass the transfer-encoding header to the other side even for 1.0 requests. That's not particularly pretty, I admit, but at least it does resist this specific case.
What we should probably state is that transfer-encoding ought to simply be rejected in 1.0 requests. It's too dangerous to let such requests come in when they can add a fake body to methods that do not usually take one. I'd possibly even go as far as not accepting a body at all with GET/HEAD in 1.0, because even a content-length with a GET could possibly result in some smuggling attacks on 1.0 servers that do not even look at content-length with GET.
Now another question is what to do for 1.0 responses. Because while requests can be dangerous, a bad-acting server could also be used to poison a cache. I'm not totally enthusiastic about the idea of blocking T-E on 1.0 responses, given that the transition from 1.0 to 1.1 was very slow and we find just about all flavors of 1.1 advertised as 1.0.
Maybe a reasonable approach would be to suggest that for any 1.0 message containing Transfer-Encoding, both transfer-encoding and content-length must be ignored, and the connection must work in close mode? A gateway would then not consume a request body, and would pass all response body until the connection closes. That could be the most conservative approach and possibly the least prone to breaking existing deployments.
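Roughly, that framing decision could look like the following sketch (the function and its parameters are purely illustrative, not haproxy code):

def conservative_framing(version, is_request, has_te):
    # Hypothetical decision: how to delimit the body of a message that
    # advertises Transfer-Encoding, and whether to force close afterwards.
    if version == "HTTP/1.0" and has_te:
        # Ignore both Transfer-Encoding and Content-Length and work in close
        # mode: don't consume a request body, relay a response body until the
        # connection closes.
        return ("no-body" if is_request else "until-close", True)
    # Otherwise fall back to the normal HTTP/1.1 framing rules (not shown).
    return (None, False)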
I am unable to reproduce the described behaviour with a recent 2.4.x Apache httpd. Do you have a working file that you send against Apache to trigger 2 responses?
According to my analysis, in Apache "Content-Length" overrides any "Transfer-Encoding: chunked". So, in your example, it would never see the second request. Assuming this were not the case, it would still close the connection after answering the first request, as your example does not trigger a keep-alive behaviour. But even if I add that to the request, the connection is still being closed after seeing the mixed C-L + T-E headers.
Reading this again: are you only talking about Apache httpd proxying such requests against a backend that is vulnerable?
Yes Stefan, that's what Mattias is testing. In fact, not exactly against a vulnerable backend, but against any backend which could interpret the request differently. Based on your description I guess that if you switch to close mode there shouldn't be any issue though.
Thanks @wtarreau for clarifying. I guess the "smuggled" request could still order 50 refrigerators?
Tested an HTTP proxy setup with a slightly modified example:
GET /proxy/files/empty.txt HTTP/1.0
Host: cgi.tests.httpd.apache.org
Content-Length: 50
Connection: Keep-Alive
Transfer-Encoding: chunked

3
xxx
0

GET /smuggled HTTP/1.0
Host: example.com
The request is forwarded as (wireshark):
GET /files/empty.txt HTTP/1.1\r\n
Host: cgi.tests.httpd.apache.org\r\n
X-Forwarded-For: 127.0.0.1\r\n
X-Forwarded-Host: cgi.tests.httpd.apache.org\r\n
X-Forwarded-Server: cgi.tests.httpd.apache.org\r\n
Content-Length: 3\r\n
Connection: Keep-Alive\r\n
\r\n
xxx
and the response to the client delivered as:
HTTP/1.1 200 OK
Date: Tue, 13 Jul 2021 10:09:08 GMT
Server: Apache/2.5.1-dev (Unix) OpenSSL/1.1.1k
Last-Modified: Tue, 13 Jul 2021 09:08:56 GMT
ETag: "0-5c6fd96685a00"
Accept-Ranges: bytes
Content-Length: 0
Content-Type: text/plain
Connection: close
The smuggled request is never inspected or processed.
Looks perfect!
@icing @wtarreau any concerns / thoughts about changing ought to SHOULD or MUST?
@mnot I think we might not want to, because that would recommend endpoints close the connection before processing the suspicious HTTP message.
IIRC, some endpoints used to send HTTP "1.0" messages with Transfer-Encoding: chunked. That's why I think some servers close the connection after processing those HTTP messages rather than closing up front.
Maybe such endpoints do not exist, but we do not know.
I'm not fond of the idea either. I'm certain I've seen implementations send both in good faith, typically after compression. If you remember, originally (before keep-alive), content-length was used as a progress indicator. A long time ago I saw some compression gateways not touch it yet compress and deliver in chunks (I seem to remember that in fact they were compressing caches not being able to remove C-L from objects delivered out of the cache). Also, some of them only remove a single C-L, so if a response happens to contain two C-L headers, one of them is removed, a T-E is added, and this results in such a situation. The current spec addresses this cleanly.
That's why I think the cleanest (and least impacting) thing we could do would be to consider that if both T-E and C-L are present in a 1.0 message, we ought to proceed like this: for a request, ignore the body entirely, so that the message ends after the headers; for a response, use the chunked body and consider that the message ends after it.
This could be summarized to "messages with both T-E and C-L in 1.0 end after headers for requests and after the chunked body for responses". This allows a gateway to either pass a response as-is and let the next hop dechunk it, or dechunk itself and close right after the end. And it looks very close to what Apache does according to @icing tests.
Maybe @royfielding has some suggestions on this (he often has good ideas when it comes to anything related to bodies).
/me just realizes that all of this forgets to address the initial question :-) I would be fine with proceeding like above (i.e. ignoring the message body) when T-E is present in a 1.0 request, regardless of C-L. In any other case there remains the risk that the next hop would see two requests if it doesn't parse T-E. Maybe this could be relaxed by allowing a gateway to pass these requests but add "connection: close" to the forwarded request, to make sure that the following data may not be treated as a subsequent request.
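That relaxation could be as small as the following sketch for the forwarding side (again purely illustrative, not taken from any implementation):

def forward_with_close(headers):
    # Hypothetical helper: drop any existing Connection header and force
    # "Connection: close" so the next hop cannot treat trailing bytes as a
    # second request on the same connection.
    out = [(n, v) for n, v in headers if n.lower() != "connection"]
    out.append(("Connection", "close"))
    return out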
@wtarreau's proposal from the list:
a request received with both Content-Length and Transfer-Encoding MAY be rejected as invalid, and in any case the connection MUST always be closed after processing that request. Such a request indicates that the sender does not fully respect the standard and might be at risk of conveying a request smuggling attack ;
Sounds good!
If we go way back in the 1.1 design, I am pretty sure that Content-Length was intended to be advisory (for the sake of a progress bar) whenever chunked is sent. That is how Content-Length was originally interpreted, prior to pipelining and persistent connections.
I don't think we ever seriously considered "an implementation that only implements 1.0" as being a valid use case for intermediaries because there were hundreds of improvements in 1.1 for the sake of enabling such intermediaries. I do not agree that a message claiming it is 1.0 cannot implement chunked. It can, but the recipient cannot trust the encoding to be correct. That is how we ended up with Apache httpd interpreting the request (per spec) and then closing the connection (because it cannot be trusted).
Like you, I constantly consider that 1.0 can have full support for all 1.1 features, because 1.1 was mostly an official statement of all the crazy 1.0 stuff that was met in the field. Do you think that we should remove the "MAY reject as invalid" from my proposal above and only keep the "MUST close" part?
So let me try again based on Roy's points above:
after receiving and processing a request with both Content-Length and Transfer-Encoding, a server or intermediary MUST NOT try to process any subsequent request over that connection and MUST close the connection. Such a request indicates that the sender does not fully respect the standard and might be at risk of conveying a request smuggling attack;
after receiving and processing an HTTP/1.0 request containing a Transfer-Encoding header field, a server or intermediary MUST NOT try to process any subsequent request over that connection and MUST close the connection. Its sender might not be aware of the chunked transfer coding and might be at risk of conveying a request smuggling attack.
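As an illustration, a server applying these two rules could decide the connection's fate with a check like this sketch (names and shape are made up, not from any implementation):

def must_close_after(version, has_te, has_cl):
    # Hypothetical check implementing the two proposed rules above.
    if has_te and has_cl:
        return True                       # both framing headers present
    if version == "HTTP/1.0" and has_te:
        return True                       # Transfer-Encoding on a 1.0 request
    return False
# A connection loop would stop reading further requests once this returns True.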
For the response path, the same problem may happen when a 1.1 server responds using both C-L and T-E to a 1.0 proxy that might cache a partial response and maintain a reusable, but desynchronized, connection to the server. However, there is nothing the 1.1 client can do to help a 1.0 cache deal with a non-compliant 1.1 server. That's why standards evolve and need to be adopted and deployed at some point :-)
A server always has the choice of rejecting a request, so I have no problem with the MAY reject if that is useful to say. I am okay with these new requirements, but I think it is more important to explain why they are needed to avoid smuggling attacks. I will try to get it into the spec later today.
Thanks. I think the MAY reject is useful to discourage senders from doing this when they have the choice :-)
According to my analysis, in Apache "Content-Length" overrides any "Transfer-Encoding: chunked".
I am pretty sure that @icing meant the other way around — chunked overrides content-length — since that is what his testing showed (and what we wrote in the spec).
Yes, @royfielding, my initial read of the protocol handling was wrong. Always good to also run tests to see what the code really does.
The Transfer-Encoding header was first introduced in version 1.1, so one would assume that 1.0 servers should not interpret it. But RFC 7230 introduced a paragraph under section 2.6 which suggests that 1.0 servers should interpret the Transfer-Encoding header; that paragraph also exists in the current draft version of the Semantics spec. Later in RFC 7230, however, under section 3.3.1, there is a paragraph which suggests that it is ok for implementations that only support 1.0 to not implement the Transfer-Encoding header.

Taken together, these two paragraphs tell us that servers which implement both 1.0 and 1.1 should, in their 1.0 implementation, interpret Transfer-Encoding, while servers which only implement 1.0 should not. Let's call these two different behaviors the 1.1-behavior and the 1.0-behavior respectively.

If we have a setup with a reverse proxy (gateway) which has the 1.0-behavior in front of a server with the 1.1-behavior, then HTTP Request Smuggling is possible even though both systems follow the specification. Consider a request sent to the proxy carrying both a Content-Length and a Transfer-Encoding: chunked header field, with a second GET placed after the chunked body but still inside the length given by Content-Length.
Then the proxy will ignore the Transfer-Encoding header, but forward it as-is. The proxy will instead interpret the Content-Length header and see the second GET as the body of the first one. The server will instead interpret and prioritize the Transfer-Encoding header and will therefore see two different requests.

While looking for Request Smuggling in various proxies and servers we have observed three different behaviors in the wild, one of which never forwards Transfer-Encoding together with the Content-Length header, making the smuggling attack impossible.

If a proxy with the 1.0-behavior is discovered, then request smuggling is possible, but there is no bug, since both parties follow the RFC. Should the spec somehow be changed/clarified to avoid this?
(Authors: Mattias Grenfeldt and Asta Olofsson)
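To make the framing divergence concrete, here is a small self-contained sketch in Python (the request bytes, the Content-Length value, and the names are made up purely for illustration): under the 1.0-behavior the whole chunked body plus the hidden GET is consumed as one body, while under the 1.1-behavior the body ends at the 0-size chunk and the hidden GET is left over as a second request.

raw = (
    b"GET / HTTP/1.0\r\n"
    b"Content-Length: 31\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"0\r\n"
    b"\r\n"
    b"GET /smuggled HTTP/1.0\r\n"
    b"\r\n"
)
head, _, rest = raw.partition(b"\r\n\r\n")

# 1.0-behavior (the proxy): trust Content-Length, so everything up to the
# advertised length is the body of the first request.
cl = 31                                   # the Content-Length value advertised above
proxy_body = rest[:cl]
proxy_leftover = rest[cl:]

# 1.1-behavior (the server): trust chunked, so the body ends at the 0-size
# chunk and the remaining bytes are parsed as a second request.
chunked_end = rest.index(b"0\r\n\r\n") + len(b"0\r\n\r\n")
server_leftover = rest[chunked_end:]

print("proxy: body =", len(proxy_body), "bytes, leftover =", len(proxy_leftover), "bytes")
print("server: leftover starts with", server_leftover.splitlines()[0].decode())

Running it prints a 31-byte body with no leftover for the proxy view, and a leftover starting with "GET /smuggled HTTP/1.0" for the server view.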