httpwg / http-core

Core HTTP Specifications
https://httpwg.org/http-core/

Should servers interpret Transfer-Encoding in 1.0 requests? #879

Closed mattiasgrenfeldt closed 3 years ago

mattiasgrenfeldt commented 3 years ago

The Transfer-Encoding header was first introduced in version 1.1. So one would assume that 1.0 servers should not interpret it. But in RFC 7230 the following paragraph was introduced under section 2.6:

The interpretation of a header field does not change between minor versions of the same major HTTP version, though the default behavior of a recipient in the absence of such a field can change. Unless specified otherwise, header fields defined in HTTP/1.1 are defined for all versions of HTTP/1.x. In particular, the Host and Connection header fields ought to be implemented by all HTTP/1.x implementations whether or not they advertise conformance with HTTP/1.1.

This paragraph suggests that 1.0 servers should interpret the Transfer-Encoding header. This paragraph also exists in the current draft version of the Semantics spec. Later in RFC 7230 however, under section 3.3.1, we see the following paragraph:

Transfer-Encoding was added in HTTP/1.1. It is generally assumed that implementations advertising only HTTP/1.0 support will not understand how to process a transfer-encoded payload. A client MUST NOT send a request containing Transfer-Encoding unless it knows the server will handle HTTP/1.1 (or later) requests; such knowledge might be in the form of specific user configuration or by remembering the version of a prior received response. A server MUST NOT send a response containing Transfer-Encoding unless the corresponding request indicates HTTP/1.1 (or later).

This paragraph suggests that it is ok for implementations that only support 1.0 to not implement the Transfer-Encoding header.

Taken together, these two paragraphs tell us that servers which implement both 1.0 and 1.1 should interpret Transfer-Encoding in their 1.0 handling, while servers which only implement 1.0 should not. Let's call these two behaviors the 1.1-behavior and the 1.0-behavior, respectively.

If we have a setup with a reverse proxy (gateway) which has the 1.0-behavior in front of a server with the 1.1-behavior, then HTTP Request Smuggling is possible, but both systems follow the specification. If the following request is sent to the proxy:

GET / HTTP/1.0
Host: example.com
Content-Length: 50
Transfer-Encoding: chunked

0

GET /smuggled HTTP/1.0
Host: example.com

Then the proxy will ignore the Transfer-Encoding header but forward it as-is. It will instead interpret the Content-Length header and treat the second GET as the body of the first. The server, however, will prioritize the Transfer-Encoding header and will therefore see two separate requests.
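To make the desynchronization concrete, here is a small illustrative sketch (not any real proxy's code; the function names are made up) of how a recipient that trusts only Content-Length and one that honors Transfer-Encoding frame the very same bytes differently:

```python
# The exact request from the example above, as raw bytes.
RAW = (
    b"GET / HTTP/1.0\r\n"
    b"Host: example.com\r\n"
    b"Content-Length: 50\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"0\r\n"
    b"\r\n"
    b"GET /smuggled HTTP/1.0\r\n"
    b"Host: example.com\r\n"
    b"\r\n"
)

def split_headers(data):
    """Split raw bytes into a header dict and the remaining bytes."""
    head, _, rest = data.partition(b"\r\n\r\n")
    headers = {}
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    return headers, rest

def body_len_10_behavior(data):
    """A 1.0-behavior recipient: ignores Transfer-Encoding entirely
    and trusts Content-Length."""
    headers, _rest = split_headers(data)
    return int(headers.get(b"content-length", b"0"))

def body_len_11_behavior(data):
    """A 1.1-behavior recipient: Transfer-Encoding overrides
    Content-Length."""
    headers, rest = split_headers(data)
    if headers.get(b"transfer-encoding") == b"chunked":
        # An empty chunked body is just the terminating 0-chunk: 5 bytes.
        return rest.index(b"0\r\n\r\n") + len(b"0\r\n\r\n")
    return int(headers.get(b"content-length", b"0"))
```

The 1.0-behavior recipient consumes 50 body bytes, swallowing the smuggled request; the 1.1-behavior recipient consumes only the 5-byte empty chunked body and then parses "GET /smuggled" as a brand-new request.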

While looking for Request Smuggling in various proxies and servers we have observed three different behaviors in the wild.

If a proxy with the 1.0-behavior is discovered, then request smuggling is possible, but there is no bug, since both parties follow the RFC. Should the spec somehow be changed/clarified to avoid this?

(Authors: Mattias Grenfeldt and Asta Olofsson)

reschke commented 3 years ago

Interesting issue.

AFAIU, we really can't make any new requirements on servers that only support 1.0.

mnot commented 3 years ago

Julian is correct; in this specification we can't retrofit new requirements onto HTTP/1.0 intermediaries.

However, note that the RFC-to-be says:

  1. If a message is received with both a Transfer-Encoding and a Content-Length header field, the Transfer-Encoding overrides the Content-Length. Such a message might indicate an attempt to perform request smuggling (Section 11.2) or response splitting (Section 11.1) and ought to be handled as an error.

... so the server in your scenario is already encouraged to handle that as an attack.

Is that sufficient? Arguably we could increase that 'ought' to a SHOULD or even a MUST.
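For illustration, the quoted rule could be sketched as a recipient-side framing check like this (a hypothetical helper; the names and the `strict` switch are mine, not from any implementation or the spec):

```python
def choose_framing(headers, strict=True):
    """Decide how a message body is delimited. When both
    Content-Length and Transfer-Encoding are present, either reject
    the message as a possible smuggling attempt (strict) or let
    Transfer-Encoding override Content-Length (tolerant)."""
    has_te = "transfer-encoding" in headers
    has_cl = "content-length" in headers
    if has_te and has_cl:
        if strict:
            raise ValueError("both Transfer-Encoding and Content-Length: "
                             "possible request smuggling")
        return "chunked"          # T-E overrides C-L when tolerated
    if has_te:
        return "chunked"
    if has_cl:
        return "content-length"
    return "no-body"              # or read-until-close, for responses
```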

reschke commented 3 years ago

@mattiasgrenfeldt ?

asta12 commented 3 years ago

Sorry for the late response.

Of the 6 proxies and 6 servers we have examined for Request Smuggling, only one server rejects requests with both Content-Length and Transfer-Encoding. So even though the current spec warns of an attack, that advice is generally not followed.

Changing the spec to reject requests with both headers would be very good for stopping Request Smuggling issues.

But we can only speak from the Request Smuggling point of view. There might be other reasons to not make that change.

(Asta Olofsson & Mattias Grenfeldt)

reschke commented 3 years ago

So the spec already warns about that case. Do you really believe that another change in the spec would affect those implementations that do not follow the current spec?

Maybe it would be better to open bug reports and see how the developers answer?

mattiasgrenfeldt commented 3 years ago

Here is the current paragraph:

  If a message is received with both a Transfer-Encoding and a
  Content-Length header field, the Transfer-Encoding overrides the
  Content-Length.  Such a message might indicate an attempt to
  perform request smuggling (Section 9.5) or response splitting
  (Section 9.4) and ought to be handled as an error.  A sender MUST
  remove the received Content-Length field prior to forwarding such
  a message downstream.

Source

When we read the current spec, we interpreted rejecting messages with both headers as optional. Partly because the spec says that one header should be prioritized over the other; that information isn't needed if requests with both headers are to be rejected. Partly also because 'ought' is not one of the keywords (SHOULD, MUST, etc.) defined at the beginning. But rereading the paragraph now, we realize that this actually is a requirement.

Considering that it is currently rather unclear whether this is a hard requirement or an optional one, we believe that adding a stricter keyword would make the situation clearer for implementers.

With the current formulation of the paragraph we think that server authors would respond with: "We chose to accept messages with both headers to be in line with the Robustness Principle".

reschke commented 3 years ago

  With the current formulation of the paragraph we think that server authors would respond with: "We chose to accept messages with both headers to be in line with the Robustness Principle".

Or, maybe they didn't read the spec.

That's why I suggested to actually open bugs and see how the authors reply.

mattiasgrenfeldt commented 3 years ago

Many authors do not read the spec closely, indeed. And if an issue is raised on their projects, they might change the behavior.

But there are also projects which do read the spec closely and have still chosen to accept requests with both headers, such as Apache HTTP Server (httpd), Nginx, and HAProxy. What we wrote before about how authors will probably ignore the issue and quote the Robustness Principle was really aimed at these projects.

Looking again at the paragraph from the spec, the third sentence also suggests that it is ok to receive requests with both headers to begin with:

  A sender MUST remove the received Content-Length field prior to forwarding such a message downstream.

So if the correct interpretation of this paragraph is meant to be "do not accept requests with both headers" then it could definitely be made more clear.

Regarding opening issues in projects, we don't have the time to do that, sorry. :( But as you say, it would be interesting to see how the authors respond.


As a side note, our goal with opening this issue was to bring this strange situation regarding Transfer-Encoding in 1.0 requests to your attention. We don't have any idea for how this should be solved in a good way.

mnot commented 3 years ago

I've pinged the list to get some implementer input: https://www.w3.org/mid/09EACA5C-E560-4C76-A596-8F7C3D890766@mnot.net

wtarreau commented 3 years ago

I just checked, and in haproxy we do pass the Transfer-Encoding header to the other side even for 1.0 requests. That's not particularly pretty, I admit, but at least it resists this specific case.

What we should probably state is that transfer-encoding ought to simply be rejected in 1.0 requests. It's too dangerous to let such requests come in when they can add a fake body to methods that do not usually take one. I'd possibly even go as far as not accepting a body at all with GET/HEAD in 1.0, because even a content-length with a GET could possibly result in some smuggling attacks on 1.0 servers that do not even look at content-length with GET.

Now another question is what to do for 1.0 responses. While requests can be dangerous, a badly behaving server could also be used to poison a cache. I'm not totally enthusiastic about the idea of blocking T-E on 1.0 responses, given that the transition from 1.0 to 1.1 was very slow and we find just about all flavors of 1.1 advertised as 1.0.

Maybe a reasonable approach would be to suggest that for any 1.0 message containing Transfer-Encoding, recipients must ignore both Transfer-Encoding and Content-Length, and must work in close mode? A gateway would then not consume a request body, and would pass the whole response body until the connection closes. That could be the most conservative approach and possibly the least prone to breaking existing deployments.

icing commented 3 years ago

I am unable to reproduce the described behaviour with a recent 2.4.x Apache httpd. Do you have a working file that you send against Apache to trigger 2 responses?

According to my analysis, in Apache "Content-Length" overrides any "Transfer-Encoding: chunked". So, in your example, it would never see the second request. Even if that were not the case, it would still close the connection after answering the first request, since your example does not trigger keep-alive behaviour. But even when I add that to the request, the connection is still closed after seeing the mixed CL+TE headers.

Reading this again: are you only talking about Apache httpd proxying such requests against a backend that is vulnerable?

wtarreau commented 3 years ago

Yes Stefan, that's what Mattias is testing. In fact, not exactly against a vulnerable backend, but against any backend which could interpret the request differently. Based on your description, I guess that if you turn to close mode there shouldn't be any issue, though.

icing commented 3 years ago

Thanks @wtarreau for clarifying. I guess the "smuggled" request could still order 50 refrigerators?

icing commented 3 years ago

Tested an HTTP proxy setup with a slightly modified example:

GET /proxy/files/empty.txt HTTP/1.0
Host: cgi.tests.httpd.apache.org
Content-Length: 50
Connection: Keep-Alive
Transfer-Encoding: chunked

3
xxx
0

GET /smuggled HTTP/1.0
Host: example.com

The request is forwarded as (wireshark):

    GET /files/empty.txt HTTP/1.1\r\n
    Host: cgi.tests.httpd.apache.org\r\n
    X-Forwarded-For: 127.0.0.1\r\n
    X-Forwarded-Host: cgi.tests.httpd.apache.org\r\n
    X-Forwarded-Server: cgi.tests.httpd.apache.org\r\n
    Content-Length: 3\r\n
    Connection: Keep-Alive\r\n
    \r\n
    xxx

and the response to the client delivered as:

HTTP/1.1 200 OK
Date: Tue, 13 Jul 2021 10:09:08 GMT
Server: Apache/2.5.1-dev (Unix) OpenSSL/1.1.1k
Last-Modified: Tue, 13 Jul 2021 09:08:56 GMT
ETag: "0-5c6fd96685a00"
Accept-Ranges: bytes
Content-Length: 0
Content-Type: text/plain
Connection: close

The smuggled request is never inspected or processed.
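For anyone who wants to replay this, a minimal raw-socket sender along these lines should work (the bytes mirror the modified example above; the helper names and the idea of pointing it at a local test host are my assumptions, not part of Apache's test suite):

```python
import socket

def build_request():
    """The modified test request above, as raw bytes (CRLF line endings)."""
    return (
        b"GET /proxy/files/empty.txt HTTP/1.0\r\n"
        b"Host: cgi.tests.httpd.apache.org\r\n"
        b"Content-Length: 50\r\n"
        b"Connection: Keep-Alive\r\n"
        b"Transfer-Encoding: chunked\r\n"
        b"\r\n"
        b"3\r\nxxx\r\n0\r\n\r\n"
        b"GET /smuggled HTTP/1.0\r\n"
        b"Host: example.com\r\n"
        b"\r\n"
    )

def send_raw(host, port, payload, timeout=5.0):
    """Send raw bytes and return everything read until the peer closes,
    so any second response to the smuggled request becomes visible."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(payload)
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Usage would be something like `send_raw("127.0.0.1", 8080, build_request())` against a local proxy, then checking whether the returned bytes contain one response or two.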

wtarreau commented 3 years ago

Looks perfect!

mnot commented 3 years ago

@icing @wtarreau any concerns / thoughts about changing ought to SHOULD or MUST?

kazuho commented 3 years ago

@mnot I think we might not want to, because that would recommend endpoints close the connection before processing the suspicious HTTP message.

IIRC, some endpoints used to send HTTP "1.0" messages with Transfer-Encoding: chunked. That's why I think some servers close the connection after processing those HTTP messages rather than rejecting them up front.

Maybe such endpoints do not exist, but we do not know.

wtarreau commented 3 years ago

I'm not fond of the idea either. I'm certain I've seen implementations send both in good faith, typically after compression. If you remember, originally (before keep-alive) Content-Length was used as a progress indicator. A long time ago I saw some compression gateways leave it untouched yet compress and deliver in chunks (I seem to remember they were in fact compressing caches unable to remove the C-L from objects delivered out of the cache). Also, some of them remove only a single C-L, so if a response happens to contain two C-L headers, one of them is removed, a T-E is added, and this results in such a situation. The current spec addresses this cleanly.

That's why I think the cleanest (and least impactful) thing we could do would be to consider that if both T-E and C-L are present in 1.0, we proceed as follows:

This could be summarized as "messages with both T-E and C-L in 1.0 end after the headers for requests and after the chunked body for responses". This allows a gateway either to pass a response as-is and let the next hop dechunk it, or to dechunk it itself and close right after the end. And it looks very close to what Apache does according to @icing's tests.
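The summarized rule is small enough to sketch directly (a hypothetical helper; the name and return-value labels are mine, not from any implementation):

```python
def frame_ambiguous_http10(is_request):
    """Framing for an HTTP/1.0 message that carries both
    Transfer-Encoding and Content-Length, per the rule above:
    a request body ends right after the headers, a response body
    ends after the chunked body, and in both cases the connection
    must then be closed rather than reused."""
    if is_request:
        return ("no-body", True)   # ignore both length headers
    return ("chunked", True)       # dechunk (or pass through), then close
```

The second element signals that the connection must not be reused afterwards, matching the close-mode behaviour discussed earlier in the thread.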

Maybe @royfielding has some suggestions on this (he often has good ideas when it comes to anything related to bodies).

wtarreau commented 3 years ago

/me just realizes that all of this forgets to address the initial question :-) I would be fine with proceeding as above (i.e. ignore the message body) when T-E is present in a 1.0 request, regardless of C-L. In any other case, there remains the risk that the next hop sees two requests if it doesn't parse T-E. Maybe this could be relaxed by letting a gateway pass these through but add "Connection: close" to the forwarded request, to make sure that the following data cannot be treated as a subsequent request.

mnot commented 3 years ago

@wtarreau's proposal from the list:

mattiasgrenfeldt commented 3 years ago

Sounds good!

royfielding commented 3 years ago

If we go way back in the 1.1 design, I am pretty sure that Content-Length was intended to be advisory (for the sake of a progress bar) whenever chunked is sent. That is how Content-Length was originally interpreted, prior to pipelining and persistent connections.

I don't think we ever seriously considered "an implementation that only implements 1.0" as a valid use case for intermediaries, because there were hundreds of improvements in 1.1 for the sake of enabling such intermediaries. I do not agree that a message claiming to be 1.0 cannot implement chunked. It can, but the recipient cannot trust the encoding to be correct. That is how we ended up with Apache httpd interpreting the request (per spec) and then closing the connection (because it cannot be trusted).

wtarreau commented 3 years ago

Like you, I generally consider that 1.0 can have full support for all 1.1 features, because 1.1 was mostly an official statement of all the crazy 1.0 stuff that was met in the field. Do you think we should remove the "MAY reject as invalid" from my proposal above and only keep the "MUST close" part?

wtarreau commented 3 years ago

So let me try again based on Roy's points above:

For the response path, the same problem may happen when a 1.1 server responds with both C-L and T-E to a 1.0 proxy that might cache a partial response and keep a reusable but desynchronized connection to the server. However, there is nothing the 1.1 client can do to help a 1.0 cache deal with a non-compliant 1.1 server. That's why standards evolve and need to be adopted and deployed at some point :-)

royfielding commented 3 years ago

A server always has the choice of rejecting a request, so I have no problem with the MAY reject if that is useful to say. I am okay with these new requirements, but I think it is more important to explain why they are needed to avoid smuggling attacks. I will try to get it into the spec later today.

wtarreau commented 3 years ago

Thanks. I think the MAY reject is useful to discourage senders from doing this when they have the choice :-)

royfielding commented 3 years ago

  According to my analysis, in Apache "Content-Length" overrides any "Transfer-Encoding: chunked".

I am pretty sure that @icing meant the other way around — chunked overrides content-length — since that is what his testing showed (and what we wrote in the spec).

icing commented 3 years ago

Yes, @royfielding, my initial read of the protocol handling was wrong. Always good to also run tests to see what the code really does.