Clarification on messages with invalid Content-Length and valid Transfer-Encoding

When a gateway server receives a message with an invalid Content-Length field value and a valid Transfer-Encoding field value, what should it do?

My interpretation of the standard is that such messages must not be forwarded. From RFC 9110, section 8.6:

Likewise, a sender MUST NOT forward a message with a Content-Length header field value that does not match the ABNF above, with one exception: a recipient of a Content-Length header field value consisting of the same decimal value repeated as a comma-separated list (e.g, "Content-Length: 42, 42") MAY either reject the message as invalid or replace that invalid field value with a single instance of the decimal value, since this likely indicates that a duplicate was generated or combined by an upstream message processor.

However, others have argued that the standard also allows intermediaries to forward such messages, so long as the invalid Content-Length header is removed. See the Squid, Go net/http, and H2O issue threads for those arguments, which usually appeal to RFC 9112, section 6.3:

If a message is received with both a Transfer-Encoding and a Content-Length header field, the Transfer-Encoding overrides the Content-Length. Such a message might indicate an attempt to perform request smuggling (Section 11.2) or response splitting (Section 11.1) and ought to be handled as an error. An intermediary that chooses to forward the message MUST first remove the received Content-Length field and process the Transfer-Encoding (as described below) prior to forwarding the message downstream.

EDIT: This question has been answered clearly and effectively by @mnot. If you are going to respond to this thread, please do not respond to the content of this message alone.

See RFC 9112, Section 6.3:

If a message is received with both a Transfer-Encoding and a Content-Length header field, the Transfer-Encoding overrides the Content-Length. Such a message might indicate an attempt to perform request smuggling (Section 11.2) or response splitting (Section 11.1) and ought to be handled as an error. An intermediary that chooses to forward the message MUST first remove the received Content-Length field and process the Transfer-Encoding (as described below) prior to forwarding the message downstream.

I can see how the requirement in 9110 can be read otherwise, but the most literal reading (it can't forward the message with the incorrect c-l, but it can forward it if it strips it) is the correct one (as supported by the text above).
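To make the literal reading concrete, here is a minimal sketch of the header handling it permits. The function name `prepare_for_forwarding`, the `RejectMessage` exception, and the list-of-tuples header representation are all hypothetical and not taken from any real proxy. The sketch covers only the Content-Length removal; it does not decode the chunked body, which RFC 9112 also requires before forwarding.

```python
from typing import List, Tuple

Headers = List[Tuple[str, str]]


class RejectMessage(Exception):
    """Raised when the intermediary chooses to treat the message as an error."""


def prepare_for_forwarding(headers: Headers, reject_on_conflict: bool = False) -> Headers:
    """Apply the RFC 9112, section 6.3 rule to a received header list.

    Hypothetical helper: if both Transfer-Encoding and Content-Length are
    present, either reject the message or drop every Content-Length field.
    The Content-Length value is never inspected, valid or not.
    """
    names = {name.lower() for name, _ in headers}
    if "transfer-encoding" in names and "content-length" in names:
        if reject_on_conflict:
            raise RejectMessage("both Transfer-Encoding and Content-Length present")
        # Forwarding is allowed only after the received Content-Length is removed.
        return [(n, v) for n, v in headers if n.lower() != "content-length"]
    return list(headers)
```

With `reject_on_conflict=False`, a request carrying `Content-Length: abc` and `Transfer-Encoding: chunked` comes out with only the Transfer-Encoding field, so the invalid value is never forwarded.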

I can see how the requirement in 9110 can be read otherwise, but the most literal reading (it can't forward the message with the incorrect c-l, but it can forward it if it strips it) is the correct one (as supported by the text above).

Thank you for the clarification.

My original interpretation stemmed from the use of the word "exception" in the RFC 9110 text:

Likewise, a sender MUST NOT forward a message with a Content-Length header field value that does not match the ABNF above, with one exception: a recipient of a Content-Length header field value consisting of the same decimal value repeated as a comma-separated list (e.g, "Content-Length: 42, 42") MAY either reject the message as invalid or replace that invalid field value with a single instance of the decimal value, since this likely indicates that a duplicate was generated or combined by an upstream message processor.

This sentence has two components, which I'll call clauses $A$ and $B$.

Clause $A$:

Likewise, a sender MUST NOT forward a message with a Content-Length header field value that does not match the ABNF above.

The correct interpretation of clause $A$ (as you have confirmed) is that if a sender receives a message with an invalid Content-Length header, then that header (with its invalid value) must not be sent in the forwarded message.

Clause $B$:

with one exception: a recipient of a Content-Length header field value consisting of the same decimal value repeated as a comma-separated list (e.g, "Content-Length: 42, 42") MAY either reject the message as invalid or replace that invalid field value with a single instance of the decimal value, since this likely indicates that a duplicate was generated or combined by an upstream message processor.

I think everyone agrees about what this is saying: a recipient of a message with a Content-Length header field value consisting of the same decimal value repeated in a comma-separated list may do either of the following:

De-duplicate the list and forward the message.
Reject the message.

The issue is that neither of these options constitutes an exception to clause $A$, because neither of them requires forwarding an invalid Content-Length header field value.

Do you think this point deserves further clarification in the text, given that I'm not the only person to have made this misinterpretation?

Thanks.

Do you think this point deserves further clarification in the text, given that I'm not the only person to have made this misinterpretation?

I don't think so (personal opinion). I do think you might be incorrectly extrapolating your "misinterpretation" from implementation details in the wild -- whose authors may or may not have carefully read these parts of the current RFC specifications, or may have read earlier RFC specifications.

My observation is that the HTTP RFCs attempt to specify correct and interoperable behavior, but without overspecifying implementation details.

A sender MUST NOT send a malformed message. That is a clear and direct specification with no exceptions.

If a malformed request is received, the receiver may reject the request, or may forward or service the request after removing malformed portions (if modifying the request is deemed safe and appropriate). The receiver may also optionally log the bad request or keep a count of bad requests that informs the receiver whether or not to continue receiving requests from that client. The RFCs do not attempt to specify all possible permissible behaviors. Please note: a server may reject or refuse to serve a request for any reason it pleases, and the RFCs again make no attempt to enumerate all the potential reasons.

After decades of wide HTTP usage, there are lessons to be learned. The updated HTTP RFCs have more strictly specified some behaviors where there may be security concerns. This includes guidance to detect and reject requests which might suggest attempts at request smuggling or response splitting.

Implementations written prior to RFC 9110 et al. may have been compliant with earlier versions of the HTTP specifications, and might now not be "strictly compliant" with the guidance in the updated RFCs. Not all items in the updated RFCs have equal security implications. While I feel that this issue here amounts to splitting words, other areas of your research have uncovered much more important bugs in implementations.

When a gateway server receives a message with an invalid Content-Length field value and a valid Transfer-Encoding field value, what should it do?

This falls into the category of implementation details. The RFCs specify what it MUST NOT do, which is that a gateway server MUST NOT forward a malformed request.

Some implementations might read all headers in an HTTP/1.1 request before processing those headers. Others might process the headers as each line is read.

In a malformed HTTP/1.1 request sent with both Content-Length and Transfer-Encoding, either header might come first. The RFCs do not specify that every header line must be strictly validated. An implementation which sees Transfer-Encoding and then sees Content-Length might ignore Content-Length and not bother validating it against the ABNF. Unless you can point to text in the RFC which explicitly states that a receiver must unconditionally parse and reject an invalid Content-Length, that is an implementation detail.
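As a rough illustration of how header order can matter, here is a sketch of a parser that processes header lines as they arrive. The function `framing_line_by_line` is hypothetical and does not describe any particular server. Treating the Transfer-Encoding value as exactly "chunked" is a simplification.

```python
from typing import Iterable, Optional, Tuple


def framing_line_by_line(lines: Iterable[Tuple[str, str]]) -> Tuple[bool, Optional[int]]:
    """Decide message framing while reading header lines in order.

    Hypothetical sketch: returns (chunked, content_length). A Content-Length
    seen after Transfer-Encoding: chunked is skipped without being validated;
    one seen before it is parsed, and raises ValueError if invalid.
    """
    chunked = False
    length: Optional[int] = None
    for name, value in lines:
        name = name.lower()
        value = value.strip(" \t")
        if name == "transfer-encoding":
            chunked = value.lower() == "chunked"
            if chunked:
                length = None  # Transfer-Encoding overrides Content-Length
        elif name == "content-length":
            if chunked:
                continue  # ignored, so its syntax is never checked
            if not (value.isascii() and value.isdigit()):
                raise ValueError("invalid Content-Length: %r" % value)
            length = int(value)
    return chunked, length
```

Given the same two fields, `Transfer-Encoding: chunked` followed by `Content-Length: abc` is accepted as chunked, while the reverse order raises `ValueError`. A parser that reads all headers before processing them could treat both orders identically.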

You quoted RFC 9110, section 8.6 in the original post. To an implementer, that clearly states that forwarding a message with an invalid Content-Length is forbidden, but if the invalid Content-Length is removed or corrected, then such a message would no longer be a message with an invalid Content-Length.

You quoted RFC 9112, section 6.3 in the original post. To an implementer, that part of the RFC discusses some implementation details to highlight the security concern, and again, specifies that the sender must not send a malformed message. The text says that due to security concerns, the malformed message "ought to be handled as an error". That sentence does not use the well-defined "MUST" in all-capitals, and so is not a strictly specified requirement of the specification.

I do think you might be incorrectly extrapolating your "misinterpretation" from implementation details in the wild -- whose authors may or may not have carefully read these parts of the current RFC specifications, or may have read earlier RFC specifications.

No. I was referring to Alex Rousskov (Squid) and Andy Pan (Go net/http), each of whom, in response to my asking about the quoted section of RFC 9110, believed that accepting invalid Content-Length field values, even in the context of Transfer-Encoding, was a violation of the RFCs.

Regarding the rest of your message, we are in complete agreement. I accept that the handling of messages with invalid Content-Length and valid Transfer-Encoding is implementation-defined.

The more important issue, which still remains to be settled, is the content of my previous message, which I will now restate.

(1) RFC 9110 says that messages containing invalid Content-Length header field values MUST NOT be forwarded with the invalid header intact:

Likewise, a sender MUST NOT forward a message with a Content-Length header field value that does not match the ABNF above, ...

(2) It also says that there's an exception to this rule:

... with one exception: a recipient of a Content-Length header field value consisting of the same decimal value repeated as a comma-separated list (e.g, "Content-Length: 42, 42") MAY either reject the message as invalid or replace that invalid field value with a single instance of the decimal value, since this likely indicates that a duplicate was generated or combined by an upstream message processor.

An exception to a rule is a situation in which the rule does not apply. Because the above text (2) does not allow for invalid Content-Length headers to be forwarded, it therefore seems to me that it is not an exception to the rule (1).

So, to reiterate, I am asking only this: In what sense is (2) an exception to (1)?

Thank you.

The whole paragraph is irrelevant in the presence of a Transfer-Encoding header. The clause @mnot quoted is the only one in the Content-Length section that applies to this case. The recipient may either reject the message for any reason it wants to claim, or (intermediaries only) forward it after removing the Content-Length and interpreting the message using Transfer-Encoding.

The whole paragraph is irrelevant in the presence of a Transfer-Encoding header. The clause @mnot quoted is the only one in the Content-Length section that applies to this case. The recipient may either reject the message for any reason it wants to claim, or (intermediaries only) forward it after removing the Content-Length and interpreting the message using Transfer-Encoding.

Yes; as I have now repeated numerous times, I agree with everything you just wrote, and I consider that issue settled.

I am now asking specifically about the usage of the word "exception" in a particular sentence from RFC 9110. This question has nothing to do with Transfer-Encoding; it is asking about the usage of a word. I have changed the issue title accordingly, and have stated the question clearly in my previous two comments.

@kenballus it seems to me that your logic is faulty: you are extrapolating from the assumption that the RFC statements you are quoting are intended to define 100% of implementation behavior. We have already stated that this assumption is incorrect.

It has been recommended to you that you read the statements more literally, and to not assume that individual statements define 100% of the expected implementation behavior.

Likewise, a sender MUST NOT forward a message with a Content-Length header field value that does not match the ABNF above, with one exception: a recipient of a Content-Length header field value consisting of the same decimal value repeated as a comma-separated list (e.g, "Content-Length: 42, 42") MAY either reject the message as invalid or replace that invalid field value with a single instance of the decimal value, since this likely indicates that a duplicate was generated or combined by an upstream message processor.

When read more literally, this says only that a sender MUST NOT forward a received message while leaving its invalid Content-Length in place. If the received Content-Length repeats the same length, then the sender may fix it to list the length exactly once, as required by the ABNF, and may then forward the request with the fixed Content-Length.

As you quoted from RFC 9112, section 6.3 in your original post, if the request is valid without Content-Length, such as when Transfer-Encoding: chunked is present, the received Content-Length MUST be stripped from the forwarded request if the receiver chooses to forward it. There is no explicit requirement to validate the Content-Length that is not forwarded. As a security precaution, though, RFC 9112, section 6.3 notes that receiving both Content-Length and Transfer-Encoding is not expected in valid requests and may indicate an attempt at request smuggling or response splitting, and so "ought to be handled as an error."

When you remove the faulty assumption that RFCs specify 100% of implementation-defined behavior, do those RFC phrases you quoted in the original post make more sense, as well as complement one another rather than conflict? They provide constraints on the situations that each describes.

@gstrauss,

I have no such "faulty assumption." You are addressing an issue that was already settled by @mnot in the first reply to this thread. I agree with you, him, and @yadij about that issue, and readily accept that the behavior is implementation-defined.

I have since asked a follow-up question, but you and others keep responding to my first question only. This thread is going in circles, so I'm going to close it and open a new issue that carries less baggage.

httpwg / http-core

Clarification on messages with invalid Content-Length and valid Transfer-Encoding #1113