tornadoweb / tornado

Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed.
http://www.tornadoweb.org/
Apache License 2.0
21.61k stars 5.49k forks source link

Incorrect header validation #3310

Open kenballus opened 10 months ago

kenballus commented 10 months ago

Header names

RFC 9110 says that HTTP header names are permitted to contain only the following characters:

"!" / "#" / "$" / "%" / "&" / "'" / "*"
/ "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
/ DIGIT / ALPHA

Tornado allows the following invalid bytes inside of header names:

['\x00', '\x01', '\x02', '\x03', '\x04', '\x05', '\x06', '\x07', '\x08', '\x09', '\x0b', '\x0c', '\x0d', '\x0e', '\x0f', '\x10', '\x11', '\x12', '\x13', '\x14', '\x15', '\x16', '\x17', '\x18', '\x19', '\x1a', '\x1b', '\x1c', '\x1d', '\x1e', '\x1f', '\x20', '\x22', '\x28', '\x29', '\x2c', '\x2f', '\x3b', '\x3c', '\x3d', '\x3e', '\x3f', '\x40', '\x5b', '\x5c', '\x5d', '\x7b', '\x7d', '\x7f', '\x80', '\x81', '\x82', '\x83', '\x84', '\x85', '\x86', '\x87', '\x88', '\x89', '\x8a', '\x8b', '\x8c', '\x8d', '\x8e', '\x8f', '\x90', '\x91', '\x92', '\x93', '\x94', '\x95', '\x96', '\x97', '\x98', '\x99', '\x9a', '\x9b', '\x9c', '\x9d', '\x9e', '\x9f', '\xa0', '\xa1', '\xa2', '\xa3', '\xa4', '\xa5', '\xa6', '\xa7', '\xa8', '\xa9', '\xaa', '\xab', '\xac', '\xad', '\xae', '\xaf', '\xb0', '\xb1', '\xb2', '\xb3', '\xb4', '\xb5', '\xb6', '\xb7', '\xb8', '\xb9', '\xba', '\xbb', '\xbc', '\xbd', '\xbe', '\xbf', '\xc0', '\xc1', '\xc2', '\xc3', '\xc4', '\xc5', '\xc6', '\xc7', '\xc8', '\xc9', '\xca', '\xcb', '\xcc', '\xcd', '\xce', '\xcf', '\xd0', '\xd1', '\xd2', '\xd3', '\xd4', '\xd5', '\xd6', '\xd7', '\xd8', '\xd9', '\xda', '\xdb', '\xdc', '\xdd', '\xde', '\xdf', '\xe0', '\xe1', '\xe2', '\xe3', '\xe4', '\xe5', '\xe6', '\xe7', '\xe8', '\xe9', '\xea', '\xeb', '\xec', '\xed', '\xee', '\xef', '\xf0', '\xf1', '\xf2', '\xf3', '\xf4', '\xf5', '\xf6', '\xf7', '\xf8', '\xf9', '\xfa', '\xfb', '\xfc', '\xfd', '\xfe', '\xff']

This is nearly all of the invalid bytes (only '\n' and ':' are handled correctly).

Of particular note is that '\x00', '\r', '\t', and ' ' are allowed through.

I suggest that we start rejecting headers wtih names that contain any of these characters, as most web servers do.

Header values

RFC 9110 says this:

Field values containing CR, LF, or NUL characters are invalid and dangerous, due to the varying ways that implementations might parse and interpret those characters; a recipient of CR, LF, or NUL within a field value MUST either reject the message or replace each of those characters with SP before further processing or forwarding of that message.

Tornado allows null bytes in header values.

I suggest that we start rejecting headers with values that contain null, as most web servers do.

bdarnell commented 10 months ago

Tornado is definitely a product of the View Source era of web development and Postel's law - there are probably a lot of places like this where we're more permissive than the standards. I'm inclined to be cautious about becoming less permissive (unless there's a concrete risk as with underscores in Content-Length), though, since there's always the possibility that someone is relying on the old behavior.

The way the set of allowed header characters was documented changed between RFC 2616 and 7230. 2616 said "anything but control and separators", while 7230 listed the allowed characters. I don't think that's a semantic change (if so it wasn't noted in the "differences from 2616" section), but the new form is much clearer. In particular, it was less obvious that only ascii was allowed (I was under the impression that ISO-8859-1 was used), leading to confusion in https://github.com/tornadoweb/tornado/issues/2043. 7230 does document some changes in this area, in particular forbidding leading/trailing whitespace.

I agree with your suggestions; nuls and whitespace seem like the most problematic characters here and the least likely to cause backwards-compatibility problems. I'm also inclined to forbid all non-ascii characters. But I'd probably queue these changes up for Tornado 7.0 instead of putting them in a minor release.

kenballus commented 10 months ago

Tornado is definitely a product of the View Source era of web development and Postel's law - there are probably a lot of places like this where we're more permissive than the standards. I'm inclined to be cautious about becoming less permissive (unless there's a concrete risk as with underscores in Content-Length), though, since there's always the possibility that someone is relying on the old behavior.

This is understandable. I think most users expect a pretty strict interpretation of the standards these days, so it might be worth keeping a list of those places in which Tornado is more permissive than the standards recommend.

I agree with your suggestions; nuls and whitespace seem like the most problematic characters here and the least likely to cause backwards-compatibility problems. I'm also inclined to forbid all non-ascii characters. But I'd probably queue these changes up for Tornado 7.0 instead of putting them in a minor release.

I think it makes sense to delay changing behavior on non-ascii headers until 7.0, but nul could potentially be handled in a minor release, given that nul bytes aren't in any non-null UTF-8 characters.

kenballus commented 8 months ago

Another thing: the RFC specifies that header names must be non-empty, but Tornado doesn't enforce this rule. Some proxies have had issues with empty header names in the past (CVE-2023-25725)