seanmonstar / httparse

A push parser for the HTTP 1.x protocol in Rust.
https://docs.rs/httparse
Apache License 2.0

double quotes in headers can not be parsed #83

Open clouduol opened 3 years ago

clouduol commented 3 years ago

In the header name map, index 34 is false, so when parsing headers the parser returns Error::HeaderName as soon as it encounters a double quote (ASCII 34). However, Chrome parses headers containing double quotes without error. Is this intentional or a bug?

Test Example:

    #[test]
    fn test_double_quotes() {
        // Assumes the test lives outside the crate; inside httparse's own tests, `use super::*` instead.
        use httparse::{Response, EMPTY_HEADER};
        // The seventh line is wrapped in double quotes, so its header name starts with '"'.
        let bytes = b"HTTP/1.1 200 OK\r\nServer: nginx/1.14.2\r\nDate: Mon, 25 Jan 2021 06:20:06 GMT\r\nContent-Type: image/png\r\nContent-Length: 24623\r\nConnection: keep-alive\r\n\"Access-Control-Allow-Origin: *\"\r\nAccept-Ranges: bytes\r\nAccess-Control-Allow-Origin: *\r\nCache-Control: 2592000\r\n\r\n";
        // EMPTY_HEADER avoids the undefined behavior of the deprecated mem::uninitialized().
        let mut headers = [EMPTY_HEADER; 10];
        let mut res = Response::new(&mut headers);
        let parsed_res = res.parse(bytes);
        println!("parsed res = {:?}", parsed_res);
    }

tari commented 2 years ago

At least according to RFC 2616, Chrome's behavior seems correct. A header name is a token, and a token is a sequence of characters excluding CTLs or separators, neither of which includes ". However, RFC 7230 (which obsoletes 2616) defines a token as a sequence of tchar, a class which specifically excludes double quotes.
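For reference, here is a minimal sketch of the RFC 7230 tchar check written as a 256-entry lookup table, similar in spirit to the header name map mentioned in the report. The names is_tchar, build_table, TCHAR_MAP and is_valid_header_name are made up for illustration and are not httparse's actual identifiers.

```rust
/// Returns true if `b` is a `tchar` as listed in RFC 7230, section 3.2.6.
const fn is_tchar(b: u8) -> bool {
    matches!(
        b,
        b'!' | b'#' | b'$' | b'%' | b'&' | b'\'' | b'*' | b'+' | b'-' | b'.'
            | b'^' | b'_' | b'`' | b'|' | b'~'
            | b'0'..=b'9'
            | b'A'..=b'Z'
            | b'a'..=b'z'
    )
}

/// Builds a lookup table indexed by byte value (hypothetical helper).
const fn build_table() -> [bool; 256] {
    let mut table = [false; 256];
    let mut i = 0;
    while i < 256 {
        table[i] = is_tchar(i as u8);
        i += 1;
    }
    table
}

/// Illustrative stand-in for a header name map; index 34 ('"') is false.
static TCHAR_MAP: [bool; 256] = build_table();

/// A header name is valid if it is non-empty and every byte is a tchar.
fn is_valid_header_name(name: &[u8]) -> bool {
    !name.is_empty() && name.iter().all(|&b| TCHAR_MAP[b as usize])
}
```

With this table, a name such as Content-Type passes, while a name that starts with a double quote is rejected at its first byte.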

I'd think a parser should be able to handle the old definition, because it's impossible to tell whether a given message conforms to RFC 2616 or RFC 7230 (both describe HTTP/1.1). Practically, though, the definition seems to have been changed because parsing the old one is much more complex than it first appears, and many implementations differ in their interpretation.


As a related example, RFC 7230 (section 3.2.4) deprecates line folding for header values and allows servers receiving folded lines to reject them, but it also requires user agents to accept folded lines by replacing each fold with one or more spaces before interpreting the value. In that particular instance, it seems like this library should implement the behavior specified for user agents, because that behavior is also permitted for servers.
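To make the folding rule concrete, here is a rough sketch of the user-agent behavior: each obs-fold (CRLF followed by one or more spaces or tabs) is replaced with a single space before the value is interpreted. The function name unfold is hypothetical, and this is not how httparse handles folding; it only illustrates the RFC 7230 rule on a raw header block.

```rust
/// Replaces each obs-fold (CRLF followed by one or more SP / HTAB)
/// with a single space. Hypothetical helper, not part of httparse.
fn unfold(raw: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(raw.len());
    let mut i = 0;
    while i < raw.len() {
        // A fold is CRLF immediately followed by a space or a tab.
        let is_fold = raw[i..].starts_with(b"\r\n")
            && matches!(raw.get(i + 2), Some(&b' ') | Some(&b'\t'));
        if is_fold {
            i += 2;
            // Skip the whole run of leading whitespace on the continuation line.
            while i < raw.len() && (raw[i] == b' ' || raw[i] == b'\t') {
                i += 1;
            }
            out.push(b' ');
        } else {
            out.push(raw[i]);
            i += 1;
        }
    }
    out
}
```

A CRLF that starts a new header line is left alone, so only continuation lines are merged into the preceding value.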

tari commented 2 years ago

It seems like the rationale in #68 applies here as well though: supporting the old behavior is difficult and probably slow, so it's simply not supported.

nox commented 2 years ago

https://github.com/seanmonstar/httparse/pull/114 will let you ignore the invalid header, which is what Chrome is doing.
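As an illustration of what "ignoring the invalid header" means, here is a standalone sketch that skips any header line whose name is not a valid RFC 7230 token and keeps the rest. It does not use httparse and does not show how the PR implements this; parse_headers_lenient and is_tchar are hypothetical names, and the input is assumed to be the header block without the status line.

```rust
/// Returns true if `b` is a `tchar` as listed in RFC 7230, section 3.2.6.
fn is_tchar(b: u8) -> bool {
    matches!(
        b,
        b'!' | b'#' | b'$' | b'%' | b'&' | b'\'' | b'*' | b'+' | b'-' | b'.'
            | b'^' | b'_' | b'`' | b'|' | b'~'
            | b'0'..=b'9'
            | b'A'..=b'Z'
            | b'a'..=b'z'
    )
}

/// Hypothetical helper: splits a header block into (name, value) pairs and
/// silently drops lines with no colon or with an invalid header name.
fn parse_headers_lenient(block: &str) -> Vec<(&str, &str)> {
    block
        .split("\r\n")
        .filter(|line| !line.is_empty())
        .filter_map(|line| {
            let (name, value) = line.split_once(':')?;
            if !name.is_empty() && name.bytes().all(is_tchar) {
                // Trim optional whitespace (SP / HTAB) around the value.
                Some((name, value.trim_matches(|c| c == ' ' || c == '\t')))
            } else {
                None
            }
        })
        .collect()
}
```

Run against the headers from the report, the quoted "Access-Control-Allow-Origin: *" line is dropped because its name begins with a double quote, and the unquoted headers around it are kept.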