hyperium / http

Rust HTTP types
Apache License 2.0
1.14k stars 284 forks source link

HeaderValue::from_str does not validate that the string is valid ascii #519

Open tmpolaczyk opened 2 years ago

tmpolaczyk commented 2 years ago

The documentation of HeaderValue::from_str says that:

https://github.com/hyperium/http/blob/abe651273f4cf19cf9a247f376e9ece85becc722/src/header/value.rs#L101-L104

So I would expect this function to return an error when I pass it a non-ascii utf8 string which contains byte values in the 128-255 range. However the result is success.

Playground link

let x = http::header::HeaderValue::from_str("ñ");
panic!("{:?}", x);
thread 'main' panicked at 'Ok("\xc3\xb1")', src/main.rs:3:5
seanmonstar commented 2 years ago

Good catch, this does seem like a "bug".

lilyball commented 2 years ago

I disagree, the documentation should be changed instead. Header values are allowed to contain non-ASCII bytes, it's just control characters (with the exception of HTAB) that are disallowed.

Since every ASCII byte in a UTF-8 string is an ASCII char (multi-byte UTF-8 chars cannot contain bytes that look like ASCII), we should just relax this restriction and say that ASCII control characters with the exception of HTAB are disallowed and everything else is valid. This means the implementation stays the same (as that's what it's already doing).

lilyball commented 2 years ago

What's more, I would argue that from_static() should also allow non-ascii chars (and the documentation fixed, right now it says it allows 32-127 but it actually disallows 127 and allows 9), and to_str() should also stop limiting itself like this. I'm going to file a separate issue on the latter, but from_static() seems reasonable to cover here as "make it behave like from_str does".

lilyball commented 2 years ago

Filed as #527. So to summarize:

At this point the only use left of is_viisble_ascii is in the Debug impl. That's not a big issue as this is just debug, but it would be really nice if that could also detect valid UTF-8–encoded characters and allow those too. Ideally it would work like std::str::escape_debug() except including hex escapes for non-utf-8 bytes. But that's orthogonal to this issue.

xoac commented 2 years ago

There is from_bytes() function that allow you use non-ascii bytes. And the name of from_str is misleading it should be try_from_str actually (but it's documented to be replaced in the future with TryFrom trait).

lilyball commented 2 years ago

@xoac There's no reason why HeaderValue::from_bytes(s.as_bytes()) should accept something that HeaderValue::from_str(s) does not (or rather, no reason why from_str() should reject something that from_bytes() allows). Also the documentation on from_bytes() once again forgets that \t is a valid byte.

Also I don't know why from_str() continues to say it will be replaced by TryFrom once the trait is stabilized, as the trait was stabilized back in 1.34 and HeaderValue already implements it. Heck, the TryFrom impl just delegates to FromStr (which has been stable since 1.0), so that just makes it even more confusing as to why HeaderValue::from_str() exists and claims it will be replaced. Also, these traits can't replace HeaderValue::from_static().