http-rs / surf

Fast and friendly HTTP client framework for async Rust
https://docs.rs/surf
Apache License 2.0
1.45k stars 119 forks source link

Panic on some websites with async-h1 backend: "String slice should be valid ASCII" #284

Open Shnatsel opened 3 years ago

Shnatsel commented 3 years ago

On some websites, e.g. http://futureuae.com, surf panics fails with the following error:

thread 'main' panicked at 'called Result::unwrap() on an Err value: String slice should be valid ASCII

Firefox and curl work fine.

463 websites out of the top million from Feb 3 Tranco list are affected.

Tested using this code. Test tool output from all affected websites: surf-should-be-valid-ascii.tar.gz

The root cause might be in some dependency and not in surf itself. But the backtrace is useless due to async, and the error message doesn't even point to the line where the panic happens, so I'm reporting it here.

I've only tested the async-h1 backend; I don't know if the other bakends are affected.

Fishrock123 commented 3 years ago

Here's a backtrace:

RUST_BACKTRACE=1 longboard GET https://futureuae.com
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: String slice should be valid ASCII', /home/jeremiah/.cargo/registry/src/github.com-1ecc6299db9ec823/http-types-2.10.0/src/headers/headers.rs:54:62
stack backtrace:
   0: rust_begin_unwind
             at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/std/src/panicking.rs:493:5
   1: core::panicking::panic_fmt
             at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/core/src/panicking.rs:92:14
   2: core::option::expect_none_failed
             at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/core/src/option.rs:1268:5
   3: http_types::headers::headers::Headers::insert
   4: http_types::headers::headers::Headers::append
   5: <http_client::h1::H1Client as http_client::HttpClient>::send::__send::{{closure}}
   6: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
   7: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
   8: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
   9: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  10: std::thread::local::LocalKey<T>::with
  11: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  12: async_io::driver::block_on
  13: tokio::runtime::context::enter
  14: tokio::runtime::handle::Handle::enter
  15: std::thread::local::LocalKey<T>::with
  16: std::thread::local::LocalKey<T>::with
  17: async_std::task::builder::Builder::blocking
  18: longboard::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

As I suspected it seems this website is sending a header with invalid ASCII. http-types does not allow non-ASCII in headers as the HTTP spec also recommends not allowing invalid non-ASCII as of RFC 7230: https://tools.ietf.org/html/rfc7230#section-3.2.4

Quoting stackoverflow

In short: Only ASCII is guaranteed to work. Some non-ASCII bytes are allowed for backwards compatibility, but are not supposed to be displayable.

HTTPbis gave up and specified that in the headers there is no useful encoding besides ASCII:

Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets. A recipient SHOULD treat other octets in field content (obs-text) as opaque data.

The website needs to fix this and be spec compliant. We are unlikely to support this.

Fishrock123 commented 3 years ago

I suppose this should not panic, but it should be an error.

mkroening commented 3 years ago

I ran into this issue myself today.

Patching http-types to not check for ASCII validity made it work for me. Obviously, this is a bad hack and no good solution. I'll have to do more research how this bad response header value is generated, but I am not sure if I will be able to fix this server-side eventually. In my case I do not care about the header value at all. Could it make sense for surf to work with Result<HeaderValue, _> or Option<HeaderValue>, so I can just choose to ignore the invalid header value that I am not interested in but still be able to work with the body of the response?

What do you think?

Edit: After all, I managed to fix it on the server side. Now the hard part, getting it merged upstream: MDL-60697