http-rs / surf

Fast and friendly HTTP client framework for async Rust
https://docs.rs/surf
Apache License 2.0
1.45k stars 119 forks source link

Truncated body when making a GET request on Apple M1 #270

Closed ElliotSis closed 3 years ago

ElliotSis commented 3 years ago

The following code, built with beta-aarch64-apple-darwin unchanged - rustc 1.49.0-beta.2 (bd26e4e54 2020-11-24):

fn main() {
    println!(
        "received from surf:{}",
        futures::executor::block_on(surf::get("https://www.google.com").recv_string()).unwrap().len()
    );
    println!(
        "received from reqwest:{}",
        reqwest::blocking::get("https://www.google.com")
            .unwrap()
            .text()
            .unwrap()
            .len()
    );
}

Outputs the following (MacBook Air M1, 2020):

received from surf:5855
received from reqwest:48607

(The body is only received partially when using surf, whereas reqwest returns the intended body). This is not reproducible on x86 (linux), on the same rustc version from beta channel.

Am I missing something here or is this some platform-specific issue?

jbr commented 3 years ago

Does this happen when using isahc directly? Sounds like it might be a libcurl/isahc issue? Asked differently, if you switch surf's backend to h1, does that fix the issue?

ElliotSis commented 3 years ago

Hey Jacob, thanks for the quick reply!

The same issue doesn’t seem to be reproducible when using ishac::get directly. Another thing that differs between these examples is the synchronous vs. asynchronous interfaces; so maybe it’s related?

Happy to help investigate with more examples

jbr commented 3 years ago

Could you try isahc::get_async? Does switching to async-h1 fix it?

surf = { version = "2.1.0", default-features = false, features = [ "h1-client", "middleware-logger", "encoding" ] }
ElliotSis commented 3 years ago

Yes I’m able to reproduce the same issue with isahc::get_async. Using h1 does fix the issue, so it’s likely an issue with isahc when used asynchronously, I will open a bug there, thanks!

ElliotSis commented 3 years ago

Ah actually scratch that, it works with isahc::get_async as well (was running the wrong binary). It does work when using h1 with surf though.

ElliotSis commented 3 years ago

Looking at this a bit more, it looks like the response from ishac has the wrong length, but the actual content is correct. When using surf however, both the length and the content are truncated.

http_client for ishac uses Body::len() to read the response.

However, Body::len()’s documentation stipulates the following:

    /// Since the length may be determined totally separately from the actual
    /// bytes, even if a value is returned it should not be relied on as always
    /// being accurate, and should be treated as a "hint".

Could this be related?

jbr commented 3 years ago

Interesting! One way to validate that hypothesis would be to change this line: https://github.com/http-rs/http-client/blob/main/src/isahc.rs#L52

to

let body = Body::from_reader(BufReader::new(body), None);

I don't have a good sense of why this would be different on m1, but that comment definitely seems like we shouldn't be using isahc's Body::len() regardless

ElliotSis commented 3 years ago

Yup, that fixes it! I don’t really understand why that would be different on M1 either so I filed https://github.com/sagebind/isahc/issues/265 just in case it needs to be looked into.

ElliotSis commented 3 years ago

FYI, from sagebind/isahc#265

Put another way, you should use Body::len() to help inform how you read the response, and treat it as equivalent to Content-Length, but you should not use it to allocate an exact buffer size for the response without doing any other validation, as that's a good way to introduce buffer overrun vulnerabilities.

Fishrock123 commented 3 years ago

This should be fixed in http-client once we do a release: https://github.com/http-rs/http-client/pull/58

ElliotSis commented 3 years ago

Nice, thanks for the update!

Fishrock123 commented 3 years ago

Sound be fixed if you run a cargo update.