This is just due to the racy nature of networking.
hyper has a connection pool of idle connections, and it selected one to send your request. Most of the time, hyper will receive the server's FIN and drop the dead connection from its pool. But occasionally, a connection will be selected from the pool and written to at the same time the server is deciding to close the connection. Since hyper already wrote some of the request, it can't really retry it automatically on a new connection, since the server may have acted already.
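If you want to special-case this failure mode at the call site, a minimal sketch, assuming hyper 0.14 (the helper name is mine):

```rust
// True when the server closed the connection after hyper had already
// written part of the request -- the race described above.
fn is_pool_reuse_race(err: &hyper::Error) -> bool {
    err.is_incomplete_message()
}
```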
Hey, thank you for the swift response!
I got it! So the connection is being reused, right? Is it due to the keep-alive option? If so, should disabling this flag, or performing a retry on the app side, solve the issue?

Also, I could not reproduce the error when requesting a url over `http`. I tried a lot of times, without success; I could only reproduce the issue when requesting a url over `https`.

If that is the reason, I should have experienced the issue when using `http` too, right?
I just found that aws s3 has a default max idle timeout of 20s, and hyper's default `keep_alive_timeout` is 90s.

Setting the `keep_alive_timeout` to less than 20s on the hyper client seems to have solved the problem!

Thank you, your explanation really helped me understand why this was happening!
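For reference, a minimal sketch of that configuration, assuming hyper 0.14, where the builder knob is called `pool_idle_timeout` (older hyper versions called it `keep_alive_timeout`; 15s is just an example value below S3's 20s):

```rust
use std::time::Duration;

// Drop idle pooled connections before S3's 20s server-side idle timeout,
// so hyper never reuses a connection the server is about to close.
let client = hyper::Client::builder()
    .pool_idle_timeout(Duration::from_secs(15))
    .build_http::<hyper::Body>();
```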
I was looking at the java aws client, and I saw that they use a max-idle-timeout of 60s, but there is a second property called validate-after-inactivity (5s default) that allows the idle timeout to be that high. Looking at the code, I saw that the http client they use supports this behavior.

Would it be possible to implement the same behavior in hyper? Does it make sense? :smile:
I believe the "revalidation" it does is to poll that the connection is readable. In hyper, we already register for when the OS discovers the connection has hung up. The race would still exist if the "revalidation" happened at the same time the server was closing.
Anyone getting this with `reqwest`, try this:

```rust
// Keep zero idle connections per host, so no pooled connection can be
// reused after the server has silently closed it.
let client = reqwest::Client::builder()
    .pool_max_idle_per_host(0)
    .build()?;
```
Well, I tried to use `.pool_max_idle_per_host(0)` and I still got this error today.

Doesn't `hyper` take the `Keep-Alive` header into account?
I've faced this problem with the ClickHouse HTTP crate. However, ClickHouse sends `Keep-Alive: timeout=3`, so I don't understand why `hyper` doesn't handle it.

@seanmonstar, any ideas?
I think it's unexpected for most people that this isn't automatically retried. If I ask the library to get a website for me, I expect it not to fail just because the keep-alive timed out. If it uses keep-alive by default, shouldn't it also be able to handle it properly?

Am I misunderstanding anything here? I think closing this as completed is misleading :-)
> it should also be able to handle it properly?

It's just not possible conceptually, is it? See https://github.com/hyperium/hyper/issues/2136#issuecomment-589345238

Rather, the application developer has to decide whether the request should be retried, e.g. if it's a well-behaved GET request, or if it's visible at the application level that the request did or did not have the desired effect yet.
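A minimal sketch of that kind of application-level retry, assuming hyper 0.14 and a request that is safe to repeat (the retry limit is arbitrary):

```rust
use hyper::client::HttpConnector;
use hyper::{Body, Client, Response, Uri};

// Sketch: retry an idempotent GET when the pooled connection turned out to
// be dead (hyper's `IncompleteMessage` error). Only reasonable because a
// plain GET has no effect the server could already have acted on.
async fn get_with_retry(
    client: &Client<HttpConnector>,
    uri: Uri,
) -> Result<Response<Body>, hyper::Error> {
    let mut attempts = 0;
    loop {
        match client.get(uri.clone()).await {
            Err(e) if e.is_incomplete_message() && attempts < 2 => {
                attempts += 1; // stale pooled connection; try again on a new one
            }
            other => return other,
        }
    }
}
```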
Side note: there seem to be some weird servers that silently time out connections, meaning that they do not close the connection when their timeout is reached but unconditionally close it as soon as it gets reused later. While you can counteract that with a suitable `pool_idle_timeout`, I think it would be possible for the client to trigger that behavior before sending an actual request. It would still be as racy as any connection if it does not trigger, though.
Hey, I'm experiencing a weird behavior with the hyper client when using https. Sometimes my app in production fails to perform the request, but the same request works most of the time. I performed a load test locally to try to reproduce the problem, and I could reproduce it: it occurs in ~0.02% of requests.

I guessed it could be something related to hyper-tls, so I switched to hyper-rustls, but the same problem continued to occur. Then I tried to hit the url using http instead of https, and the error went away!

The error I receive from `hyper::Client::get` is:

```
hyper::Error(IncompleteMessage): connection closed before message completed
```

Below is a minimal working example to reproduce the error:
Cargo.toml:
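A sketch of the manifest, assuming the hyper 0.13 / tokio 0.2 era of this issue (versions are illustrative):

```toml
[package]
name = "repro"
version = "0.1.0"
edition = "2018"

[dependencies]
hyper = "0.13"
hyper-tls = "0.4"
tokio = { version = "0.2", features = ["full"] }
```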
src/main.rs:
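And a sketch of the request itself (the url value is a placeholder):

```rust
use hyper::Client;
use hyper_tls::HttpsConnector;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Replace with a valid https url.
    let url: hyper::Uri = "https://example.com/small-file".parse()?;

    // An https-capable client; idle connections go back into hyper's pool.
    let https = HttpsConnector::new();
    let client = Client::builder().build::<_, hyper::Body>(https);

    let res = client.get(url).await?;
    println!("status: {}", res.status());
    Ok(())
}
```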
PS: replace the `url` value with a valid `https` url. In my tests I used a small file on aws s3.

I performed a local load test using hey:
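An invocation along these lines (the bucket URL is a placeholder for the one I used; `-z` sets the test duration):

```sh
hey -z 120s https://your-bucket.s3.amazonaws.com/small-file
```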
Running the test for 2 minutes (`-z 120s`) was enough to see some errors appearing.

Could anyone help me out? If I need to provide more information, or anything, just let me know. Thank you!