sbstp / attohttpc

Rust lightweight HTTP 1.1 client
https://docs.rs/attohttpc/
Mozilla Public License 2.0

Timeout not working on Windows #118

Closed lucasfernog closed 2 years ago

lucasfernog commented 2 years ago

I'm making a request with timeout set to Duration::from_secs(2) to a dummy localhost server. The server sleeps for Duration::from_secs(5) before responding, and the timeout is not enforced on Windows (though it works on macOS and Linux).
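For reference, a scaled-down, std-only sketch of the shape of this repro (the dummy server and the delays are stand-ins, and std's `read_timeout` stands in for attohttpc's request timeout, which is implemented differently):

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

// Dummy server that sleeps before answering, and a client whose read
// timeout is shorter than the server's sleep: the read should fail.
fn delayed_server_read() -> (bool, Duration) {
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap();
    thread::spawn(move || {
        let (mut conn, _) = listener.accept().unwrap();
        thread::sleep(Duration::from_millis(1500)); // stands in for the 5 s sleep
        let _ = conn.write_all(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n");
    });

    let mut stream = TcpStream::connect(addr).unwrap();
    stream
        .set_read_timeout(Some(Duration::from_millis(300))) // stands in for the 2 s timeout
        .unwrap();
    stream
        .write_all(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
        .unwrap();

    let start = Instant::now();
    let mut buf = [0u8; 512];
    let timed_out = stream.read(&mut buf).is_err();
    (timed_out, start.elapsed())
}

fn main() {
    let (timed_out, elapsed) = delayed_server_read();
    println!("timed out: {timed_out}, after {elapsed:?}");
    assert!(timed_out);
    assert!(elapsed < Duration::from_secs(1));
}
```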

adamreichold commented 2 years ago

I suspect that shutdown does not unblock pending reads/writes on the socket on Windows. I think we might need to call close on Windows instead. (Probably easiest to call close additionally?)

adamreichold commented 2 years ago

Since we do drop the stream handle (which should close it), this might also be the "linger" duration elapsing in addition to the timeout. (Implying that we would need to disable this or at least take it into account.)

Did you try extending the sleep in the server? Does the request fail eventually?

lucasfernog commented 2 years ago

With the timeout set to 2 seconds and the server sleeping for a minute, the request fails after 30 seconds:

A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. (os error 10060): Io Error: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. (os error 10060)
adamreichold commented 2 years ago

From the error message, this does not look like the linger timeout. It actually looks like the connection is not yet established?

The request timeout (being implemented by shutting down the connection) only starts after the connection is established, whereas the time until then is covered by the connection timeout (which defaults to 30 seconds). This distinction is admittedly a technical artefact of the implementation, but could you check whether reducing the connection timeout to two seconds "fixes" this?
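The distinction can be illustrated with plain std (the two knobs correspond roughly to the connect-timeout and request-timeout settings described above): the time to establish the connection is governed by the connect timeout alone. A sketch using a reserved TEST-NET address that never answers:

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::{Duration, Instant};

// Connecting to a reserved TEST-NET-3 address: nothing answers, so the
// attempt runs until the *connect* timeout fires (or routing rejects it
// outright, depending on the environment).
fn connect_blackhole() -> (bool, Duration) {
    let addr: SocketAddr = "203.0.113.1:81".parse().unwrap();
    let start = Instant::now();
    let res = TcpStream::connect_timeout(&addr, Duration::from_millis(300));
    (res.is_err(), start.elapsed())
}

fn main() {
    let (failed, elapsed) = connect_blackhole();
    println!("connect failed: {failed}, after {elapsed:?}");
    assert!(failed);
    // Bounded by the connect timeout, not by any read/request timeout,
    // which would only start after the connection is established.
    assert!(elapsed < Duration::from_secs(2));
}
```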

If so, we could include the request timeout when computing the connection timeout, and also include the time it took to establish the connection when computing the deadline for the request timeout, to make this more consistent.

adamreichold commented 2 years ago

(Even though I do not understand why this would be a Windows-only issue then...)

lucasfernog commented 2 years ago

Setting the connection timeout does fix it: error sending request for url (http://localhost:3003/): operation timed out: error sending request for url (http://localhost:3003/): operation timed out: operation timed out

adamreichold commented 2 years ago

If so, we could include the request timeout when computing the connection timeout, and also include the time it took to establish the connection when computing the deadline for the request timeout, to make this more consistent.

This is somewhat embarrassing as I was involved in writing that code, but we already seem to do that:

https://github.com/sbstp/attohttpc/blob/67ff381c02e3e23ba1cdfb2fe4c0b2dd9d257508/src/happy.rs#L65-L69

So if a deadline (i.e. full request timeout) is set, we use that to compute a connect timeout and use whatever is smaller.
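In other words (a sketch with illustrative names, not attohttpc's actual identifiers):

```rust
use std::time::{Duration, Instant};

// Clamp the configured connect timeout to the time remaining until the
// overall request deadline, if one is set; use whichever is smaller.
fn effective_connect_timeout(
    configured: Duration,
    deadline: Option<Instant>,
) -> Duration {
    match deadline {
        Some(d) => configured.min(d.saturating_duration_since(Instant::now())),
        None => configured,
    }
}

fn main() {
    let configured = Duration::from_secs(30);
    let deadline = Instant::now() + Duration::from_secs(2);
    let t = effective_connect_timeout(configured, Some(deadline));
    println!("connect timeout clamped to {t:?}");
    assert!(t <= Duration::from_secs(2));
    assert_eq!(effective_connect_timeout(configured, None), configured);
}
```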

Sorry if this is getting into the weeds, but I would have a hard time testing this on Windows, so could you slap dbg! invocations around timeout and timeout.min(timeout1) passed to connect_timeout so we can see what changes with and without a reduced connection timeout in addition to the full request timeout? Thank you!

lucasfernog commented 2 years ago

I'll do that later today or tomorrow.

lucasfernog commented 2 years ago

After some debugging, it seems like the problem is in https://github.com/sbstp/attohttpc/blob/67ff381c02e3e23ba1cdfb2fe4c0b2dd9d257508/src/streams.rs#L114: it does not account for info.deadline (the timeout for the whole request). @adamreichold

lucasfernog commented 2 years ago

The connection finishes in 1 ms, but the read uses the default read_timeout (30 s), so the request succeeds even though the server takes 5 seconds to respond. We should take the minimum of read_timeout and the deadline, IMO.

adamreichold commented 2 years ago

@lucasfernog I think I do not really understand the issue yet. After the connection is established, the request timeout should automatically override any other timeout (read, write, etc.), as we forcibly shut down the socket, thereby interrupting any blocked read or write operation independently of any other timeouts.
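The trick described here can be demonstrated with plain std on a Unix-like system: a timer thread shuts down a clone of the socket, and the blocked read returns Ok(0) well before any read timeout would fire (this sketch uses hypothetical timings, not attohttpc's code):

```rust
use std::io::Read;
use std::net::{Shutdown, TcpListener, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

// A timer thread calls shutdown() on a clone of the socket, which (on
// Unix-like systems) unblocks a read() that would otherwise wait forever.
fn shutdown_unblocks_read() -> (usize, Duration) {
    // Server accepts but never writes, so the client read blocks.
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap();
    thread::spawn(move || {
        let _conn = listener.accept().unwrap();
        thread::sleep(Duration::from_secs(5)); // hold the connection open
    });

    let mut stream = TcpStream::connect(addr).unwrap();
    let clone = stream.try_clone().unwrap();
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(300)); // the "request timeout"
        let _ = clone.shutdown(Shutdown::Both);
    });

    let start = Instant::now();
    let mut buf = [0u8; 64];
    let n = stream.read(&mut buf).unwrap(); // returns Ok(0) once shut down
    (n, start.elapsed())
}

fn main() {
    let (n, elapsed) = shutdown_unblocks_read();
    println!("read unblocked with {n} bytes after {elapsed:?}");
    assert_eq!(n, 0);
    assert!(elapsed < Duration::from_secs(2));
}
```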

The connection timeout is special only because we do not have access to the socket until connect_timeout returns, so we cannot start the separate thread which closes the socket that we do not yet know.

lucasfernog commented 2 years ago

I'm debugging the issue: connect_timeout actually returns and the thread is started. stream.shutdown is called, but the stream read does not return Ok(0) immediately as stated by the docs; I actually reach the Ok(read) match arm when the server responds (after the 5 seconds of sleep).

adamreichold commented 2 years ago

While debugging, could you check the return value of stream.shutdown, i.e. if it returns an error?

adamreichold commented 2 years ago

One other idea we might consider is that SO_LINGER interacts with this on Windows as described here. Could you try using the (currently unstable) TcpStream::set_linger method to set a None value?

lucasfernog commented 2 years ago

Shutdown doesn't return an error. I'll check the unstable API.

lucasfernog commented 2 years ago

@adamreichold with set_linger I received the timeout error, but it's not immediate, it took the whole 5 seconds to receive it. I guess this is a Windows problem? I wonder how reqwest handles this since it works there.

adamreichold commented 2 years ago

I guess this is a Windows problem? I wonder how reqwest handles this since it works there.

reqwest is always "async" internally which makes the timeout just another source of I/O events (e.g. via tokio's timer wheel). When a timeout triggers, the future representing the read is just dropped without the need to unblock/cancel/interrupt any blocking system calls.

Using shutdown to unblock blocking system calls is admittedly a cheap trick that works nicely on Unix-like systems. I think the only alternative besides going all out async (and not doing this is sort of the selling point of attohttpc) is updating the timeout before each and every call to read and write based on the request deadline which would complicate the code and introduce unnecessary performance costs on Unix-like platforms.
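That alternative would look roughly like this (illustrative helper, not attohttpc code): before every read, shrink the socket's read timeout to the time remaining until the request deadline.

```rust
use std::io::{self, Read};
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

// Shrink the per-call read timeout to the time left until the overall
// request deadline; portable, but costs a syscall before every read.
fn read_with_deadline(
    stream: &mut TcpStream,
    buf: &mut [u8],
    deadline: Instant,
) -> io::Result<usize> {
    let remaining = deadline.saturating_duration_since(Instant::now());
    if remaining.is_zero() {
        return Err(io::Error::new(io::ErrorKind::TimedOut, "deadline elapsed"));
    }
    stream.set_read_timeout(Some(remaining))?;
    stream.read(buf)
}

fn main() {
    // Server that never responds; the per-read timeout enforces the deadline.
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap();
    thread::spawn(move || {
        let _conn = listener.accept().unwrap();
        thread::sleep(Duration::from_secs(5));
    });

    let mut stream = TcpStream::connect(addr).unwrap();
    let deadline = Instant::now() + Duration::from_millis(300);
    let mut buf = [0u8; 64];
    let res = read_with_deadline(&mut stream, &mut buf, deadline);
    println!("read result: {res:?}");
    assert!(res.is_err()); // timed out at (roughly) the deadline
}
```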

adamreichold commented 2 years ago

(I suspect we might be able to do something Windows-specific to make that trick work, like calling closesocket or some other WSA entry point directly.)

lucasfernog commented 2 years ago

Makes sense. On my side I can also just set read_timeout alongside timeout and it'll work.

adamreichold commented 2 years ago

I think I finally have a clue what is happening: On UNIX, shutdown unblocks everything, whereas on Windows only closesocket would do. By dropping the stream, we do [eventually call closesocket](https://github.com/rust-lang/rust/blob/master/library/std/src/os/windows/io/socket.rs#L186-L193), but it is a duplicate socket produced using WSADuplicateSocketW, about which the docs say:

A process can call closesocket on a duplicated socket and the descriptor will become deallocated. The underlying socket, however, will remain open until closesocket is called by the last remaining descriptor.

(From that doc page, try_clone sounds significantly more complex on Windows due to the focus of WSADuplicateSocketW on IPC.)

So I think with that knowledge, we should avoid try_clone and instead use as_raw_fd/socket and from_raw_fd/socket to "duplicate" the socket in the timeout thread, so that the closesocket call actually affects the original socket descriptor.
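On the Unix side, that idea might be sketched like this (illustrative, not the actual PR code; the Windows variant would use as_raw_socket/from_raw_socket so the eventual closesocket hits the original descriptor):

```rust
use std::io::Read;
use std::mem::ManuallyDrop;
use std::net::{Shutdown, TcpListener, TcpStream};
use std::os::unix::io::{AsRawFd, FromRawFd};
use std::thread;
use std::time::{Duration, Instant};

// Rebuild a handle to the *same* descriptor (not a dup) from the raw fd,
// so the timeout thread acts on the original socket. ManuallyDrop keeps
// the borrowed handle from closing the fd a second time; the caller must
// keep the original stream alive until the timeout thread has run.
fn spawn_timeout_thread(stream: &TcpStream, timeout: Duration) {
    let fd = stream.as_raw_fd();
    thread::spawn(move || {
        thread::sleep(timeout);
        let same = ManuallyDrop::new(unsafe { TcpStream::from_raw_fd(fd) });
        let _ = same.shutdown(Shutdown::Both);
    });
}

fn main() {
    // Server that accepts and then goes silent, so the client read blocks.
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap();
    thread::spawn(move || {
        let _conn = listener.accept().unwrap();
        thread::sleep(Duration::from_secs(5));
    });

    let mut stream = TcpStream::connect(addr).unwrap();
    spawn_timeout_thread(&stream, Duration::from_millis(300));

    let start = Instant::now();
    let mut buf = [0u8; 64];
    let n = stream.read(&mut buf).unwrap(); // Ok(0) once shut down
    println!("read returned {n} bytes after {:?}", start.elapsed());
    assert_eq!(n, 0);
    assert!(start.elapsed() < Duration::from_secs(2));
}
```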

adamreichold commented 2 years ago

@lucasfernog I have created https://github.com/sbstp/attohttpc/pull/119 implementing this idea, but I cannot test this using Wine (as the initial symptom is not present, most likely because Wine maps the Windows socket API to Linux' native one).

adamreichold commented 2 years ago

@lucasfernog Any chance you could give #119 a try on a Windows machine? I fear that otherwise I will not be able to make any progress on this as I do not have ready access to a Windows machine.

lucasfernog commented 2 years ago

@adamreichold I can check that, sorry it took so long, I've been super busy.