Closed: lucasfernog closed this issue 2 years ago
I suspect that `shutdown` does not unblock pending reads/writes on the socket on Windows. I think we might need to call `close` on Windows instead. (Probably easiest to call `close` additionally?)
Since we do drop the stream handle (which should close it), this might also be the "linger" duration elapsing in addition to the timeout. (Implying that we would need to disable this or at least take it into account.)
Did you try extending the sleep in the server? Does the request fail eventually?
With the timeout set to 2 seconds and the server sleeping for a minute, the request fails after 30 seconds:
A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. (os error 10060): Io Error: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. (os error 10060)
From the error message, this does not look like the linger timeout. It actually looks like the connection is not yet established?
The request timeout (due to being implemented by shutting down the connection) only starts after the connection is established, whereas the time until then is covered by the connection timeout (which defaults to 30 seconds). This distinction is admittedly a technical artefact of the implementation, but could you try if reducing the connection timeout to two seconds "fixes" this?
If so, we could include the request timeout when computing the connection timeout and also including the time it took to establish the connection when computing the deadline for the request timeout to make this more consistent.
(Even though I do not understand why this would be a Windows-only issue then...)
Setting the connection timeout does fix it: error sending request for url (http://localhost:3003/): operation timed out: error sending request for url (http://localhost:3003/): operation timed out: operation timed out
> If so, we could include the request timeout when computing the connection timeout and also including the time it took to establish the connection when computing the deadline for the request timeout to make this more consistent.
This is somewhat embarrassing as I was involved in writing that code, but we already seem to do that:
So if a deadline (i.e. full request timeout) is set, we use that to compute a connect timeout and use whatever is smaller.
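As a minimal sketch of that computation (the helper name and signature are illustrative, not attohttpc's actual code): clamp the configured connect timeout by the time remaining until the request deadline, and use whichever is smaller.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper mirroring the logic described above: if a request
/// deadline is set, the effective connect timeout is the smaller of the
/// configured connect timeout and the time remaining until the deadline.
fn effective_connect_timeout(
    connect_timeout: Duration,
    deadline: Option<Instant>,
    now: Instant,
) -> Duration {
    match deadline {
        Some(deadline) => {
            let remaining = deadline.saturating_duration_since(now);
            connect_timeout.min(remaining)
        }
        None => connect_timeout,
    }
}

fn main() {
    let now = Instant::now();
    // A 2-second request deadline caps the default 30-second connect timeout.
    let t = effective_connect_timeout(
        Duration::from_secs(30),
        Some(now + Duration::from_secs(2)),
        now,
    );
    assert_eq!(t, Duration::from_secs(2));
}
```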
Sorry if this is getting into the weeds, but I would have a hard time testing this on Windows, so could you slap `dbg!` invocations around `timeout` and `timeout.min(timeout1)` passed to `connect_timeout` so we can see what changes with and without a reduced connection timeout in addition to the full request timeout? Thank you!
I'll do that later today or tomorrow.
After some debugging it seems like the problem is in https://github.com/sbstp/attohttpc/blob/67ff381c02e3e23ba1cdfb2fe4c0b2dd9d257508/src/streams.rs#L114: it does not take `info.deadline` (the timeout for the whole request) into account. @adamreichold
The connection finishes in 1 ms, but the read uses the default `read_timeout` (30 s), so the request succeeds even though the server takes 5 seconds to respond. We should take the minimum of `read_timeout` and the deadline, IMO.
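A sketch of that suggestion (the helper name is illustrative, not attohttpc's actual code): before a blocking read, clamp the stream's read timeout by the time remaining until the request deadline.

```rust
use std::net::{TcpListener, TcpStream};
use std::time::{Duration, Instant};

/// Hypothetical helper: set the stream's read timeout to the smaller of the
/// configured read timeout and the time remaining until the request deadline.
fn clamp_read_timeout(
    stream: &TcpStream,
    read_timeout: Duration,
    deadline: Instant,
) -> std::io::Result<()> {
    let remaining = deadline.saturating_duration_since(Instant::now());
    // set_read_timeout(Some(Duration::ZERO)) is an error, so keep a small floor.
    let timeout = read_timeout.min(remaining).max(Duration::from_millis(1));
    stream.set_read_timeout(Some(timeout))
}

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let stream = TcpStream::connect(listener.local_addr()?)?;

    // With a 2 s deadline, the default 30 s read timeout is clamped down.
    let deadline = Instant::now() + Duration::from_secs(2);
    clamp_read_timeout(&stream, Duration::from_secs(30), deadline)?;
    assert!(stream.read_timeout()?.unwrap() <= Duration::from_secs(2));
    Ok(())
}
```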
@lucasfernog I think I do not really understand the issue yet. After the connection is established, the request timeout should automatically override any other timeout (read, write, etc.), as we forcibly shut down the socket, thereby interrupting any blocked read or write operation independently of any other timeouts.
The connection timeout is special only because we do not have access to the socket until `connect_timeout` returns, so we cannot start the separate thread which closes the socket that we do not yet know.
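For context, the trick being discussed can be sketched like this (a minimal standalone example, not attohttpc's actual code): a watchdog thread calls `shutdown` on a clone of the stream once the deadline passes, which on Unix unblocks a pending read with `Ok(0)`.

```rust
use std::io::Read;
use std::net::{Shutdown, TcpListener, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

/// Connect to `addr`, arm a watchdog thread that shuts the socket down after
/// `deadline`, then issue a blocking read. Returns bytes read and elapsed time.
fn read_with_watchdog(
    addr: std::net::SocketAddr,
    deadline: Duration,
) -> std::io::Result<(usize, Duration)> {
    let mut stream = TcpStream::connect(addr)?;
    let clone = stream.try_clone()?;
    thread::spawn(move || {
        thread::sleep(deadline); // the "request timeout"
        let _ = clone.shutdown(Shutdown::Both); // unblocks the read on Unix
    });
    let start = Instant::now();
    let mut buf = [0u8; 1024];
    let n = stream.read(&mut buf)?;
    Ok((n, start.elapsed()))
}

fn main() -> std::io::Result<()> {
    // A server that accepts the connection but never responds, simulating a
    // slow upstream that would otherwise block the read for its full sleep.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    thread::spawn(move || {
        let (_peer, _) = listener.accept().unwrap();
        thread::sleep(Duration::from_secs(5)); // hold the connection open
    });

    let (n, elapsed) = read_with_watchdog(addr, Duration::from_millis(200))?;
    // On Unix the shutdown unblocks the read with Ok(0) well before 5 s; the
    // reported Windows behaviour is that the read keeps blocking instead.
    assert_eq!(n, 0);
    assert!(elapsed < Duration::from_secs(5));
    Ok(())
}
```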
I'm debugging the issue; `connect_timeout` actually returns and the thread is started. `stream.shutdown` is called, but the stream read does not return `Ok(0)` immediately as stated by the docs. I actually get to the `Ok(read)` match arm when the server responds (after the 5 seconds of sleep).
While debugging, could you check the return value of `stream.shutdown`, i.e. whether it returns an error?
One other idea we might consider is that `SO_LINGER` interacts with this on Windows as described here. Could you try using the (currently unstable) `TcpStream::set_linger` method to set a `None` value?
Shutdown doesn't return an error. I'll check the unstable API.
@adamreichold with `set_linger` I received the timeout error, but it's not immediate; it took the whole 5 seconds to receive it. I guess this is a Windows problem? I wonder how reqwest handles this since it works there.
> I guess this is a Windows problem? I wonder how reqwest handles this since it works there.
`reqwest` is always "async" internally, which makes the timeout just another source of I/O events (e.g. via `tokio`'s timer wheel). When a timeout triggers, the future representing the read is just dropped without the need to unblock/cancel/interrupt any blocking system calls.
Using `shutdown` to unblock blocking system calls is admittedly a cheap trick that works nicely on Unix-like systems. I think the only alternative besides going all-out async (and not doing this is sort of the selling point of `attohttpc`) is updating the timeout before each and every call to read and write based on the request deadline, which would complicate the code and introduce unnecessary performance costs on Unix-like platforms.
(I suspect we might be able to do something Windows-specific to make that trick work, like calling `closesocket` or some other WSA entry point directly.)
Makes sense. On my side I can also just set `read_timeout` along with `timeout` and it'll work.
I think I finally have a clue what is happening: on UNIX, `shutdown` unblocks everything, whereas on Windows only `closesocket` would do. By dropping the stream, we do [eventually call `closesocket`](https://github.com/rust-lang/rust/blob/master/library/std/src/os/windows/io/socket.rs#L186-L193), but it is a duplicate socket produced using `WSADuplicateSocketW`, about which the docs say:

> A process can call closesocket on a duplicated socket and the descriptor will become deallocated. The underlying socket, however, will remain open until closesocket is called by the last remaining descriptor.

(From that doc page, `try_clone` sounds significantly more complex on Windows due to the focus of `WSADuplicateSocketW` on IPC.)
So I think with that knowledge, we should avoid `try_clone` and instead use `as_raw_fd`/`as_raw_socket` and `from_raw_fd`/`from_raw_socket` to "duplicate" the socket in the timeout thread, so that the `closesocket` call actually affects the original socket descriptor.
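A Unix-flavoured sketch of the idea (the actual fix is platform-gated; `as_raw_fd`/`from_raw_fd` here stand in for the `as_raw_socket`/`from_raw_socket` pair on Windows, and this is an illustration, not the PR's code): reconstructing a `TcpStream` from the raw descriptor gives the watchdog thread a handle to the *same* descriptor rather than a `WSADuplicateSocketW` duplicate, so shutting it down or closing it affects the original socket.

```rust
use std::io::Read;
use std::mem::ManuallyDrop;
use std::net::{Shutdown, TcpListener, TcpStream};
use std::os::unix::io::{AsRawFd, FromRawFd};
use std::thread;
use std::time::Duration;

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    thread::spawn(move || {
        let (_peer, _) = listener.accept().unwrap(); // never responds
        thread::sleep(Duration::from_secs(5));
    });

    let mut stream = TcpStream::connect(addr)?;
    let raw = stream.as_raw_fd();
    thread::spawn(move || {
        // SAFETY: `raw` stays valid because `main` keeps `stream` alive while
        // blocked in read(); ManuallyDrop prevents a double close on exit.
        let same_socket = ManuallyDrop::new(unsafe { TcpStream::from_raw_fd(raw) });
        thread::sleep(Duration::from_millis(100));
        let _ = same_socket.shutdown(Shutdown::Both);
    });

    let mut buf = [0u8; 64];
    // The shutdown on the reconstructed handle hits the same descriptor,
    // unblocking this read well before the server's 5 s sleep elapses.
    assert_eq!(stream.read(&mut buf)?, 0);
    Ok(())
}
```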
@lucasfernog I have created https://github.com/sbstp/attohttpc/pull/119 implementing this idea, but I cannot test this using Wine (as the initial symptom is not present, most likely because Wine maps the Windows socket API to Linux' native one).
@lucasfernog Any chance you could give #119 a try on a Windows machine? I fear that otherwise I will not be able to make any progress on this as I do not have ready access to a Windows machine.
@adamreichold I can check that, sorry it took so long, I've been super busy.
I'm making a request with timeout set to `Duration::from_secs(2)` to a dummy localhost server. The server sleeps for `Duration::from_secs(5)` before responding, and the timeout is not enforced on Windows (though it works on macOS and Linux).
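A minimal sketch of that setup (the response body is an assumption, the delays are shortened so the example runs quickly, and the client uses a plain `read_timeout` as a stand-in for attohttpc's request `timeout`):

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

/// Serve one connection: sleep `delay`, then send an empty 200 response,
/// mimicking the dummy server from the report.
fn serve_once(listener: TcpListener, delay: Duration) {
    let (mut peer, _) = listener.accept().unwrap();
    thread::sleep(delay);
    peer.write_all(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
        .unwrap();
}

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let server = thread::spawn(move || serve_once(listener, Duration::from_millis(300)));

    // A client whose timeout is shorter than the server's delay: the read
    // should fail with a timeout, which is exactly what was observed to not
    // happen on Windows when the timeout is enforced via shutdown.
    let mut stream = TcpStream::connect(addr)?;
    stream.set_read_timeout(Some(Duration::from_millis(100)))?;
    let mut buf = [0u8; 64];
    assert!(stream.read(&mut buf).is_err());

    server.join().unwrap();
    Ok(())
}
```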