theelous3 / asks

Async requests-like httplib for python.
MIT License
508 stars 63 forks source link

Timeout does not affect socket connect call #64

Closed miracle2k closed 6 years ago

miracle2k commented 6 years ago

https://github.com/theelous3/asks/blob/2028482fdc89b8ade91831a3a8745082b3ff9d37/asks/sessions.py#L141

This line can hang, when connecting to a non-routable address (try 10.255.255.1 for example). It is not affected by the timeout argument.

miracle2k commented 6 years ago

Note that in trio this is super easy to fix since I can just wrap my request within my own trio.fail_after`.

dd-dent commented 6 years ago

I'm thinking maybe modify the timeout_manager function to handle sockets as well?
@miracle2k Are you currently working on it? Anyone else? I can take over it myself otherwise.

theelous3 commented 6 years ago

Things like this are funny. On one hand, it's trivial to implement, and on the other hand we need to have it behave in a way that is obvious, logical, and intuitive.

I initially intentionally left the connection out of the timeout management, as I think it's not actually really expected to include it on a high level. For example, calling something that takes a defined about of time. Easy example is httpbin's /delay endpoint.

Even as it stands now, thanks to network io, if we call /delay/1 with a timeout of 1, it will timeout every single time. This isn't really black magic stuff though. Users know that network io is at play and can account for that. So for /delay/1 a timeout of 1.5 seconds may be reasonable for them, and set.

What is less obvious though, is that different servers have different concurrency capabilities and different connection header policies, and may be slow to allot them a socket or whatever. Now we have a total operation of say, 5 seconds to run the /delay/1 endpoint the first time, so it fails and they adjust to 5.5 seconds. Then the connection is pooled for them and the server uses connection: keep-alive, so all of their subsequent requests are waiting four seconds too long.

Ideally I'd like to avoid complexity, but I think in the end it may make sense to have an additional connection_timeout kwarg with a default, that people can go and fine tune if needs be.

How about a default of 60 seconds, half the the maximum (segment lifetime) time a packet can float around the interwebs?

https://en.wikipedia.org/wiki/Maximum_segment_lifetime

miracle2k commented 6 years ago

I'm not working on it! I just wanted to mention this because I ran into it, but for me, the trio timeout works fine for now.

dd-dent commented 6 years ago

From requests:

:param timeout: (optional) How long to wait for the server to send data before giving up, as a float, or a :ref:(connect timeout, read timeout) <timeouts> tuple.
and then from urllib3:

:param total: This combines the connect and read timeouts into one; the read timeout will be set to the time leftover from the connect attempt. In the event that both a connect timeout and a total are specified, or a read timeout and a total are specified, the shorter timeout will be applied. Defaults to None.

So what @theelous3 said...
I'll hopefully get it done by this weekend.

jmehnle commented 6 years ago

I have a need for this as well. Let me know if there's anything I can do to help.

theelous3 commented 6 years ago

Fixed, pypi'd. https://github.com/theelous3/asks/commit/ced7a74a03c1811e0811ce4f2ce3a0dc8363beda