Open joernheissler opened 6 years ago
On 28.09.2018 18:28, Jörn Heissler wrote:
|TooSlowError| is especially strange because the timeout wasn't reached.
The problem is that fail_after
cannot discover that it was cancelled
manually by yourself instead of the timeout.
Fix: create a secondary scope inside the fail_after
, and cancel that.
-- -- Matthias Urlichs
Thanks! But is that a fix or is it a workaround? :-)
Likely no one has ever thought to manually cancel a fail_after(). Would you describe your use case?
Behavior seems completely in line with docs, though I understand not what you're wanting:
Creates a cancel scope with the given timeout, and raises an error if it is actually cancelled.
This function and move_on_after() are similar in that both create a cancel scope with a given timeout, and if the timeout expires then both will cause Cancelled to be raised within the scope. The difference is that when the Cancelled exception reaches move_on_after(), it’s caught and discarded. When it reaches fail_after(), then it’s caught and TooSlowError is raised in its place.
Raises TooSlowError – if a Cancelled exception is raised in this scope and caught by the context manager
There are two things that can be cancelled: 1) the cancel scope, 2) the notion that an exception should be raised. The cancel() method can only be bound to one, and since it's a method on the CancelScope object it understandably is the former.
Possibly need a ~cancel_fail_after()
method~ new property (see below).
Once we head down this road, I think it should be possible to dynamically transition between plain cancel scope, move_on_after, and fail_after semantics while the cancel scope is in flight. It's already possible to do that with the first two via deadline
. There should be an additional raise_on_cancelled
read/write property which is a bool, or perhaps an exception object.
Likely no one has ever thought to manually cancel a fail_after(). Would you describe your use case?
In a nutshell: There's a long running task (might run infinitely). When a certain event happens, I want to finish it without error, so I'd cancel it. If the event does not occur within N seconds, I want the task to abort with an error.
That task receives messages from a stream and runs a callback for each message. The callback might then cancel the task.
My code and design are flawed and I'm going to rewrite it without using callbacks. Then the above use case won't exist anymore. I once learned that I need to use state machines and callbacks, unlearning this is hard.
Creates a cancel scope with the given timeout, and raises an error if it is actually cancelled.
I must have missed this. So it looks like the behaviour does match the docs.
Still I find it strange that cancelling one CancelScope
causes it to return silently while cancelling another CancelScope
causes it to fail with TooSlowError
while in fact nothing was too slow.
Huh, interesting case.
So in general, trio doesn't track the reason why a scope was cancelled – a timeout is exactly the same as calling cancel
. This is because of you do try to track the difference then there are all kinds of ambiguous cases, e.g. what do you do if it timed out and then someone called cancel
as well, or even vice-versa, what if the timeout expires while it's in the process of handling the manual cancel
.
And fail_after
is an extremely simple wrapper around a cancel scope: https://github.com/python-trio/trio/blob/master/trio/_timeouts.py
But yeah the end result is kind of surprising here!
Some options:
Leave the code as is, but document it better
Remove fail_*
entirely; when people ask about them treat it as an opportunity to teach about the wonders of .cancelled_caught
Start tracking cancel scope cancellation reason, maybe just in a minimal way like "was the first call to cancel
from a timeout or not"
Have fail_after
check the current time at the moment when it decides whether to raise TooSlowError
, and skip it if the timeout couldn't possibly have expired
Hmm, above message was written this morning and then I forgot to hit post, so the discussion has overtaken it a bit.
I have trouble telling how useful fail_after
really is, once you get used to doing things "the trio way". In other systems that don't have nestable cancel scopes, like asyncio, fail_after
is kind of the only way to do things, so I partly just threw it in to make new migrants feel more at home. But that's probably not the only use case? It's hard to tell. We're all unlearning the old ways and trying to invent the new ways together :-).
If you're rewriting your code anyway, then maybe we should leave this be for the moment, and just keep this thread in mind in case it comes up again in the future?
My use case (close enough): I'm connecting to a host and my connection-setup handshake is protected by a fail_after
because, well, if that takes too long the host isn't useable. You can imagine what happens when the whole subtask is cancelled for completely unrelated reasons. I don't see an obviously-correct way to handle that case.
So, yes, I'm all for some way to fix this issue, preferably one that doesn't require two additional syscalls to gettimeinfo
or similar.
AFAICT, this issue only applies to cancelling the fail_after
scope itself. Cancelling a scope that encloses the fail_after
should work fine. The Happy Eyeballs example you mention seems to me like it would be of the latter species.
Are you seeing that cancelling a scope that encloses the fail_after
results in a TooSlowError
? That would definitely be a bug, and code to reproduce it would be appreciated.
Yes, that's what I'm seeing. I don't have a test case (a trivial attempt to reproduce this failed) but in my code the fail_after
's scope is not even saved.
Are you using a released Trio, or master? There were some recent cancellation changes that could theoretically have bugs... I'm not able to reproduce the outer-cancellation-causes-inner-TooSlowError on master with a simple case either, though.
I've got some code that uses
fail_after
and cancels itself, example below. My assumption was that the context manager just exits without any error. Yet it raises aCancelled
andTooSlowError
. When I don't usefail_after
butopen_cancel_scope
, things work as expected. I can't find anything in the docs that explain the difference. And I can't think of any good reason either.TooSlowError
is especially strange because the timeout wasn't reached.Version: ce131cd519a4ac3cb936ffb9e34e0a4f7e70bc24 on cpython3.6.6
Result is: