python-trio / trio

Trio – a friendly Python library for async concurrency and I/O
https://trio.readthedocs.io
Other
6.01k stars 325 forks source link

`fail_after` deadline is set on initialization not context entry #2512

Open zanieb opened 1 year ago

zanieb commented 1 year ago

This is a duplicate of https://github.com/agronholm/anyio/issues/514 — they are following the convention established here in trio.

When using a fail_after context, the deadline is set at "initialization" time rather than __enter__. This behavior can result in unexpected bugs if the context is declared before it is entered. This is not particularly intuitive for a Python context manager — I'd expect the timer to start when the context is entered. I do not think changing it would be complex, but it could be considered breaking.

import trio

async def main():
    ctx = trio.fail_after(5)
    await trio.sleep(5)
    with ctx:
        for i in range(1, 6):
            print(i)
            await trio.sleep(1)

trio.run(main)
❯ python example.py  
1
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniconda/base/envs/orion-dev-39/lib/python3.9/site-packages/trio/_timeouts.py", line 106, in fail_at
    yield scope
  File "/Users/mz/dev/prefect/example.py", line 168, in main
    await trio.sleep(1)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/orion-dev-39/lib/python3.9/site-packages/trio/_timeouts.py", line 76, in sleep
    await sleep_until(trio.current_time() + seconds)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/orion-dev-39/lib/python3.9/site-packages/trio/_timeouts.py", line 57, in sleep_until
    await sleep_forever()
  File "/opt/homebrew/Caskroom/miniconda/base/envs/orion-dev-39/lib/python3.9/site-packages/trio/_timeouts.py", line 40, in sleep_forever
    await trio.lowlevel.wait_task_rescheduled(lambda _: trio.lowlevel.Abort.SUCCEEDED)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/orion-dev-39/lib/python3.9/site-packages/trio/_core/_traps.py", line 166, in wait_task_rescheduled
    return (await _async_yield(WaitTaskRescheduled(abort_func))).unwrap()
  File "/opt/homebrew/Caskroom/miniconda/base/envs/orion-dev-39/lib/python3.9/site-packages/outcome/_impl.py", line 138, in unwrap
    raise captured_error
  File "/opt/homebrew/Caskroom/miniconda/base/envs/orion-dev-39/lib/python3.9/site-packages/trio/_core/_run.py", line 1222, in raise_cancel
    raise Cancelled._create()
trio.Cancelled: Cancelled

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/mz/dev/prefect/example.py", line 171, in <module>
    trio.run(main)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/orion-dev-39/lib/python3.9/site-packages/trio/_core/_run.py", line 2010, in run
    raise runner.main_task_outcome.error
  File "/Users/mz/dev/prefect/example.py", line 168, in main
    await trio.sleep(1)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/orion-dev-39/lib/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/orion-dev-39/lib/python3.9/site-packages/trio/_timeouts.py", line 108, in fail_at
    raise TooSlowError
trio.TooSlowError
Fuyukai commented 1 year ago

So don't do that? I fail to see any good design where creating a context manager and then not using it would make sense.

zanieb commented 1 year ago

Often, we'll create a context manager ahead of time (during a configuration step, for example) then downstream enter the context. In my application, we are using fail_after to enforce configurable timeouts. We set up the context but do not want to enter it until right before we call the (library user's) code that needs a timeout. At other times, when there is not a timeout, we'll use a null context to simplify downstream logic.

If we weren't ever using the context manager, this wouldn't be an issue. The issue here is that the default behavior is antithetical to typical context manager behavior. It's like having the following return the amount of time since initialization rather than within the context:

timer = timer()
with timer:
   ...

It's also not an atypical design to use things like an AsyncExitStack — in this case developers may create the contexts beforehand then enter them in a loop.

agronholm commented 1 year ago

Trio would have to change their semantics from what has been documented, and so far there haven't been any practical use cases that would warrant this.

zanieb commented 1 year ago

@agronholm can you link to the documentation that covers this behavior? I did not see any explicit mention of this.

agronholm commented 1 year ago

Here: https://trio.readthedocs.io/en/stable/reference-core.html#trio.fail_after

Creates a cancel scope with the given timeout, and raises an error if it is actually cancelled.

Implies that the cancel scope gets created with a set deadline. Says nothing about the deadline getting set when entering the scope.

zanieb commented 1 year ago

Ah I think that can be interpreted either way based on your assumptions.. the example shows with fail_after() as cancel_scope so it's pretty ambiguous when that scope is created — it could just as well be created on __enter__ and yielded. In fact, you cannot get the CancelScope object itself until after entering the context (in anyio).

agronholm commented 1 year ago

As there are 0 examples of separating the instantiation of this context manager from its use in a with statement in the documentation of either trio or anyio, I think it would be safe to assume that it was not a supported use case.

zanieb commented 1 year ago

If it was not a supported use-case then what is the objection to changing the behavior to be clearer?

zanieb commented 1 year ago

To be clear: I can work around this and will need to do so for compatibility with older versions of anyio/trio no matter what is decided here. I will likely add my own utility with the behavior I need — I am only advocating for an API that is less likely to be misleading here.

agronholm commented 1 year ago

If it was not a supported use-case then what is the objection to changing the behavior to be clearer?

If anybody anywhere is already using the context manager in such a disjointed manner correctly, their code will be broken. I wouldn't expect anybody to be doing that, but then again they wouldn't report that to us unless something wasn't working. I also didn't expect anybody to be using the cm like this at all, so who knows?

agronholm commented 1 year ago

I think a change in semantics should be considered but only if someone brings forth a plausible use case where doing this is necessary or at least beneficial.

zanieb commented 1 year ago

Yeah unfortunately timeouts are kind of fuzzy in the first place — we had it like this for a long time before we got a bug report from a user. From a cursory search on GitHub, I don't see any uses apart from ours where the initialization and entry are separated. I'll open a pull request to document this caveat. We can see if more people chime in on this issue, I see no reason to rush an API change.

njsmith commented 1 year ago

I don't think there's any problem necessarily with calculating the deadline at the last moment, though I can imagine someone being confused the other way as well. ("I created the fail_after object early because I wanted that to be when the deadline was relative to.")

BTW, re:

In my application, we are using fail_after to enforce configurable timeouts. We set up the context but do not want to enter it until right before we call the (library user's) code that needs a timeout. At other times, when there is not a timeout, we'll use a null context to simplify downstream logic.

Trio accepts math.inf as a timeout everywhere, meaning "never timeout". So you could pass the bare timeout value around everywhere (either the configured timeout, or math.inf if there's no timeout), and then unconditionally call fail_after around the operation that has the timeout. Might be easier to understand, independently of whatever happens with this discussion?

smurfix commented 1 year ago

I'd vote +0.25 for changing the behavior (but +1 for documenting things more stringently).

("I created the fail_after object early because I wanted that to be when the deadline was relative to."

You can easily calculate an absolute timeout at that time instead.

The way things currently are, you can't achieve the behavior @madkinsz wants as easily, thus this'd be a net win for that kind of flexibility in case anybody ever wants it.

zanieb commented 1 year ago

Thanks for the additional responses!

Trio accepts math.inf as a timeout everywhere, meaning "never timeout". So you could pass the bare timeout value around everywhere (either the configured timeout, or math.inf if there's no timeout), and then unconditionally call fail_after around the operation that has the timeout. Might be easier to understand, independently of whatever happens with this discussion?

Definitely! I plan on doing this in the short term.

For more context: I'm also working on a refactor of how we're orchestrating user code to reduce the duplication of some complex portions of code and something I'm exploring is passing a list of contexts to a function that uses an exit stack to enter them all. This separation of configuration of the contexts that should be active during execution of the user code and management of actual execution of the code greatly simplifies our runtime logic. Of course, I can make this accept Callable[[], ContextManager] too to get past this particular issue, but then for more complex context managers I'll be passing around a bunch of partial objects 🤷‍♀️ tradeoffs.

Zac-HD commented 1 year ago

I think it would make sense for trio.fail_after(x) to be relative to entering the context manager; trio.fail_at(trio.current_time() + x) seems a more natural way to express the relative-to-construction-time semantics. (and the same for trio.move_on_{at,after})

Either way, we should document the chosen semantics.

jakkdl commented 1 month ago

Concretely thinking about possible ways of going about implementing a change in behaviour (sorry if somewhat rambly/stream-of-thought, I think I personally prefer the new class that transforms into a cancelscope)

new attribute

One option would be introducing an attribute, relative, to CancelScope - such that when CancelScope.__enter__ is called it does

if self.relative: self._deadline = trio.current_time() + self._deadline

Could also add an attribute relative_deadline... but that brings up problems with what should happen if you access the deadline attribute.

new class that transforms into CancelScope

We can introduce a new class (PendingTimeout, PendingRelativeCancelScope, RelativeCancelScope...?) which when __enter__ed returns a CancelScope. It does need to be exposed for the sake of typing, and you probably want to be able to modify the timeout after creation, before entering. It probably makes sense to also expose shield.

There's still some danger of deadline confusion if writing:

my_timeout = fail_after(5)
with my_timeout as timeout:
    # oh no, we want 7 seconds, not 5
    my_timeout.deadline = 7
    # ...except that's not how cancelscopes work

although I guess that has always been the case. If trying to access deadline on this object you will get an error (since it's not defined as an attribute at all), so some of the code relying on current behavior will get caught this way, and still others will get caught by typing errors.

example usage

my_timeout: PendingTimeout = fail_after(5)
my_timeout.shield = True
my_timeout.seconds = 7
my_timeout.fail = False # this is now a move_on_after instead
with pytest.raises(
    AttributeError,
    match="'PendingTimeout' object has no attribute 'deadline'"
    ):
  my_timeout.deadline += 2
cs: CancelScope
with my_timeout as cs:
  with pytest.raises(
      AttributeError,
      match="'CancelScope' object has no attribute 'seconds'"
  ):
    cs.seconds += 2

# this
with trio.PendingTimeout(5, shield=True, fail = False):
  ...
# would be equivalent to
with trio.move_on_after(5) as cs:
  cs.shield = True
  ...

new subclass

A third option would be introducing a subclass of CancelScope, RelativeCancelScope, which has an attribute seconds (or timeout? The parameter to fail_ & move_on_after is currently called seconds). Trying to get/set deadline on the object before entering will raise an error (which means code relying on current functionality will get a helpful error mesage), but seconds can be modified freely. Once entered though, if modifying seconds the change will flow down to the underlying CancelScope, and if modifying deadline the seconds parameter will also be updated. Code using current functionality will not get any typing errors, since RelativeCancelScope is a subclass of CancelScope.

example usage

my_timeout: RelativeCancelScope = fail_after(9)
with pytest.raises(
    AttributeError, # or some other error
    match='cannot set deadline on RelativeCancelScope before entering.'
  ):
  my_timeout.deadline += 3
my_timeout.seconds = 5

with my_timeout: # __enter__ returns Self, so don't even need to do `as`
  # assume that trio.current_time() was 100 upon entering
  assert my_timeout.deadline == 105
  my_timeout.seconds = 20
  assert my_timeout.deadline == 120
  my_timeout.deadline = 150
  assert my_timeout.seconds = 50
  my_timeout.deadline = 97 # nothing illegal about setting deadlines in the past on cancelscopes
  assert my_timeout.seconds = -3

I'm not sure how much this actually differs from adding a new attribute tbh.

a new, entirely independent, class

could also go all the way to introducing a fully new RelativeCancelScope that does not inherit from CancelScope. But that seems overkill.

CoolCat467 commented 1 month ago

I think a new subclass would work well. On the seconds side of things, in the past I've wished it were measured in smaller time increments, but I think it's probably fine to leave as seconds. The division operation to make things into seconds isn't really going to be that bad.

Zac-HD commented 1 month ago

I favor adding a ._relative_deadline attribute to the existing CancelScope class.

Modifying the deadline or relative deadline before entering seems fine, though I'm still using a leading underscore to discourage the latter. We could make the relative deadline a property so that attempting to modify it after entering checks if self._has_been_entered and raises an error.

TeamSpen210 commented 1 month ago

After being entered, modifying either the absolute or relative deadline seems reasonable to me - it'd just implicitly fetch the current time then set the other to match.

jakkdl commented 4 weeks ago

Finishing up the PR, when I realized that the behaviour of relative_deadline after entering is not obvious.

with CancelScope(relative_deadline = 5) as cs:
  await trio.sleep(1)
  print(cs.relative_deadline)  # should this print 5, or 4?
  cs.relative_deadline = 3 # does this mean "deadline in 3 seconds", or "3 seconds after entering"?

My first intuition (and implementation) was that it would print 5, and then "deadline in 3 seconds" - but that means that the value of cs.relative_deadline becomes ~worthless:

with CancelScope(relative_deadline = 5) as cs:
  await trio.sleep(random.randint(3))
  cs.relative_deadline = 1 + random.randint(3)
  await trio.sleep(random.randint(3))
  print(cs.relative_deadline)  # this value now has zero relation to when the scope will time out.

Possible alternatives:

  1. accessing the relative_deadline property after entering will return deadline - current_time(). The answers to the first questions are "4 seconds" and "deadline in 3 seconds".
  2. Add another attribute _time_of_entering to the cs, set to current_time() upon entering. All relative times are now relative to entering the scope. The answer to the first questions are "5 seconds" and "3 seconds after entering".
  3. Raise an error if accessing or setting relative_deadline after entering to remove a potential footgun. The 1st option doesn't actually add any functionality after all, cs.relative_deadline = x is equivalent to cs.deadline = trio.current_time() + x and cs.relative_deadline is equivalent to cs.deadline - trio.current_time().

I think the first one might be more intuitive and useful? But can see a case for any of the options. I'll push to #3010 soonish with an implementation.