typelevel / cats-effect

The pure asynchronous runtime for Scala
https://typelevel.org/cats-effect/
Apache License 2.0
2.03k stars 521 forks source link

Timeout on a canceled IO #3396

Closed Jasper-M closed 1 year ago

Jasper-M commented 1 year ago

If you set a timeout on a task that somehow cancels itself, the end of the timeout will be awaited instead of handing back control immediately. Even more surprising (to me) is that timeoutAndForget never hands back control in that case.

scala> IO.canceled.unsafeRunTimed(4.seconds) // errors immediately
java.util.concurrent.CancellationException: The fiber was canceled

scala> IO.canceled.timeout(3.seconds).unsafeRunTimed(4.seconds) // errors after 3 seconds
java.util.concurrent.TimeoutException: 3 seconds

scala> IO.canceled.timeoutAndForget(3.seconds).unsafeRunTimed(4.seconds) // returns after 4 seconds
val res63: Option[Unit] = None

In order to avoid this I guess you would have to surface the cancellation as an Exception.

armanbilge commented 1 year ago

Thanks. This reminds me of my proposal in https://github.com/typelevel/cats-effect/pull/3233#issue-1439117768 that .timeout and friends should be implemented in terms of a "biased race", because one side of the race is "more important" than the other.

armanbilge commented 1 year ago

I think I just ran into this in the wild. There was a bug at play (forgot to unmask the timeout in a poll) but it was confusing to diagnose because it was waiting for the timeout, even though the effect in question had already been canceled.

durban commented 1 year ago

What is the reason for the timeout logic to be duplicated in IO? I mean for example these 2:

durban commented 1 year ago

The IO#timeoutTo scaladoc has this:

If the source is uncancelable, the resulting effect will wait for it to complete before evaluating the fallback.

This seems to forbid the "biased race" (If I understood it correctly). But I don't know, why is this documented this way...

Jasper-M commented 1 year ago

That's just the normal back pressured cancelation IIUC.

armanbilge commented 1 year ago

In the original example when you race IO.canceled against IO.sleep(...) both sides are cancelable. So you can cancel both of them without waiting for it to complete.

What I mean by "biased race": indeed, because of back-pressured cancelation we must always wait for both sides to complete (either in success, error, or cancelation). However, in many cases such as timeouts my claim is that the outcome of one side is more important than who actually won the race.

For example, suppose you are timing out some uncancelable effect that returns a Foo (i.e., you are racing it against a sleep). If the timeout fires, the sleep has won the race, but we still have to wait for the uncancelable effect to finish. When it finishes, it will have a Foo. So at that point, you already have the Foo, you might as well as take that result and continue going.

durban commented 1 year ago

@armanbilge Yes, I agree with all that. It just contradicts the current scaladoc, which explicitly says that even if the timeouted effect finishes (due to uncancelability), the fallback will still be executed.

armanbilge commented 1 year ago

Ah, sorry, missed that bit. You are right then, this would be a semantic change, going by the docs, 🤔

djspiewak commented 1 year ago

Actually this all comes from the following:

            case Outcome.Canceled() =>
              poll(f.join).onCancel(f.cancel).flatMap {
                case Outcome.Succeeded(fb) => fb.map(Right(_))
                case Outcome.Errored(eb) => raiseError(eb)
                case Outcome.Canceled() => poll(canceled) *> never
              }

I kind of think that this might be wrong. Like… if the left is canceled, why wouldn't we just cancel the whole immediately?

armanbilge commented 1 year ago

Where is that from? I'm not entirely sure tbh :) for a "generic" race, if one side cancels then it still seems reasonable for the other side to win. It's just for timeouts, the sleep(...) winning is not very meaningful.

djspiewak commented 1 year ago

for a "generic" race, if one side cancels then it still seems reasonable for the other side to win

I mean, is it? In general, we usually treat the branches of race as if they're uniform with the parent fiber, which would suggest that self-cancelation should be treated as cancelation. This is certainly what would happen with external cancelation.

armanbilge commented 1 year ago

I still not so sure. It's hard to get a good feel for what self-cancellation means.

However, I did recently write some code that I think this would break. In that code, I am racing two fibers. Those fibers may self-cancel depending on their result,(essentially, to disqualify itself from the race). This also ensures that the race completes if both fibers cancel. I haven't thought about it too hard yet, but how would I express this sort of logic after your change?

djspiewak commented 1 year ago

how would I express this sort of logic after your change?

One easy route is just replacing canceled with never.

armanbilge commented 1 year ago

One easy route is just replacing canceled with never.

But if they both never, then the race never ends :)

djspiewak commented 1 year ago

That's fair… Hmm. Do you have a digestable concrete example?

armanbilge commented 1 year ago

https://github.com/armanbilge/huckle/blob/dc2550106a9fc98ca15d5fafd4a2731d57d20500/maven/src/main/scala/huckle/maven/resolver.scala#L56-L72

Don't mind the large Yak behind it :innocent:

djspiewak commented 1 year ago

Btw @durban I genuinely have no idea why it's duplicated. We should probably undo that.

Jasper-M commented 1 year ago

IMVHO it would be fine to leave race as it is and implement timeout in terms of racePair that gives you handles on the outcomes so you can embedCancel instead of embedNever.

Jasper-M commented 1 year ago

On the other hand, looking through our codebase, most of our uses of race are something like race(longRunningProcess1, longRunningProcess2). So there the expectation is effectively that if one process ends the whole thing should stop. Whether it ends in success, failure or cancelation shouldn't make a difference.

Or maybe we need to refactor to e.g. List(longRunningProcess1, longRunningProcess2).map(_ *> IO.raiseError(UnexpectedCompletionError())).parSequence.

Jasper-M commented 1 year ago

Or probably raceOutcome is the most fitting candidate for that use case.

Maybe we should just have extra permutations of race for all different uses:

djspiewak commented 1 year ago

We can reduce this down a little bit. It's effectively a philosophical question: is timeout archetypical of all races (and thus, any races which need different semantics should use racePair directly), or is timeout the unusual one?

armanbilge commented 1 year ago

I'm going to point at this PR again :)