Open armanbilge opened 1 year ago
I think this is a good idea. I suspect it would('ve) make fixing #3603 almost trivial. (I assume the semantics would be that if a cancel won, the callback would return false.)
Something worth thinking about is how would this affect Deferred#complete
: it already returns a bool, but that means a different thing.
To answer my own question: I don't think this should affect Deferred#complete
. That is a similar, but different thing. With a Deferred
, there is 0 or more subscribers, and it is "normal" not to wake up some of them. With async
/cont
/etc., there is one "subscriber": the fiber itself; and it is important whether we woke it up.
I've started working on this (#3917), and the desired semantics are not entirely clear.
What should resume
return (true
or false
), if resume
wins the race? get
did not run (probably yet), and resume
sets the result. This seems like it should be true
, but I think nothing stops the fiber from being cancelled right after this. So it's not really true
or false
, but "we don't know (yet)". Waiting in resume
is obviously not an option. (Also note, that get
may not even be sequenced, so it may never run.)
Any ideas?
but I think nothing stops the fiber from being cancelled right after this
Really? I would say that's a bug. If resume
completes with a value (and returns true
), then get
must resume with that value. If you can cancel between these steps, then you can lose values.
But it depends what you mean by "right after this". If you mean steps after the get
, that is fine.
Well, how about something like this:
new Cont[IO, Int, String] {
def apply[F[_]](implicit F: Cancelable[F]) = { (resume, get, lift) =>
lift(IO(resume(Right(42)))) >> F.canceled >> get.map(_.toString)
}
}
With the current implementation this self-cancels (I think this is expected). But I think it could be cancelled at any of the >>
s also. Which also seems "fine" (if a bit strange).
I think the point is, that currently when resume
executes, it may have no way of knowing what will happen with the value it writes. E.g., because get
did not run yet, or in fact it may never will run.
Ok, thanks, I see your point now.
I think the point is, that currently when
resume
executes, it may have no way of knowing what will happen with the value it writes.
I think semantically the true
means that if get
runs, then this will be the value it returns.
I'm not sure that helps with the "lost value" problem. For example, let's see asyncCheckAttempt
(async
is of course similar):
def apply[G[_]](implicit G: MonadCancel[G, Throwable]) = { (resume, get, lift) =>
G.uncancelable { poll =>
lift(k(resume)) flatMap {
case Right(a) => G.pure(a)
case Left(Some(fin)) => G.onCancel(poll(/*HERE*/get), lift(fin))
case Left(None) => get
}
}
}
Let's say, that resume
runs first, writes the result, gets a true
(if get
runs, it will return that result). But then the fiber gets cancelled right before get
is sequenced (marked with HERE
in the code above)... and the result is lost. (But whoever called resume
thinks it isn't lost.)
Arguably, this is a "bug" in asyncCheckAttempt
. (Well, not in the current version, but in the version where cb
returns a Boolean
.) I'm still thinking how to fix this bug, but I'm also not sure the bug is really in asyncCheckAttempt
...
Hmm, thanks for writing that out. Actually if I understand correctly, this seems to be a very general problem (not specific to asyncCheckAttempt
).
get
is sequencedget
must be cancelableCurrently there is no way to have a cancelable get
that is guaranteed to be sequenced 🤔 I'm not even sure if such a thing is possible, without a tryGet
and a combinator like onCancelMasked
from https://github.com/typelevel/cats-effect/issues/3474#issuecomment-1509872654 to "defer" a cancelation signal to avoid leaking values.
dumb-idea-of-the-day
poll(/*HERE*/get)
what if we just disallowed cancelation there (i.e. change the implementation of uncancelable
/poll
)?
this seems to be a very general problem (not specific to asyncCheckAttempt).
Yeah. Although, I suspect that if we can solve it for asyncCheckAttempt
, we can probably solve it for others too. The typical usages of cont
seem to be quite similar, and asyncCheckAttempt
seems to be the most general one.
Also worth noting, that this whole problem only occurs, if resume
wins the race. If get
runs first and suspends the fiber, that's fine. If the fiber is cancelled before resume
runs, I think that's also fine (there is a cancellation check in resume
).
what if we just disallowed cancelation there (i.e. change the implementation of uncancelable/poll)?
Yeah, I thought about that, but that has... consequences. If we look at this:
poll(/*1*/ x /*2*/)
The point 2
is already not a cancellation point. If we also make 1
not a cancellation point, that makes poll
a no-op in certain cases (e.g., if x
is pure
, or a single delay
, or something like that). This might not be a deal breaker, but I'm not sure.
that makes
poll
a no-op in certain cases (e.g., ifx
ispure
, or a singledelay
, or something like that). This might not be a deal breaker, but I'm not sure.
Yeah, honestly that seems fine to me 🤷 but I won't claim to foresee all the consequences :)
Let's say, that resume runs first, writes the result, gets a true (if get runs, it will return that result). But then the fiber gets cancelled right before get is sequenced (marked with HERE in the code above)... and the result is lost. (But whoever called resume thinks it isn't lost.)
Bolded the relevant bit here.
So in this scenario, whoever called resume
definitely knows the value is lost, because their finalizer is getting called. Consider this from the user side. I'll use async
instead since it's a simpler signature:
IO.async(cb => IO(/* things that call cb */ Some(IO(/* finalizer */))))
Okay so if the code at /* finalizer */
runs in the above, we know definitively that any value (or error) which had been passed to cb
is lost. At this point, we're responsible for sorting that out, either by recovering it somehow or getting an upstream to retry, etc. Whatever it takes really.
In some cases, this isn't reliably possible without sacrificing other system properties, so instead what often happens is people treat this as a nondeterminism (even though it isn't one) and dodge the recovery by instantiating async
to Unit
and solely using it as a latching mechanism (see: the new Queue
implementations, for example). This avoids having to create weird recovery protocols.
But overall, this is something that users can cleanly respond to, unlike the cancelable gap at the end of poll
regions, to which they have no response.
whoever called resume definitely knows the value is lost, because their finalizer is getting called
That is a very good point. (It would be "nicer", if the return value could give this information directly, but I'm beginning to think that might not be possible.)
Now the question is: could we build something (useful) with both the Boolean
and the finalizer, which we couldn't already build with just the finalizer? (Because if not, then this issue is not really needed.)
So, about never cancelling (observing cancellation?) right inside poll, i.e.,
poll(/* we never cancel HERE */ x)
It is possible. In my draft PR (https://github.com/typelevel/cats-effect/pull/3917), I did it (not in a very elegant way, as it's just an experiment for now). It seems to work. A few things needed fixing (notably in Resource
), but nothing too terrible.
Edit: of course, this is yet another violation of the functor laws I think. The existing one is for poll(x /* HERE */)
, so it's not surprising.
This
Boolean
should be similar to theBoolean
returned when completing aDeferred
i.e. it should indicate that this invocation of theresume
/callback
is the one and only one that actually completed the callback. Everyone else should getfalse
.This information about whether the callback succeeded would help avoid leaks in various concurrent state machines. Here are two that I encountered recently: