typelevel / cats-effect

The pure asynchronous runtime for Scala
https://typelevel.org/cats-effect/
Apache License 2.0
2.04k stars 522 forks source link

TestControl + Dispatcher == deadlock #4104

Open Baccata opened 4 months ago

Baccata commented 4 months ago

Using CE 3.5.4, invoking a Dispatcher#unsafeRunSync in a program run with TestControl appears to deadlock

//> using dep "org.typelevel::cats-effect-testkit:3.5.4"

import cats.effect.IOApp
import cats.effect.IO
import cats.effect.std.Dispatcher
import scala.concurrent.duration._
import cats.effect.testkit.TestControl

object Main extends IOApp.Simple {

  // exact same thing happens using parallel 
  val program = Dispatcher.sequential[IO].use { dispatcher =>
    IO(dispatcher.unsafeRunSync(IO(1)))
  }

  def run: IO[Unit] =
    TestControl
      .executeEmbed(program)
      .flatMap(IO.println)
      .timeout(2.seconds)  // timeout doesn't kick-in 

}
djspiewak commented 4 months ago

I'm pretty sure this is expected behavior. TestControl is a single-threaded executor, so unsafeRunSync in all its forms will deadlock.

Baccata commented 4 months ago

Damn, forgot about that.

By any chance, is there anything that could help mitigate the issue, to help other poor souls understand that it's a no-no without spending a few hours debugging ? Could the Dispatcher somehow detect that it's being used in an improper runtime and throw a descriptive error or something ?

djspiewak commented 4 months ago

First thing that comes to mind is leveraging BlockContext, since unsafeRunSync uses that. It would basically mean that any time we detect a scala.concurrent.blocking, we would immediately error out and fail the test (something like an UnsupportedOperationException feels right).

Baccata commented 4 months ago

Can you elaborate ?

As of now, printing BlockingContext.current in a program that runs on TestControl prints the id of the WorkerThread from the outer runtime. So assuming you're suggesting to momentarily set the BlockingContext to some ad-hoc instance in TestControl, I'm somewhat concerned about the lifecycle, as scala.concurrent.blocking invokes a ThreadLocal.

Then there's the question of whether you're suggesting that an ad-hoc BlockingContext would always throw (which would probably break legitimate uses in TestControl), or whether you're suggesting that the dispatcher implementation would have to inspect BlockingContext.current and decide to throw if it detects the ad-hoc one (which feels somewhat off, separation-of-concern wise).

armanbilge commented 4 months ago

I'm somewhat concerned about the lifecycle

TestControl can use BlockContext's existing lifecycle-managing API:

BlockContext.withBlockContext(myContext) {
  // then this block runs with myContext as the handler
  // for scala.concurrent.blocking
}

Then there's the question of whether you're suggesting that an ad-hoc BlockingContext would always throw (which would probably break legitimate uses in TestControl)

I am concerned about this too, but I am not entirely convinced it would break legitimate use-cases. TestControl is really only useful for testing Temporal-based code. Once you get into the territory that requires blocking we are talking about I/O and multi-threading which TestControl cannot and should not be used to test. We already issue a similar warning: https://github.com/typelevel/cats-effect/blob/d5a3d4ce31d6e76ec8c5461ab4179a5ed9cdd6a4/testkit/shared/src/main/scala/cats/effect/testkit/TestControl.scala#L432-L437

armanbilge commented 4 months ago

we would immediately error out and fail the test (something like an UnsupportedOperationException feels right).

Another option is we could try and side-channel this warning, but proceed.

Baccata commented 4 months ago

To elaborate on the context : I came across a deadlock whilst testing code that was integrating with a Caffeine-cache. I wanted to test the TTLs, and caffeine has a mechanism to override the clock it uses to poll time, which is basically a () => Long. I figured I'd wire a dispatcher-based implementation to ensure the time that was received by the cache was consistent with the TestControl's.

So yeah, I don't disagree with what you say, but it does feel like my usecase was somewhat legitimate in the context of TestControl.

It's probably a XY-problem though. Maybe there could be a flavour of execute that could take a (() => FiniteDuration) => IO[A], allowing for the injection of a time-polling thunk into the impure code that needs it, but it feels a bit too bespoke ...

armanbilge commented 4 months ago

I figured I'd wire a dispatcher-based implementation to ensure the time that was received by the cache was consistent with the TestControl's.

Oh yeah ... if I understand correctly, it would be far better to just use Scheduler directly. I regret that we don't currently expose it in the calculus like we do for Async#executionContext but you could grab that and pattern-match or something ... that would work for WSTP which now extends Scheduler. We'd have to change the TestControl EC to do that as well ...

djspiewak commented 4 months ago

IO.realTime.syncStep.unsafeRunSync().toOption.get is a nice and concise™ solution that doesn't require ugly pattern matching stuff.

armanbilge commented 4 months ago

IO.realTime.syncStep.unsafeRunSync().toOption.get

Actually this doesn't work 😝 if you convert to SyncIO then you are not running on the IORuntime, which means you are not using the scheduler in that runtime.

armanbilge commented 4 months ago

In fact, since this can be surprising, I wonder if we should not support translation of realTime and monotonic in the syncStep interpreter 🤔

https://github.com/typelevel/cats-effect/blob/d5a3d4ce31d6e76ec8c5461ab4179a5ed9cdd6a4/core/shared/src/main/scala/cats/effect/IO.scala#L2286-L2287

djspiewak commented 4 months ago

Ooooooh I forgot that those were cases in the ADT inside of SyncIO. Wow that's actually super annoying.

armanbilge commented 3 months ago

First thing that comes to mind is leveraging BlockContext, since unsafeRunSync uses that. It would basically mean that any time we detect a scala.concurrent.blocking, we would immediately error out and fail the test (something like an UnsupportedOperationException feels right).

I took a closer look at this and I'm afraid this idea would be too much of a footgun.

Currently TestControl implements IO.blocking via scala.concurrent.blocking:

https://github.com/typelevel/cats-effect/blob/caee0e86f815b641af61f19becd4eac27a769f11/testkit/shared/src/main/scala/cats/effect/testkit/TestControl.scala#L370-L373

https://github.com/typelevel/cats-effect/blob/caee0e86f815b641af61f19becd4eac27a769f11/kernel-testkit/shared/src/main/scala/cats/effect/kernel/testkit/TestContext.scala#L234-L244

So this would mean that if anyone attempts to use IO.println(...) in executeEmbed (e.g. for debugging) then it would crash. A workaround would be to do IO(println(...)) ...