typelevel / cats-effect

The pure asynchronous runtime for Scala
https://typelevel.org/cats-effect/
Apache License 2.0
2.01k stars 515 forks source link

Experiment with starvation detecter #2762

Closed djspiewak closed 1 year ago

djspiewak commented 2 years ago

Idea shamelessly stolen from Akka. Starting a background fiber in IOApp which does something like this would be an interesting thing to explore:

val checkInterval = 1.second
val initialDelay = 10.seconds
val threshold = 100.millis

val checker = IO.monotonic flatMap { now =>
  IO.sleep(checkInterval) >> IO.monotonic.map(_ - now) flatMap { delta =>
    if (delta >= threshold)
      Console[IO].errorln("[WARNING] you're probably starving")
    else
      IO.unit
  }
}

checker.foreverM

Coefficients should be tunable.

The idea here is pretty simple: just try to wake up and run an action once per second and make sure that the drift on this execution is within acceptable bounds. If it takes too long for the runtime to pick up the action, it strongly suggests that starvation is in play. Note that this test will actually get less accurate when we redo the timer backend due to the fact that we won't be going through the external queue in the above.

ronanM commented 1 year ago

With this code

for{
  a <- db.read()
  b = computeWithLotOfCpuAndMemory(a)
  _ <- db.write(b)
} yield ()

I always have two questions:

  1. What is the acceptable limit for computeWithLotOfCpuAndMemory() ?
  2. When the limit is exceeded which operator use (delay(), blocking(), new one compute()) ?
djspiewak commented 1 year ago

@ronanM Fundamentally, there is no easy way to answer that question. Unfortunately. If there were, then we would make a combinator which does it for you!

The reason there's no easy approach here mostly stems from the fact that it's highly dependent on your scenario. The shortest possible answer to your question is to first try to break computeWithLotOfCpuAndMemory(a) into multiple steps, each of which wrapped in IO, such that each one is a little more lightweight. The longer answer is to insert IO.cede around your computation, which can by itself significantly ameliorate the problems. For example:

for {
  a <- db.read()
  _ <- IO.cede
  b = computeWithLotOfCpuAndMemory(a)
  _ <- IO.cede
  _ <- db.write(b)
} yield ()

You can find more discussion of this technique (and some rules of thumb) on the scaladoc for cede.

That alone might not be sufficient though, because you can still create a situation where you have some very CPU-heavy action which begins to starve the compute pool of resources, preventing it from responding to new incoming requests. The reason this doesn't have an easy answer is because, ultimately, the problem is one of resource oversubscription. CPU is a physical resource: you can't just conjure more of it by calling new on something. The only option is to more carefully govern your use of the limited resources that you do have.

The classical way of doing this is with a Semphore. If you wrap a Semaphore acquisition around the computeWithLotOfCpuAndMemory(a), you can ensure that at most up to n of them may run simultaneously, ensuring that the remainder of the compute pool is unaffected. A good rule of thumb here is to try to size the semaphore to at most half of your worker threads, which should in turn be sized to exactly the number of physical threads (usually CPUs). A more sophisticated way of doing this same thing is a work Queue with a background fiber which simply grabs work items and does them sequentially, similar to how an actor would behave. If you're going to those extents though, you might be better off just using Fs2. :-)

Hopefully that helps! Ultimately, large compute-bound work units are difficult to deal with for physical reasons that can't really be resolved within userspace, so you always need to take stock of your situation and apply the appropriate techniques and tuning. There's no way for the underlying runtime (Cats Effect) to resolve the issue automatically.

He-Pin commented 1 year ago

Just use fs2 and mapAsync I think.

ronanM commented 1 year ago

The shortest possible answer to your question is to first try to break computeWithLotOfCpuAndMemory(a) into multiple steps, each of which wrapped in IO, such that each one is a little more lightweight.

:+1: So, as a rule of thumb, can I say that :point_down: ?