Open armanbilge opened 1 year ago
Just to preserve some context…
There are two things here that are both independently challenging. The first thing that is challenging is the done.get()
call, which is hard to fake around. We use this to detect when the pool is shutting down, so we should murderize the worker threads. In this case, the best we can do might end up being something a bit like what we do for canceled
in the fiber.
The second thing here is the sleepers
polling. This has to be a memory barrier in the present encoding because we allow theft of timers by other workers, meaning that we need to be atomic with respect to that theft. This is a little different than how we handle it in the work queue, because the work queue is stolen from one end while we dequeue from the other, which in turn means that we can play games with the memory barriers. When we poll and steal from the same side, there's just not a lot that can be done: we have to cross a barrier to ensure atomicity.
If I may suggest the dumb thing: why don't we just check both at a lesser granularity? e.g. every 64 iterations, instead of every iteration.
If I may suggest the dumb thing: why don't we just check both at a lesser granularity? e.g. every 64 iterations, instead of every iteration.
This is definitely where we should start. It has some slightly subtle implications for done.get()
but nothing that should be problematic, I think.
When we poll and steal from the same side, there's just not a lot that can be done: we have to cross a barrier to ensure atomicity.
Here's an idea: we don't have to cross a barrier to be sure if no timers have expired. So long as the timers queue is only written by its current thread, then the time of the next expiration can be treated as a thread local. If there was possibly an expiration, then we can cross the read barrier to find out. On second thought, in practice this might not be useful, if this thread local doesn't get updated for canceled timers (and also stolen timers, but I assume that would be more rare), so it would end up crossing the read barrier all the time anyway ... 🤔
Btw, this also raises the question: do we really want to do System.nanoTime()
on every iteration? I thought this was a non-trivial op as well.
Here's an idea: we don't have to cross a barrier to be sure if no timers have expired. So long as the timers queue is only written by its current thread, then the time of the next expiration can be treated as a thread local.
This is super smart. Pair this with another assumption (which holds!): only the local thread can decrease the next expiration. When timers are stolen, it just means that the expiration increases, meaning that we would hit the threshold, spuriously check the queue (crossing a memory barrier), discover that we were wrong to wake up and then update the local threshold again.
In https://github.com/typelevel/cats-effect/issues/3445#issuecomment-1586367793 I describe how we can avoid a read-barrier related to cedeBypass
.
Spinning out of https://github.com/typelevel/cats-effect/pull/3499#discussion_r1170622173. These are the ones I'm aware of:
https://github.com/typelevel/cats-effect/blob/ab5d56c6a6cb2ef7484e5ecb881ecf6b85ff61dc/core/jvm/src/main/scala/cats/effect/unsafe/WorkerThread.scala#L419
https://github.com/typelevel/cats-effect/blob/ab5d56c6a6cb2ef7484e5ecb881ecf6b85ff61dc/core/jvm/src/main/scala/cats/effect/unsafe/WorkerThread.scala#L689-L694