New `getLocalPool` wrecks the capability-local aspect for configured pool count

parsonsmatt commented 1 year ago

Hashing a ThreadId only gets the underlying Int, since hash @Int is the identity function.

If I have 2 pools and 4 capabilities, and I fork two threads on each capability, then there's no guarantee that the threads are reading from the same capability. Suppose I do:

foo i pool = do
  forkOn i $ do
    withResource pool $ \item ->
      pure ()
  forkOn i $ do
    withResource pool $ \item ->
      pure ()

dang pool = do
  caps <- getNumCapabilities
  for_ [1.. caps] $ \i -> do
    foo i pool

Since we're indexing based on the ThreadId, and not the capability, I think this would cause threads on the same capability to be accessing pools on other capabilities.

This isn't limited to forkOn. A thread forked with forkIO may have a different capability every time it calls getLocalPool, and would receive a different LocalPool under this implementation.

Since the prior implementation only cared about the capability, a given capability would always be using the same LocalPool.

The problem with the implementation in my PR #15 is that you could specify a greater number of pools than capabilities. Then, the getLocalPool function would never index into the pools with an index greater than the number of capabilities.

The solution is to use the threadCapability logic if mNumStripes <= Just numCapabilities. Then you can configure a number of pools less than number of capabilities and still get good thread performance and reduced contention.

If you have more stripes than capabilities, though, it would be nice to preserve the "capability -> thread" mapping, while also using all available pools. I'm not sure in which cases you'd want more stripes than capabilities, though. I guess you could allocate a given number of stripes to some capabilities, and then some capabilities would have more resources than others... or a number of "capability-local" stripes, and those are full, then a "non-assigned" stripe can serve the thread.

arybczak commented 1 year ago

The strategy of picking a stripe based on the capability the thread executes on works well only if the number of stripes divides the number of capabilities, for example:

If there are 4 stripes and 4 capabilities, it's perfect.
If there are 2 stripes and 4 capabilities, capability 0 and 2 will use stripe 0 and capability 1 and 3 will use stripe 1, also good.

Otherwise, for example if there are 2 stripes and 3 capabilities, then capability 0 and 2 will use stripe 0 and capability 1 will use stripe 1, so stripe 0 will experience a 100% more load than stripe 1.

The strategy of picking the local pool by the thread id will work better if the application continuously spawns new threads (like a web server) since their thread ids should more or less equally distribute over the stripes.

The only situation in which it's worse is if your application spawns 3 worker threads and they get ids let's say 1, 3, 5. Then all of them will use stripe 1 and stripe 0 doesn't get used.

The situation in which there are more stripes than capabilities IMO doesn't make sense, so I don't have a good answer of what to do then, but selecting by capability is then also a viable option.

In summary, getLocalPool can be adjusted to select based on ThreadId only if stripes < capabilities and rem capabilities stripes /= 0. Sounds good?

parsonsmatt commented 1 year ago

Sounds good! I incorporated that change into my PR.

Hmm.

Is it ever really the case that we want to block because a stripe doesn't have resources, if another stripe in the pool does have them? Clearly, a thread-local stripe is the best choice. But if we can't have a thread local stripe, then probably it's better to find a different stripe that does have available resources.

The only situation in which it's worse is if your application spawns 3 worker threads and they get ids let's say 1, 3, 5. Then all of them will use stripe 1 and stripe 0 doesn't get used.

I can easily imagine a scenario where these threads manage to use all resources on the one stripe and run into resource contention. However, the Pool could have a max resources well above what the three are using.

arybczak commented 1 year ago

Is it ever really the case that we want to block because a stripe doesn't have resources, if another stripe in the pool does have them?

Eh, yeah, that's a good point.

I came back here (sorry that I missed your input from previous month btw) to add support for explicit amount of stripes because I wanted to experiment with 1 stripe with >1 capabilities specifically because it sometimes happens that you run out of resources in one stripe, but other stripes still have them.

To me that's the only reason to reduce the number of stripes, so if pool, instead of blocking on empty stripe would consider other stripes, having configurable number of stripes would be redundant (at least for me).

arybczak commented 1 year ago

if pool, instead of blocking on empty stripe would consider other stripes

After some experiments I think this won't work, because considering other stripes conflicts with other threads and the whole thing slows down as if there was just one stripe :joy_cat:

arybczak commented 1 year ago

Fixed by 894e195da4604783248a404464a1dfcad5e93236.

scrive / pool

New `getLocalPool` wrecks the capability-local aspect for configured pool count #18