Closed parsonsmatt closed 1 year ago
The strategy of picking a stripe based on the capability the thread executes on works well only if the number of stripes divides the number of capabilities, for example:
Otherwise, for example if there are 2 stripes and 3 capabilities, then capability 0 and 2 will use stripe 0 and capability 1 will use stripe 1, so stripe 0 will experience a 100% more load than stripe 1.
The strategy of picking the local pool by the thread id will work better if the application continuously spawns new threads (like a web server) since their thread ids should more or less equally distribute over the stripes.
The only situation in which it's worse is if your application spawns 3 worker threads and they get ids let's say 1, 3, 5. Then all of them will use stripe 1 and stripe 0 doesn't get used.
The situation in which there are more stripes than capabilities IMO doesn't make sense, so I don't have a good answer of what to do then, but selecting by capability is then also a viable option.
In summary, getLocalPool
can be adjusted to select based on ThreadId only if stripes < capabilities
and rem capabilities stripes /= 0
. Sounds good?
Sounds good! I incorporated that change into my PR.
Hmm.
Is it ever really the case that we want to block because a stripe doesn't have resources, if another stripe in the pool does have them? Clearly, a thread-local stripe is the best choice. But if we can't have a thread local stripe, then probably it's better to find a different stripe that does have available resources.
The only situation in which it's worse is if your application spawns 3 worker threads and they get ids let's say 1, 3, 5. Then all of them will use stripe 1 and stripe 0 doesn't get used.
I can easily imagine a scenario where these threads manage to use all resources on the one stripe and run into resource contention. However, the Pool
could have a max resources well above what the three are using.
Is it ever really the case that we want to block because a stripe doesn't have resources, if another stripe in the pool does have them?
Eh, yeah, that's a good point.
I came back here (sorry that I missed your input from previous month btw) to add support for explicit amount of stripes because I wanted to experiment with 1 stripe with >1 capabilities specifically because it sometimes happens that you run out of resources in one stripe, but other stripes still have them.
To me that's the only reason to reduce the number of stripes, so if pool
, instead of blocking on empty stripe would consider other stripes, having configurable number of stripes would be redundant (at least for me).
if
pool
, instead of blocking on empty stripe would consider other stripes
After some experiments I think this won't work, because considering other stripes conflicts with other threads and the whole thing slows down as if there was just one stripe :joy_cat:
Fixed by 894e195da4604783248a404464a1dfcad5e93236.
Code in PR, #17
Hashing a
ThreadId
only gets the underlyingInt
, sincehash @Int
is the identity function.If I have 2 pools and 4 capabilities, and I fork two threads on each capability, then there's no guarantee that the threads are reading from the same capability. Suppose I do:
Since we're indexing based on the
ThreadId
, and not the capability, I think this would cause threads on the same capability to be accessing pools on other capabilities.This isn't limited to
forkOn
. A thread forked withforkIO
may have a different capability every time it callsgetLocalPool
, and would receive a differentLocalPool
under this implementation.Since the prior implementation only cared about the capability, a given capability would always be using the same
LocalPool
.The problem with the implementation in my PR #15 is that you could specify a greater number of pools than capabilities. Then, the
getLocalPool
function would never index into the pools with an index greater than the number of capabilities.The solution is to use the
threadCapability
logic ifmNumStripes <= Just numCapabilities
. Then you can configure a number of pools less than number of capabilities and still get good thread performance and reduced contention.If you have more stripes than capabilities, though, it would be nice to preserve the "capability -> thread" mapping, while also using all available pools. I'm not sure in which cases you'd want more stripes than capabilities, though. I guess you could allocate a given number of stripes to some capabilities, and then some capabilities would have more resources than others... or a number of "capability-local" stripes, and those are full, then a "non-assigned" stripe can serve the thread.