mratsim / weave

A state-of-the-art multithreading runtime: message-passing based, fast, scalable, ultra-low overhead
Other
541 stars 21 forks source link

partial syncScope livelock #121

Closed mratsim closed 4 years ago

mratsim commented 4 years ago

Try to address #119

From the previous state machine syncScope

And some traces of a stuck process: image image image

What seem to happen is:

Analysis (barring other bugs)

Conclusion and fix

2 solutions are possible:

  1. Drain the whole task queue before switching to SB_Steal
  2. Or in SB_Steal, don't only answer steal requests but also work sharing requests from idle workers
mratsim commented 4 years ago

We use solution 2.

No impact on overhead as measured by fibonacci with lazy flowvars (to not measure memory overhead) under 200ms image

And with normal Flowvar under 400ms image

load distribution seems to be the same.

What may have changed is that on sync and syncScope in the steal phase the worker sends its non-direct child tasks first to its children which may be sleeping instead of its thief. If the task was short we could have saved energy by only sending to the thief. Inversely, the load distribution might be better since we give the runtime the opportunity to wake up sleeping threads as otherwise sleeping threads are only woken up on a successful theft even though the current workers might have extra tasks. I.e. the change is more greedy and so more asymptotically optimal.

mratsim commented 4 years ago

Unfortunately this is not fully fixed: https://travis-ci.com/github/mratsim/weave/jobs/323526172#L1854

mratsim commented 4 years ago

After trying to mix both solutions we still have the bug (now rarer) 2020-04-26_21-39

2020-04-26_21-38 2020-04-26_21-38_1

mratsim commented 4 years ago

There is a more problematic root cause, merging for now as it still helps a lot.

mratsim commented 4 years ago

Perf note when trying to use both

The FSM seems to have slightly higher overhead. 20ms slower on Fib(40) on both eager (387ms runtime) and lazy flowvars (204 ms runtime).

This is acceptable for now but hopefully we find the real bug.