No more condvar on Linux #58
And for posterity, I confirm that there are no lost signals in Windows condition variables.
Seems like I'm not alone: https://software.intel.com/en-us/forums/intel-fortran-compiler/topic/720611
Seems like there is now an issue upstream affecting C#, Python, OCaml:
And someone reimplemented the glibc condition variables in TLA+ to prove that they were buggy:
Someone pointed me to this issue, and it was interesting to me because I was one of the original participants in the discussion around the glibc bug (https://sourceware.org/bugzilla/show_bug.cgi?id=13165); it sounds like there is now a new, similar glibc bug. However, I'm pretty sure that was not the issue, and that the problem here was not a glibc bug. Observing any bug of that form (stolen signals) requires the signal to take place with the lock held; otherwise, there is no ordering relationship between the signal and newly arriving waiters that might "steal" the signal, and thus no way to designate it stolen. In the code shown here, the signal takes place without the lock held.
I'm not sure if the new futex code you have is correct or not; with backoff and retry, it probably works automatically though. But the condvar code is almost certainly wrong. If you need to know that a signal will wake a waiter that arrived before the event you're signaling, the change to state and the signal operation must take place together under a single instance of the lock being held.
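To make that last point concrete, here is a minimal sketch in C (event_happened and signal_event are illustrative names for this example, not taken from the code under discussion). Because the state change and the signal happen under a single hold of the lock, any waiter that checked event_happened under the same lock has either seen it true or is already blocked in pthread_cond_wait when the signal fires:

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static bool event_happened = false;

void signal_event(void)
{
    pthread_mutex_lock(&lock);
    event_happened = true;       /* the change to state ...      */
    pthread_cond_signal(&cond);  /* ... and the signal operation */
    pthread_mutex_unlock(&lock); /* together under one lock hold */
}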
As an extra data point, the new glibc bug https://sourceware.org/bugzilla/show_bug.cgi?id=25847 is not reproducible on musl (where the current condvar implementation, dating back to 2014, was specifically designed around getting the stolen wakes issue right, as well as self-synchronized destruction right), but as I understand it, you hit the problem described above with both glibc and musl.
I'm that someone. I was at work at the time and didn't really have the time to study the code or the TLA+ model. I just sort of assumed you knew what you were doing and dropped a link to this bug on IRC for the musl folks to check out.
I've now taken a look at the PlusCal code, and it is clear that it's not simulating a condition variable correctly. Signal should be a no-op if nobody is waiting, but in your spec it simply sets a boolean variable, which will eventually awaken a waiter whether or not they were waiting at the time of the signal.
Try something like this and TLC will find a deadlock:
diff --git a/formal_verification/event_notifiers.tla b/formal_verification/event_notifiers.tla
index aaae45f..0ff4bd8 100644
--- a/formal_verification/event_notifiers.tla
+++ b/formal_verification/event_notifiers.tla
@@ -36,6 +36,7 @@ variables
     ticket = FALSE;            \* a ticket to sleep in a phase
     signaled = FALSE;          \* a flag for incoming condvar signal
+    condWait = 0;
     condVar = FALSE;           \* Simulate a condition variable
     msgToParent = "None";      \* Simulate a worksharing request to the parent
     signaledTerminate = FALSE; \* Simulate a termination task
@@ -88,8 +89,10 @@ procedure park()
 begin
     NotSignaled2: if ~signaled then
     StillValid:     if ticket = phase then
+    StartWait:        condWait := condWait + 1;
     Wait:             await condVar;    \* next line is atomic
                       condVar := FALSE; \* we don't model the spurious wakeups here
+                      condWait := condWait - 1;
                     end if;             \* but they are not a problem anyway
                   end if;
     Reset: signaled := FALSE;
@@ -104,7 +107,9 @@ procedure notify()
     end if;
     Notify: signaled := TRUE;
     InvalidTicket: phase := ~phase;
-    Awaken: condVar := TRUE;
+    Awaken: if condWait > 0 then
+              condVar := TRUE;
+            end if;
     RET_Notify: return;
 end procedure;
As @richfelker said, you really should use that lock. It's not there just for show, and you can't dance around it using atomics.
Thank you for the TLA+ condvar.
I'm not sure what you mean by "dance around it using atomics"; this also deadlocks despite using the lock
I mean that the waiter must both check the state variable and call wait while holding the lock. If you don't, there's always a race between checking the state variable and entering wait (no matter how much you use atomics and fences). Of course, the signaller must modify that state variable only while holding the lock.
If you think about the condvar API, it wouldn't make any sense for it to require you to just wrap the wait/signal calls with a lock. If that were all it needed from you, it could very well grab the lock inside the library for you and you wouldn't ever need to worry about it. But simply wrapping the call is not enough; you must also protect the controlling state variable with the lock. That is why it's your responsibility to use the lock.
I don't know if that explanation makes sense but consider this pseudocode:
park() {
    lock.acquire();
    if (shouldWakeUp) {
        shouldWakeUp = FALSE;
        lock.release();
        return;
    }
    // if we didn't have the lock here, shouldWakeUp could change to TRUE right now
    // and the notifying thread could signal before we enter wait(). That signal is
    // effectively lost.
    cond.wait(lock);
    lock.release();
}

notify() {
    lock.acquire();
    shouldWakeUp = TRUE;
    cond.signal(); // it should be safe to signal after the release too
    lock.release();
}
It is critical that the condition for waking up is checked while the lock is held, before entering wait. That's what the lock is there for.
The whole point of my "2-phase commit to sleep" protocol is to avoid holding the lock for a long time. There is work to do in between the shouldWakeUp check and cond.wait(lock) that requires sending messages to other threads. Your proposal might deadlock my runtime if threads are locking each of their event notifiers.
You still must check the predicate again with the lock held before waiting on the condvar, and there must be an invariant that other threads cannot modify the data the predicate depends on without holding the lock.
You don't have to "hold the lock for a long time". You should generally do nothing but take the lock, make sure nothing has changed that might require re-inspection, and then perform the condvar wait. Long-running operations can be done outside that.
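A sketch of that shape, reusing the shouldWakeUp flag from the pseudocode above (announce_sleep_intent is a hypothetical placeholder for the long-running messaging work, not a function from the runtime):

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static bool shouldWakeUp = false;

/* Hypothetical stand-in for the work done between the first check and
   the wait, e.g. sending worksharing messages to other threads.
   It runs without the lock held. */
static void announce_sleep_intent(void) { /* ... */ }

void park(void)
{
    announce_sleep_intent();             /* long-running work, no lock held */

    pthread_mutex_lock(&lock);
    while (!shouldWakeUp)                /* re-check under the lock; the    */
        pthread_cond_wait(&cond, &lock); /* loop also absorbs spurious      */
    shouldWakeUp = false;                /* wakeups                         */
    pthread_mutex_unlock(&lock);
}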
Ah I see what you mean.
In the current code this is ensured by the phase and ticket system: https://github.com/mratsim/weave/blob/a230cce98a8524b2680011e496ec17de3c1039f2/weave/cross_thread_com/event_notifiers.nim#L67-L94
This works fine with the Futex code. I wonder why I didn't get any deadlocks with the Windows or Mac condition variables. In particular, the Windows runs used the same machine, so it's possible that its signaling works under different assumptions than POSIX.
So I'm in conflict:
As an arbiter, the OSX C standard library.
Here are some example logs of the unfortunate fates of my runtime:
Plenty of people on Stack Overflow suggested that you should use locks, that you should unlock before signaling, or that you should unlock after signaling. However, the evidence seems damning: Glibc sits on the bench of the accused. OSX does not exhibit the same behaviour; Musl does, though. Formal verification explored almost 10 million possible state interleavings and did not find a deadlock or livelock in my event notifier:
So what does Glibc have for a defence?
While waiting for a fix that may never come, especially in distros that upgrade very slowly, let's review our options:
pthread_mutex_lock(&m);
pthread_mutex_unlock(&m);
pthread_cond_signal(&c);
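The idea behind this option is that the empty lock/unlock pair orders the signal after any waiter that observed the old state and has since blocked in pthread_cond_wait, without extending the critical section. A minimal sketch of a notifier built on it (should_wake is an illustrative name, and the atomic store stands in for whatever state change the runtime actually performs):

#include <pthread.h>
#include <stdatomic.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
static atomic_bool should_wake = false;

void notify(void)
{
    atomic_store(&should_wake, true); /* publish the state change      */
    pthread_mutex_lock(&m);           /* empty critical section: order */
    pthread_mutex_unlock(&m);         /* the signal after any waiter   */
    pthread_cond_signal(&c);          /* still registering to sleep    */
}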