What prevents reordering between wait and the decrementing of num_waiters?

The content that the question is about

https://marabos.nl/atomics/building-locks.html#avoiding-syscalls (p. 199 of the second printed release)

The question

Hi Mara,

I'm struggling to understand one use of the Relaxed memory ordering in chapter 9. Hopefully you can point me in the right direction?

In Condition Variable: Avoiding Syscalls, what prevents reordering between the call to wait and the decrementing of num_waiters, from the point of view of the notifying thread?

    pub fn wait<'a, T>(&self, guard: MutexGuard<'a, T>) -> MutexGuard<'a, T> {
        self.num_waiters.fetch_add(1, Relaxed); // New!

        let counter_value = self.counter.load(Relaxed);

        let mutex = guard.mutex;
        drop(guard);

        wait(&self.counter, counter_value);

        self.num_waiters.fetch_sub(1, Relaxed); // New!

        mutex.lock()
    }

The text includes the following paragraph, which makes sense (to me) for the waiting thread:

We also don’t need to worry about the notifying thread observing the decremented value "too soon," because once the decrementing operation is executed, perhaps after a spurious wake-up, the waiting thread no longer needs to be woken up anyway.

But, unlike the happens-before relationship between incrementing num_waiters and unlocking the mutex, it isn't clear to me what ensures that the notifying thread must observe num_waiters being decremented only after (or, more importantly, if) the waiting thread has/will wake up anyway.

For example, couldn't the compiler reason that it's fine to decrement num_waiters before calling wait? (Apart from it probably not being able to see much into wait and, thus, being constrained to a conservative approach).

Thanks!

P.S. I really enjoyed the book! it's very precise but also super approachable. Differently from most resources on the subject, you encourage understanding and leveraging of weaker memory orderings, and I found that really refreshing and stimulating.

For example, couldn't the compiler reason that it's fine to decrement num_waiters before calling wait?

Consider this example, without any atomics or concurrency:

fn main() {
    println!("Hit enter to delete all your files...");
    std::io::stdin().lines().next(); // Wait for a line of input, assuming stdin is a terminal.
    delete_all_your_files();
}

Imagine if delete_all_your_files() could have effects (outside this thread/program) before waiting for input. That would be unacceptable, right?

The same is true for the wait() and num_waiters situation: the decrement of num_waiters is observable by other threads and shows that wait() has returned. So, if it isn't even known yet whether wait() will return or not, num_waiters cannot be (observed as) decremented yet.

(It is possible that the decrement gets reordered with the last few instructions (e.g. the return instruction) of wait(), while wait() is already returning. But that's fine, because then wait() isn't sleeping anymore and already in the process of returning, so everything will continue just fine without needing to observe another wake() call.)

P.S. I really enjoyed the book! it's very precise but also super approachable. Differently from most resources on the subject, you encourage understanding and leveraging of weaker memory orderings, and I found that really refreshing and stimulating.

Thank you! I'm very glad to hear that. It was a bit of a experimental/risky approach to put relaxed ordering in front, rather than centering sequentially consistent ordering like most other resources, so I'm glad to hear that worked out well. ^^

So, if it isn't even known yet whether wait() will return or not, num_waiters cannot be (observed as) decremented yet.

To go a bit deeper into that:

What if the compiler somehow knows exactly what the wait() function does? What if it knows that the implementation that we happen to use is guaranteed to always return, let's say because it is guaranteed to return spuriously every hour. Would the compiler then be allowed to decrement num_waiters way too early, such that our condition variable waits for up to an hour longer than necessary?

The answer is, strictly speaking, yes. But that's simply because the theoretical model does not say anything about timing. Whether something takes a microsecond or an hour or a year doesn't matter, the model only cares about the order in which things happen (or are observable).

This also affects other, very common, situations:

let guard = mutex.lock().unwrap();
do_something();
drop(guard); // unlock
std::thread::sleep(Duration::from_secs(3600)); // sleep an hour

The theoretical model only guarantees that do_something() happens while the mutex is locked, but it technically does not guarantee that the sleep() happens while it's unlocked. It's strictly speaking permissible for the compiler to keep the mutex locked for an hour longer than necessary. That would be bad.

Luckily, no compilers do that. And we can all agree that if a compiler does do such a thing, it's a compiler bug that needs to be fixed. (This is one of the reasons compilers are more careful with optimizations around atomics than necessary by the theoretical model.)

So, the conclusion is that we can only use the theoretical model for proving soundness. Having a Condvar::wait function that sleeps for way too long is bad, but sound. (The same holds for all the examples of chapter 2: for example, the progress counter being observed as 100% before any work has actually been done is actually not unsound, but definitely not what we want.) For anything about timing, we need to use the more practical understanding that memory ordering is only relevant at the micro level, not at the macro level.

From https://marabos.nl/atomics/memory-ordering.html#common-misconceptions:

The truth is that the memory model doesn’t say anything about timing at all. It only defines in which order certain things happen; not how long you might have to wait for them. A hypothetical computer in which it takes years to get data from one thread to another is quite unusable, but can perfectly satisfy the memory model.

In real life, memory ordering is about things like reordering instructions, which usually happen at nanosecond scale.

Thanks, Mara, that perfectly cleared up both aspects I was confused with.

Luckily, no compilers do that. And we can all agree that if a compiler does do such a thing, it's a compiler bug that needs to be fixed.

That's reassuring. Honestly, JF Bastien's N4455 ("No Sane Compiler Would Optimize Atomics") paper, and corresponding CppCon 2016 talk (which mentions the potential progress counter issue), had me worried to the point of loosing sight of the practical side.

Sorry it took me a few weeks to reply, I was on the road. Thanks again!

m-ou-se / rust-atomics-and-locks

What prevents reordering between wait and the decrementing of num_waiters? #38

The content that the question is about

The question