isocpp / CppCoreGuidelines

The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++
http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines
Other
42.56k stars 5.43k forks source link

CP guidelines for condition variables #554

Open jwakely opened 8 years ago

jwakely commented 8 years ago

Sorry this isn't complete, I'll try to flesh it out.

Condition variables are not semaphores. Notifications will be missed if they are sent when no other thread is blocked waiting on the condition variable, so you must not just rely on notify_one() or notify_all() to signal another thread, you must always have some predicate that is tested, e.g. a boolean flag (protected by the same mutex that is used when waiting on the condition variable). With a predicate that tests some condition, even if the notification is sent before the other thread waits on the condition variable then it won't be "missed" because the predicate will be true and the wait will return immediately.

Bad:

std::condition_variable cv;
std::mutex mx;

void thread1() {
  // do some work
  // ...
  // wake other thread
  cv.notify_one();
}

void thread2() {
  std::unique_lock<std::mutex> lock(mx);
  cv.wait(lock);  // might block forever
  // do work ...
}

Good:

std::condition_variable cv;
std::mutex mx;
bool ready = false;

void thread1() {
  // do some work
  // ...
  // make condition true:
  std::lock_guard<std::mutex> lock(mx);
  ready = true;
  // wake other thread
  cv.notify_one();
}

void thread2() {
  std::unique_lock<std::mutex> lock(mx);
  cv.wait(lock, []{ return ready; });
  // do work ...
}
tvaneerd commented 8 years ago

I really like this suggestion (and the mutex locking/unlocking one as well)

For example code, I try to avoid // do some work and instead use do_some_work(), because example code already uses comments for 2 things - normal code comments like // wake other thread and meta-example comments // might block forever. So adding a third type of "pretend code was here" comment just adds to the problem.

std::condition_variable cv;
std::mutex mx;

void thread1() {
     do_some_work();
     // let others know work is done
     cv.notify_one();
}

void thread2() {
     std::unique_lock<std::mutex> lock(mx);
     // wait for some_work to be done:
     cv.wait(lock);  // ** might block forever!? **
     // ...
}

Also, you might want to explicitly explain the example here with "If thread1() finishes quickly and calls notify before thread2() gets to cv.wait(), then thread2() will wait() forever (as the notification was "missed")." I know that's already what you said, but referring directly to the example helps. Might want to also add "If you think you "know" that thread1() will "never" finish first (eg because it takes so long), don't rely on that. It is a bug waiting to happen." (that wasn't well said, but something along those lines)

Also consider mentioning "spurious wakeups" which are confusing to novices, but another reason why you need the separate condition, even if you "know" thread1() will take longer than thread2.

tamaskenez commented 8 years ago

Subtle detail but release lock before notify. See std::condition_variable::notify_one / Notes, 2nd paragraph.

void thread1() {
    do_some_work();
    // let others know work is done
    {
        std::lock_guard<std::mutex> lock(mx);
        ready = true;
    }
    // wake other thread
    cv.notify_one();
}
jwakely commented 8 years ago

@tamaskenez, unlocking the mutex before notifying is an optimisation, and not essential. I intentionally didn't do that, to keep the example simple. There could be a second guideline about that point, but it's not related to the "always use a predicate" rule. I would object strongly to complicating the example by doing that.

jwakely commented 8 years ago

@tvaneerd great point about using a do_some_work() function instead of comments. There could certainly be more discussion of how the deadlock can be reached, and how spurious wake-ups can happen (and aren't a problem if you use the predicate form of wait/wait_for/wait_until).

I'll incorporate those suggestions to flesh it out further.

cubbimew commented 8 years ago

Could this incorporate or list separately my old bit from the TODO chapter of the guidelines, "atomic bool whose value is set outside of the mutex is wrong". It seems to be a recurring misconception, a recent repeat on stackoverflow: http://stackoverflow.com/a/35775651/273767 an older one http://stackoverflow.com/q/35411571/273767 and I certainly recall more

jwakely commented 8 years ago

@cubbimew yes, I think it makes sense to combine "always use a predicate to test some condition" and "always use the same mutex to wait and to make the condition true" into one guideline.

AndrewPardoe commented 8 years ago

We generally like this idea. There are at least two separate rules in here, one about avoiding spurious wakeups and one about not missing wakeups. We'd at least want this split into two separate rules.

Additionally we might want to discuss the pattern that a condition variable needs to be associated with a mutex.

Assigning this to Herb to integrate a first draft of this guidance. Thank you, Jonathan!

cubbimew commented 8 years ago

yet another question on SO asking what's wrong with using unprotected atomic bool as a condition http://stackoverflow.com/questions/38147825

cubbimew commented 7 years ago

The main part of this issue appears to be have been written as CP.42: Don't wait without a condition back in april https://github.com/isocpp/CppCoreGuidelines/commit/e8dea3807a6f01254612a39be94df659137d558e although the part about always using a mutex (and the same mutex) to set that condition is still important to note.

cubbimew commented 7 years ago

...and another StackOverflow sighting of the mutex use Why do I need to acquire a lock to modify a shared “atomic” variable before notifying condition_variable

MikeGitb commented 7 years ago

Personally I don't think CP.42 catches the essence of the rule presented by @jwakely. In particular as the bad example doesn't quite demonstrate what it is supposed to demonstrate (thanks to the loop thread 2 will most likely not block forever). Could we just add @jwakely's initial draft and then improve/exend it in future PRs?

As there are a few experts in this thread: Is there any special reason why the c++ standard library didn't get a semaphore class? Would this be something that should be added to the gsl?

cubbimew commented 7 years ago

Is there any special reason why the c++ standard library didn't get a semaphore class?

Not one of the experts you're looking for, but boost.thread FAQ from 2003 had an explanation for why they dropped the semaphore class "Semaphore was removed as too error prone"

MikeGitb commented 7 years ago

Interesting. I guess I always thought of semaphores as a specialized but simple to use signalling mechanism and not as a dangerous generalization of a mutex.

Thank you. Remains the actual topic of this issue.

cubbimew commented 7 years ago

Another Stackoverflow sighting of updating the condition without holding the mutex: Possible race condition in std::condition_variable?

jwakely commented 7 years ago

The standard committee's Concurrency & Parallelism study group are considering a proposal for binary_semaphore and counting_semaphore but they will be low-level facilities for expert use. Adding semaphores to the GSL would be a mistake IMHO.

I'll work on improving my initial draft, extending it to explain why a mutex must be used by the producer thread even if the shared data is updated atomically.

jwakely commented 7 years ago

In #925 Howard suggested:

When notifying a condition_variable, default to doing it before you unlock the mutex instead of after. Rationale: It is never wrong to unlock after the notify, and it is sometimes wrong to unlock before the notify. And there is no performance penalty in unlocking after, as good implementations take this coding pattern into account.

jeremyong commented 6 years ago

It may not be wise to be too strict about the wording to require a mutex. I think it is sufficient to say that if your code is intolerant to missed notification, you should use it. Suppose for example, you have a message pump that is designed to wakeup another thread to perform work at set intervals. Using an atomic variable, you may miss (practically speaking) at most one notification when not relying on a mutex. In some cases, the system may be designed to tolerate this for optimal performance. Alternatively, in some cases, the code waiting on the condition_variable may be tolerant to a spurious wakeup (e.g. maybe this is a worker thread that operates in a loop and waits again if there is nothing to do). In this case, the condtional predicate for a wakeup is actually superfluous. That said, I agree about warning about the pitfalls, but to be honest, these are pitfalls that come out of a lack of experience working with multithreaded code in general.

jwakely commented 6 years ago

I strongly disagree. Failing to use a mutex associated with a condition variable is one of the most common mistakes, and saying "you should probably do this, unless you know what you're doing" is bad advice. Everybody thinks they know what they're doing. Nobody knows what they're doing.

The waiters already need to use a mutex anyway, so you can't avoid using one entirely. If you want to send notifications without using a mutex, use something other than a condition variable e.g. a semaphore or a futex. Efficient semaphores are in the pipeline for standard C++.

jeremyong commented 6 years ago

Yes, but then you start to breed a cargo-cult mentality where people do something "because" without understanding the why.

To me, it's more important to teach that any two sequenced instructions should not be expected to be atomically executed unless guarded by a mutex. I'm not sure that failing to understand this particular error has anything to do with condition_variable. A better place to insert the guidelines perhaps, is the usage of atomics in situations where the user of the atomic now (optimistically) believes a mutex is no longer necessary without the proper analysis. This is the true "poison" consumed by developers who get exposed to atomics IMO.

hsutter commented 4 years ago

Editors call: In relation to #1616, @jwakely do you still want to pursue this issue?

jwakely commented 4 years ago

I'm afraid I don't have the spare cycles to work on this, sorry.