lemmy / BlockingQueue

Tutorial "Weeks of debugging can save you hours of TLA+". Each git commit introduces a new concept => check the git history!
https://youtu.be/wjsI0lTSjIo
MIT License
485 stars 21 forks source link

Study the performance penalty of notifyAll vs. notifyOther #8

Closed lemmy closed 2 years ago

lemmy commented 2 years ago

Do not merge! This is work-in-progress.

lemmy commented 2 years ago

glibc implements broadcast via multiple signal calls: https://code.woboq.org/userspace/glibc/nptl/pthread_cond_broadcast.c.html

macos is less transparent (case distinction in signal with delegation to sys-level calls __psynch_cvsignal and __psynch_cvbroad: https://opensource.apple.com/source/libpthread/libpthread-218.1.3/src/pthread_cond.c

There is no explanation why pthread_cond_broadcast was ~10% faster in a macro-benchmark (several hours) when pitted against pthread_cond_signal on two Apple Silicon m1:

markus@ impl % ./producer_consumer-notifyOther 4 4 4
Buffer size = 4, # Producers = 4, # Consumers = 4
[...]

705000001 values consumed by 1
^C
markus@ impl % ./producer_consumer-notifyAll 4 4 4
Buffer size = 4, # Producers = 4, # Consumers = 4
[...]

780000001 values consumed by 3
^C

-----

markus@ impl % ./producer_consumer-notifyAll 4 4 4
Buffer size = 4, # Producers = 4, # Consumers = 4
[...]

1013000001 values consumed by 2
^C
markus@ impl % ./producer_consumer-notifyOther 4 4 4
Buffer size = 4, # Producers = 4, # Consumers = 4
[...]

996000001 values consumed by 3
^C

-----

markus@ impl % ./producer_consumer-notifyOther 4 8 8 
Buffer size = 4, # Producers = 8, # Consumers = 8
[...]

444000001 values consumed by 4
^C

markus@ impl % ./producer_consumer-notifyAll 4 8 8
Buffer size = 4, # Producers = 8, # Consumers = 8
[...]

452000001 values consumed by 5
^C

Java's Condition#signal (notify) and Condition#signalAll (notifyAll) delegate to the same piece of code that, in the case of signalAll iterates over all waiters whereas signal iterates exactly once. Thus, signal should be more efficient. In practice, however, only a subset of the consumers and producers wait on the conditionals. Most of the time, threads will wait to acquire the outer lock (mutex). It is not the case that the lock and its conditionals share the same waitset. Instead, the lock and each conditional have separate waitsets.