Open winspool opened 1 month ago
Some links to Futex and CPU cache related documents:
Jens Gustedt "Futex based locks for C11’s generic atomics"
Ulrich Drepper's paper "Futexes Are Tricky"
Ulrich Drepper's paper "What Every Programmer Should Know About Memory"
The pthread mutex code I implemented is a bit of a mess mostly because I did run into some issues with futexes. The issue isn't using futexes, it has something to do with the implementation of semaphores/futexes in OW. I remember it being a problem when I committed the code, but I don't remember the specifics. It should "work," but there is a time-consuming workaround if I'm not mistaken.
I saw an article about the speed of pthread_mutex implementations ( https://justine.lol/mutex/ ) and tried the test program mentioned there (a highly contended scenario) with OpenWatcom. The test program uses one counter, 30 threads and 100000 iterations per thread.
Yes, that is just a test program that makes an implementation detail visible, but with the default thread count, the program compiled with OpenWatcom used 100% CPU until I canceled it after ~5 minutes.
The test was created to compare the mutex implementation of the cosmopolitan libc (64-bit), and a part of that mutex implementation is based on nsync: https://github.com/google/nsync (nsync supports many more CPUs, including x86, alpha, mips, ppc, ...)
When using all hardware threads (12 on my system), the mutex implementation of cosmopolitan (64-bit) is more than twice as fast as glibc (64-bit), and more than 100 times faster than the mutex in OpenWatcom (32-bit). OpenWatcom is still about 35 times slower than glibc (32-bit).
When the program uses more threads than are available in hardware, the slowdown factor for the program compiled with OpenWatcom increases dramatically: to about 7400 compared to cosmopolitan (64-bit) and about 2500 compared to glibc (32-bit).
Even when using only 4 threads, OpenWatcom is ~59 times slower than glibc (32-bit). With only one or two threads, OpenWatcom is >100 times slower than glibc (32-bit).
I also retested with 25 threads:
########
The worker code for the thread is simple, but really slow, when compiled with OpenWatcom.
The test program was adapted to work with OpenWatcom (rusage is not available): pthread_mutex.c.txt
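For readers without the attachment, the shape of the worker described above (one shared counter, each thread locking, incrementing and unlocking in a loop) can be sketched roughly as follows. This is not the attached file: `run_contended_test` and the global names are hypothetical, and timing code is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Shared state: one counter protected by one mutex. */
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static long g_counter;
static int g_iterations;

/* Each worker increments the shared counter under the lock. */
static void *worker(void *arg)
{
    int i;
    (void)arg;
    for (i = 0; i < g_iterations; ++i) {
        pthread_mutex_lock(&g_lock);
        ++g_counter;
        pthread_mutex_unlock(&g_lock);
    }
    return NULL;
}

/* Hypothetical helper: runs `threads` workers and returns the final count. */
long run_contended_test(int threads, int iterations)
{
    pthread_t *tids = malloc(sizeof *tids * (size_t)threads);
    int i;

    assert(tids != NULL);
    g_counter = 0;
    g_iterations = iterations;
    for (i = 0; i < threads; ++i) {
        int rc = pthread_create(&tids[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (i = 0; i < threads; ++i)
        pthread_join(tids[i], NULL);
    free(tids);
    return g_counter;
}
```

With the parameters from the article, `run_contended_test(30, 100000)` should return 3000000 if the mutex works correctly. Nearly all of the run time is spent in lock and unlock, which is why the test exposes the cost of the mutex implementation so clearly.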
The implementation of `pthread_mutex_lock` is in ptmutex.c
The author mentioned at the head of the file is J. Armstrong. Is he still around?
`pthread_mutex_lock` uses `sem_wait`, and `sem_wait` uses the `futex` syscall.
Is there something that can be changed to handle some sync cases faster?
Is it possible to use some code or logic from other `pthread_mutex_*` or other semaphore implementations?
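As one possible direction, here is a minimal sketch of the three-state futex mutex along the lines of the design in "Futexes Are Tricky". The uncontended lock and unlock stay entirely in user space, and the `futex` syscall is only made when there is contention. This is not the OpenWatcom code. It assumes Linux, a compiler with C11 `<stdatomic.h>` (I have not checked whether OpenWatcom provides it), and that an `atomic_int` has the same layout as a plain `int`. All `sketch_*` names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* state: 0 = unlocked, 1 = locked, 2 = locked and there may be waiters */
typedef struct { atomic_int state; } sketch_mutex;

static long sketch_futex(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

/* Compare-and-swap that returns the value seen before the operation. */
static int sketch_cmpxchg(atomic_int *p, int expected, int desired)
{
    atomic_compare_exchange_strong(p, &expected, desired);
    return expected;
}

void sketch_lock(sketch_mutex *m)
{
    int c = sketch_cmpxchg(&m->state, 0, 1);   /* fast path: no syscall */
    if (c != 0) {
        if (c != 2)
            c = atomic_exchange(&m->state, 2); /* announce a waiter */
        while (c != 0) {
            /* Sleeps only while state is still 2, else returns at once. */
            sketch_futex(&m->state, FUTEX_WAIT, 2);
            c = atomic_exchange(&m->state, 2);
        }
    }
}

void sketch_unlock(sketch_mutex *m)
{
    /* Old value 1 means nobody waited: no syscall needed. */
    if (atomic_fetch_sub(&m->state, 1) != 1) {
        atomic_store(&m->state, 0);
        sketch_futex(&m->state, FUTEX_WAKE, 1);
    }
}

/* Small demo harness, same shape as the counter test. */
static sketch_mutex d_lock;
static long d_counter;
static int d_iterations;

static void *sketch_worker(void *arg)
{
    int i;
    (void)arg;
    for (i = 0; i < d_iterations; ++i) {
        sketch_lock(&d_lock);
        ++d_counter;
        sketch_unlock(&d_lock);
    }
    return NULL;
}

long sketch_demo(int threads, int iterations)
{
    pthread_t *tids = malloc(sizeof *tids * (size_t)threads);
    int i;

    assert(tids != NULL);
    atomic_store(&d_lock.state, 0);
    d_counter = 0;
    d_iterations = iterations;
    for (i = 0; i < threads; ++i) {
        int rc = pthread_create(&tids[i], NULL, sketch_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (i = 0; i < threads; ++i)
        pthread_join(tids[i], NULL);
    free(tids);
    return d_counter;
}
```

The point of the third state is that `sketch_unlock` can skip the wake-up syscall whenever no thread has registered as a waiter. Whether this helps in OpenWatcom depends on what the workaround in ptmutex.c actually does, which I have not analyzed.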