low latency and rwlocks

nfries88 commented 1 year ago

this is a feature request for low latency (adaptive, non-recursive) and non-recursive reader-writer locks.

I don't know about other backends, but pthreads should almost always have rwlocks and Windows has had SRWLock since Vista (I see you're optionally using that now).

Most modern mutex implementations at least use an atomic userspace check before putting the thread to sleep, but low latency locks can still beat them out in some cases. Linux has PTHREAD_MUTEX_ADAPTIVE_NP.

Both of these can be implemented in fairly little code where they don't exist using functionality SDL already provides, although the performance may be iffy for the RWlock.

slouken commented 1 year ago

Do you have a link to more information to understand the differences and how to use them?

nfries88 commented 1 year ago

This discusses the reasoning behind the use of a low-latency adaptive mutex in the OpenSolaris kernel: https://www.cs.dartmouth.edu/~sergey/cs258/adaptive-locks-and-preemption.txt

This discusses a patch implementing throughput-optimized futex for the linux kernel which is mostly the same concept, but inside the futex syscall: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1242734.html

The reasoning there applies to user-space implementations just as well, and iirc user-space mutexes are adaptive by default on OpenSolaris as well, and ofc glibc has had the adaptive pthread mutex basically forever.

The general concept is to hybridize a spinlock with a mutex, never entering the kernel to take the mutex if the critical section becomes free during a brief "optimistic" spin cycle.

Its main benefit over a spinlock is that it handles the case where the thread holding the lock got pre-empted inside the critical section.

Its main benefit over a mutex is that for short critical sections even a contested low latency lock won't usually have to enter the kernel.

Low latency locks definitely have drawbacks too: for long critical sections, they're effectively just extra work before waiting in the kernel. They don't handle priority differences well for the same reason as spinlocks (at least not without manually dealing with priority inversion). The optimistic spinning doesn't make any sense at all for uniprocessor machines. They generally aren't fair and can starve threads, though I could imagine someone somewhere has cooked up an adaptive ticket lock.

nfries88 commented 1 year ago

https://matklad.github.io/2020/01/04/mutexes-are-faster-than-spinlocks.html

benchmarks rust's standard library mutex, AMD's suggested spinlock implementation, and another spinlock implementation against WebKit's customized adaptive lock under a variety of conditions with a short critical section.

What you can see is WebKit's lock has stable average and worst-case latency compared to spinlocks. (the likely reason it significantly beats out the standard mutex under extreme contention is not only related to adaptive locking, but how it queues threads to be restarted)

icculus commented 8 months ago

SDL_RWLock went into revision control awhile ago, but I haven't read up on adaptive mutexes yet.

icculus commented 4 days ago

I think we're going to pass on adaptive mutexes; it sounds like a) something that should be implemented in system libraries, not SDL and b) something we would absolutely screw up. :)

So I think we're done here!

libsdl-org / SDL

low latency and rwlocks #7111