Closed mratsim closed 4 years ago
sleep(xxx) seems like a bad test to test on-blocking sync as it leads to deadlock with Clang and/or on Mac:
Tried to use the RCMC VM (why GPL code with no source published?) at http://plv.mpi-sws.org/rcmc/
But:
So seems like even for C/C++ there is an lack of model checker to deal with C++11 concurrency bugs.
Could there be a bug on OSX similar to https://linux-arm-kernel.infradead.narkive.com/jTmi6ANX/patch-arm-change-definition-of-cpu-relax-for-arm11mpcore
The cpu_relax() macro is often used in the body of busy-wait loops to ensure that the variable being spun on is re-loaded for each iteration. On the ARM11MPCore processor [where loads are prioritised over stores], spinning in such a loop will prevent the write buffer from draining. If a write contained in the write buffer indirectly affects the variable being spun on, there is a potential for deadlock. This deadlock is experienced when executing the KGDB testsuite on an SMP ARM11MPCore configuration.
This patch changes the definition of cpu_relax() to smp_mb() for ARMv6 cores, forcing a flushing of the write buffer on SMP systems before the next load takes place. If the Kernel is not compiled for SMP support, this will expand to a barrier() as before.
Cc: Russell King - ARM Linux ***@arm.linux.org.uk Acked-by: Catalin Marinas ***@arm.com Signed-off-by: Will Deacon ***@arm.com
There is no obvious reason why this deadlock again, I don't use sleep: https://github.com/mratsim/weave/blob/c31e45f72501506fe11cf2c99cae0984fdd8f3ce/weave/parallel_tasks.nim#L250-L285
So it might be that with cpuRelax()
the main thread leaves priority forever and deadlock ...
And ARM deadlocked too:
Found the bug, there is a race condition in the test
let f = spawn sleepingLion(123)
while not f.isReady():
cpuRelax()
The spinlock was (intentionally :/) not dispatching incoming steal requests. But if between initialization of the threadpool and spawn
the freshly created threads didn't have time to send steal requests, the spawn stays in the root thread queue and the spinlock doesn't loadBalance
so we get a deadlock.
Still 2 bugs (maybe one):
Fixes for another time.
Trying to fix:
deadlock livelock happening in plain task parallelism and memory pool
counting bugs:
The implementation is run through CDS checker https://github.com/computersforpeace/model-checker but it internally uses dlmalloc
mspace
to replace uer's malloc and somehow it runs out of space and trigger this assert: https://github.com/computersforpeace/model-checker/blob/5c4efe5cd8bdfe1e85138396109876a121ca61d1/mymemory.cc#L166-L171Might need to complete my own at #127 or use
Relaxy
(which cannot properly deal with compiler relaxed reordering unfortunately).