mratsim / weave

A state-of-the-art multithreading runtime: message-passing based, fast, scalable, ultra-low overhead
Other
532 stars 22 forks source link

Model Checked - MPSC queue + Rework State Machines #128

Closed mratsim closed 4 years ago

mratsim commented 4 years ago

Trying to fix:

The implementation is run through CDS checker https://github.com/computersforpeace/model-checker but it internally uses dlmalloc mspace to replace uer's malloc and somehow it runs out of space and trigger this assert: https://github.com/computersforpeace/model-checker/blob/5c4efe5cd8bdfe1e85138396109876a121ca61d1/mymemory.cc#L166-L171

Might need to complete my own at #127 or use Relaxy (which cannot properly deal with compiler relaxed reordering unfortunately).

mratsim commented 4 years ago

sleep(xxx) seems like a bad test to test on-blocking sync as it leads to deadlock with Clang and/or on Mac:

mratsim commented 4 years ago

Tried to use the RCMC VM (why GPL code with no source published?) at http://plv.mpi-sws.org/rcmc/

But:

So seems like even for C/C++ there is an lack of model checker to deal with C++11 concurrency bugs.

mratsim commented 4 years ago

Could there be a bug on OSX similar to https://linux-arm-kernel.infradead.narkive.com/jTmi6ANX/patch-arm-change-definition-of-cpu-relax-for-arm11mpcore

The cpu_relax() macro is often used in the body of busy-wait loops to ensure that the variable being spun on is re-loaded for each iteration. On the ARM11MPCore processor [where loads are prioritised over stores], spinning in such a loop will prevent the write buffer from draining. If a write contained in the write buffer indirectly affects the variable being spun on, there is a potential for deadlock. This deadlock is experienced when executing the KGDB testsuite on an SMP ARM11MPCore configuration.

This patch changes the definition of cpu_relax() to smp_mb() for ARMv6 cores, forcing a flushing of the write buffer on SMP systems before the next load takes place. If the Kernel is not compiled for SMP support, this will expand to a barrier() as before.

Cc: Russell King - ARM Linux ***@arm.linux.org.uk Acked-by: Catalin Marinas ***@arm.com Signed-off-by: Will Deacon ***@arm.com

There is no obvious reason why this deadlock again, I don't use sleep: https://github.com/mratsim/weave/blob/c31e45f72501506fe11cf2c99cae0984fdd8f3ce/weave/parallel_tasks.nim#L250-L285

So it might be that with cpuRelax() the main thread leaves priority forever and deadlock ...

And ARM deadlocked too:

mratsim commented 4 years ago

Found the bug, there is a race condition in the test

   let f = spawn sleepingLion(123) 
   while not f.isReady(): 
     cpuRelax() 

The spinlock was (intentionally :/) not dispatching incoming steal requests. But if between initialization of the threadpool and spawn the freshly created threads didn't have time to send steal requests, the spawn stays in the root thread queue and the spinlock doesn't loadBalance so we get a deadlock.

mratsim commented 4 years ago

Still 2 bugs (maybe one):

Fixes for another time.