nim-works / loony

A high throughput MPMC lock-free queue based on a paper by Giersch, Nolte et al implemented in pure Nim.
https://nim-works.github.io/loony/
MIT License
64 stars 4 forks source link

thread sanitizer does not agree with loony #9

Closed zevv closed 2 years ago

zevv commented 2 years ago
ico@workdoos:/tmp/loony(main)$ nim r  --passC:-fsanitize=thread --passL:-fsanitize=thread --threads:on tests/test.nim 
[...]
🟢 creation and initialization of the queue
FATAL: ThreadSanitizer CHECK failed: ../../../../src/libsanitizer/tsan/tsan_interface_atomic.cpp:223 "((IsLoadOrder(mo))) != (0)" (0x0, 0x0)
    #0 __tsan::TsanCheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) ../../../../src/libsanitizer/tsan/tsan_rtl_report.cpp:47 (libtsan.so.0+0x972d2)
    #1 __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) ../../../../src/libsanitizer/sanitizer_common/sanitizer_termination.cpp:78 (libtsan.so.0+0xb513a)
    #2 AtomicLoad<long long unsigned int> ../../../../src/libsanitizer/tsan/tsan_interface_atomic.cpp:223 (libtsan.so.0+0x762e4)
    #3 __tsan_atomic64_load ../../../../src/libsanitizer/tsan/tsan_interface_atomic.cpp:539 (libtsan.so.0+0x762e4)
    #4 load_OOZloonyZnode_116 <null> (test_EF21B045952947A3759AD00DE01B49BBBC3590C4+0x6baa1)
    #5 advTail_test_334 <null> (test_EF21B045952947A3759AD00DE01B49BBBC3590C4+0x72289)
    #6 push_test_306 <null> (test_EF21B045952947A3759AD00DE01B49BBBC3590C4+0x731f4)
    #7 enqueue_test_303 <null> (test_EF21B045952947A3759AD00DE01B49BBBC3590C4+0x73378)
    #8 NimMainModule <null> (test_EF21B045952947A3759AD00DE01B49BBBC3590C4+0x819d5)
    #9 NimMainInner <null> (test_EF21B045952947A3759AD00DE01B49BBBC3590C4+0x797f3)
    #10 NimMain <null> (test_EF21B045952947A3759AD00DE01B49BBBC3590C4+0x79849)
    #11 main <null> (test_EF21B045952947A3759AD00DE01B49BBBC3590C4+0x798b9)
    #12 __libc_start_main ../csu/libc-start.c:308 (libc.so.6+0x26d09)
    #13 _start <null> (test_EF21B045952947A3759AD00DE01B49BBBC3590C4+0x94d9)

Error: execution of an external program failed: '/home/ico/.cache/nim/test_d/test_EF21B045952947A3759AD00DE01B49BBBC3590C4 '
shayanhabibi commented 2 years ago

Disruptek and Clybber were debugging an ARC branch (actual atomic ref counting) for cps and loony. That should help clear this up AFAIK @disruptek

Atomic release and then acquire of the ref count should ensure that the memory relating to the refs is flushed and available to the acquiring thread in memory safely.

disruptek commented 2 years ago

The memory is always available; the atomic ref counting just ensures that the queue accepts isolated refs from any thread that tries to push a ref.

Making sure the memory contents that the ref points to is correctly flushed is a separate problem...

shayanhabibi commented 2 years ago

Since updating the implementation, it should be cleaner in deallocations. We’re working on managing thread-safety assurances; either by copying (urgh), moving (eh) or cache invalidation using thread_fences. These will need to be prototyped, benchmarked and tested for safety before moving further.

shayanhabibi commented 2 years ago

Thread sanitizer prior to #13 is as follows

Hint: gc: arc; threads: on; opt: speed; options: -d:danger
94207 lines; 5.300s; 145.41MiB peakmem; proj: /Users/shayan/Documents/programming/GitHub/loony/tests/test.nim; out: /Users/shayan/Documents/programming/GitHub/loony/bin/test [SuccessX]
Hint: /Users/shayan/Documents/programming/GitHub/loony/bin/test  [Exec]
## loony
🟢 creation and initialization of the queue
##     1                       +12.3kb             823.0μs
❔ run some continuations through the queue in another thread: boring
##     2                                            87.0μs
queued 10000 continuations
==================
WARNING: ThreadSanitizer: data race (pid=14564)
  Write of size 8 at 0x000101e3dff8 by thread T1:
    #0 allocAligned0_OOZloonyZmemalloc_52 <null>:2 (test:x86_64+0x10001c8be)
    #1 allocNode_OOZloonyZnode_85 <null>:2 (test:x86_64+0x10001c98d)
    #2 advTail_test_311 <null>:2 (test:x86_64+0x10001d91e)
    #3 push_test_279 <null>:2 (test:x86_64+0x10001dbf2)
    #4 doContinualThings486540041_test_1085 <null>:2 (test:x86_64+0x10001e356)
    #5 trampoline_test_255 <null>:2 (test:x86_64+0x10001e90a)
    #6 runThings_test_38 <null>:2 (test:x86_64+0x10001eac0)
    #7 threadProcWrapDispatch_system_3651 <null>:2 (test:x86_64+0x10000bd45)
    #8 threadProcWrapper_system_3627 <null>:2 (test:x86_64+0x10000bf45)

  Previous write of size 8 at 0x000101e3dff8 by main thread:
    #0 createThread_system_3602 <null>:2 (test:x86_64+0x10000c1f4)
    #1 createThread_system_3593 <null>:2 (test:x86_64+0x10000c830)
    #2 NimMainModule <null>:2 (test:x86_64+0x100021b1f)
    #3 NimMainInner <null>:2 (test:x86_64+0x10001f9f1)
    #4 main <null>:2 (test:x86_64+0x100033f27)

  Location is global 'stats_system_1762' at 0x000101e3dff8 (test+0x000100042ff8)

  Thread T1 (tid=365419, running) created by main thread at:
    #0 pthread_create <null>:3 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x2cd8d)
    #1 createThread_system_3602 <null>:2 (test:x86_64+0x10000c436)
    #2 createThread_system_3593 <null>:2 (test:x86_64+0x10000c830)
    #3 NimMainModule <null>:2 (test:x86_64+0x100021b1f)
    #4 NimMainInner <null>:2 (test:x86_64+0x10001f9f1)
    #5 main <null>:2 (test:x86_64+0x100033f27)

SUMMARY: ThreadSanitizer: data race (test:x86_64+0x10001c8be) in allocAligned0_OOZloonyZmemalloc_52+0x2e
==================
==================
WARNING: ThreadSanitizer: data race (pid=14564)
  Write of size 8 at 0x0001045020a0 by thread T3:
    #0 eqcopy__OOZOOZOOZOOZOOZOnimbleZpkgsZcps4548O52O50ZcpsZtransform_7548 <null>:2 (test:x86_64+0x10001c482)
    #1 pop_test_39 <null>:2 (test:x86_64+0x10001e82e)
    #2 runThings_test_38 <null>:2 (test:x86_64+0x10001eb45)
    #3 threadProcWrapDispatch_system_3651 <null>:2 (test:x86_64+0x10000bd45)
    #4 threadProcWrapper_system_3627 <null>:2 (test:x86_64+0x10000bf45)

  Previous write of size 8 at 0x0001045020a0 by thread T1:
    #0 eqsink__OOZOOZOOZOOZOOZOnimbleZpkgsZcps4548O52O50ZcpsZtransform_7551 <null>:2 (test:x86_64+0x10001c548)
    #1 trampoline_test_255 <null>:2 (test:x86_64+0x10001e915)
    #2 runThings_test_38 <null>:2 (test:x86_64+0x10001eac0)
    #3 threadProcWrapDispatch_system_3651 <null>:2 (test:x86_64+0x10000bd45)
    #4 threadProcWrapper_system_3627 <null>:2 (test:x86_64+0x10000bf45)

  Thread T3 (tid=365421, running) created by main thread at:
    #0 pthread_create <null>:3 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x2cd8d)
    #1 createThread_system_3602 <null>:2 (test:x86_64+0x10000c436)
    #2 createThread_system_3593 <null>:2 (test:x86_64+0x10000c830)
    #3 NimMainModule <null>:2 (test:x86_64+0x100021b1f)
    #4 NimMainInner <null>:2 (test:x86_64+0x10001f9f1)
    #5 main <null>:2 (test:x86_64+0x100033f27)

  Thread T1 (tid=365419, running) created by main thread at:
    #0 pthread_create <null>:3 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x2cd8d)
    #1 createThread_system_3602 <null>:2 (test:x86_64+0x10000c436)
    #2 createThread_system_3593 <null>:2 (test:x86_64+0x10000c830)
    #3 NimMainModule <null>:2 (test:x86_64+0x100021b1f)
    #4 NimMainInner <null>:2 (test:x86_64+0x10001f9f1)
    #5 main <null>:2 (test:x86_64+0x100033f27)

SUMMARY: ThreadSanitizer: data race (test:x86_64+0x10001c482) in eqcopy__OOZOOZOOZOOZOOZOnimbleZpkgsZcps4548O52O50ZcpsZtransform_7548+0x32
==================
==================
WARNING: ThreadSanitizer: data race (pid=14564)
  Write of size 8 at 0x000104502070 by thread T2:
    #0 eqcopy__OOZOOZOOZOOZOOZOnimbleZpkgsZcps4548O52O50ZcpsZtransform_7548 <null>:2 (test:x86_64+0x10001c482)
    #1 pop_test_39 <null>:2 (test:x86_64+0x10001e82e)
    #2 runThings_test_38 <null>:2 (test:x86_64+0x10001eb45)
    #3 threadProcWrapDispatch_system_3651 <null>:2 (test:x86_64+0x10000bd45)
    #4 threadProcWrapper_system_3627 <null>:2 (test:x86_64+0x10000bf45)

  Previous write of size 8 at 0x000104502070 by thread T1:
    #0 eqsink__OOZOOZOOZOOZOOZOnimbleZpkgsZcps4548O52O50ZcpsZtransform_7551 <null>:2 (test:x86_64+0x10001c548)
    #1 runThings_test_38 <null>:2 (test:x86_64+0x10001eae7)
    #2 threadProcWrapDispatch_system_3651 <null>:2 (test:x86_64+0x10000bd45)
    #3 threadProcWrapper_system_3627 <null>:2 (test:x86_64+0x10000bf45)

  Thread T2 (tid=365420, running) created by main thread at:
    #0 pthread_create <null>:3 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x2cd8d)
    #1 createThread_system_3602 <null>:2 (test:x86_64+0x10000c436)
    #2 createThread_system_3593 <null>:2 (test:x86_64+0x10000c830)
    #3 NimMainModule <null>:2 (test:x86_64+0x100021b1f)
    #4 NimMainInner <null>:2 (test:x86_64+0x10001f9f1)
    #5 main <null>:2 (test:x86_64+0x100033f27)

  Thread T1 (tid=365419, running) created by main thread at:
    #0 pthread_create <null>:3 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x2cd8d)
    #1 createThread_system_3602 <null>:2 (test:x86_64+0x10000c436)
    #2 createThread_system_3593 <null>:2 (test:x86_64+0x10000c830)
    #3 NimMainModule <null>:2 (test:x86_64+0x100021b1f)
    #4 NimMainInner <null>:2 (test:x86_64+0x10001f9f1)
    #5 main <null>:2 (test:x86_64+0x100033f27)

SUMMARY: ThreadSanitizer: data race (test:x86_64+0x10001c482) in eqcopy__OOZOOZOOZOOZOOZOnimbleZpkgsZcps4548O52O50ZcpsZtransform_7548+0x32
==================
created 4 threads
==================
WARNING: ThreadSanitizer: data race (pid=14564)
  Write of size 8 at 0x000101e3e000 by thread T3:
    #0 deallocShared <null>:2 (test:x86_64+0x100008aeb)
    #1 deallocAligned <null>:2 (test:x86_64+0x10001c953)
    #2 eqdestroy__OOZloonyZnode_74 <null>:2 (test:x86_64+0x10001c9be)
    #3 advTail_test_311 <null>:2 (test:x86_64+0x10001d969)
    #4 push_test_279 <null>:2 (test:x86_64+0x10001dbf2)
    #5 PostX20Call__test_1148 <null>:2 (test:x86_64+0x10001deff)
    #6 trampoline_test_255 <null>:2 (test:x86_64+0x10001e90a)
    #7 runThings_test_38 <null>:2 (test:x86_64+0x10001eac0)
    #8 threadProcWrapDispatch_system_3651 <null>:2 (test:x86_64+0x10000bd45)
    #9 threadProcWrapper_system_3627 <null>:2 (test:x86_64+0x10000bf45)

  Previous write of size 8 at 0x000101e3e000 by thread T4:
    #0 deallocShared <null>:2 (test:x86_64+0x100008aeb)
    #1 deallocAligned <null>:2 (test:x86_64+0x10001c953)
    #2 eqdestroy__OOZloonyZnode_74 <null>:2 (test:x86_64+0x10001c9be)
    #3 advTail_test_311 <null>:2 (test:x86_64+0x10001d969)
    #4 push_test_279 <null>:2 (test:x86_64+0x10001dbf2)
    #5 PostX20Call__test_1148 <null>:2 (test:x86_64+0x10001deff)
    #6 trampoline_test_255 <null>:2 (test:x86_64+0x10001e90a)
    #7 runThings_test_38 <null>:2 (test:x86_64+0x10001eac0)
    #8 threadProcWrapDispatch_system_3651 <null>:2 (test:x86_64+0x10000bd45)
    #9 threadProcWrapper_system_3627 <null>:2 (test:x86_64+0x10000bf45)

  Location is global 'stats_system_1762' at 0x000101e3dff8 (test+0x000100043000)

  Thread T3 (tid=365421, running) created by main thread at:
    #0 pthread_create <null>:3 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x2cd8d)
    #1 createThread_system_3602 <null>:2 (test:x86_64+0x10000c436)
    #2 createThread_system_3593 <null>:2 (test:x86_64+0x10000c830)
    #3 NimMainModule <null>:2 (test:x86_64+0x100021b1f)
    #4 NimMainInner <null>:2 (test:x86_64+0x10001f9f1)
    #5 main <null>:2 (test:x86_64+0x100033f27)

  Thread T4 (tid=365423, running) created by main thread at:
    #0 pthread_create <null>:3 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x2cd8d)
    #1 createThread_system_3602 <null>:2 (test:x86_64+0x10000c436)
    #2 createThread_system_3593 <null>:2 (test:x86_64+0x10000c830)
    #3 NimMainModule <null>:2 (test:x86_64+0x100021b1f)
    #4 NimMainInner <null>:2 (test:x86_64+0x10001f9f1)
    #5 main <null>:2 (test:x86_64+0x100033f27)

SUMMARY: ThreadSanitizer: data race (test:x86_64+0x100008aeb) in deallocShared+0x2b
==================
joined 4 threads
(allocCount: 10045, deallocCount: 9965)
🟢 run some continuations through the queue in many threads
##     3                        +348kb             585.4ms
## 3 tests            🟢2  ❔1        
ThreadSanitizer: reported 4 warnings
SIGABRT: Abnormal termination.
Error: execution of an external program failed

As of #13 :

Hint: gc: arc; threads: on; opt: speed; options: -d:danger
94208 lines; 5.796s; 145.504MiB peakmem; proj: /Users/shayan/Documents/programming/GitHub/loony/tests/test.nim; out: /Users/shayan/Documents/programming/GitHub/loony/bin/test [SuccessX]
Hint: /Users/shayan/Documents/programming/GitHub/loony/bin/test  [Exec]
## loony
🟢 creation and initialization of the queue
##     1                       +12.3kb               1.3ms
❔ run some continuations through the queue in another thread: boring
##     2                                           118.0μs
queued 10000 continuations
==================
WARNING: ThreadSanitizer: data race (pid=14686)
  Write of size 8 at 0x00010f36a008 by main thread:
    #0 createThread_system_3602 <null>:2 (test:x86_64+0x10000c354)
    #1 createThread_system_3593 <null>:2 (test:x86_64+0x10000c990)
    #2 NimMainModule <null>:2 (test:x86_64+0x100021b0f)
    #3 NimMainInner <null>:2 (test:x86_64+0x10001f9e1)
    #4 main <null>:2 (test:x86_64+0x100033f17)

  Previous write of size 8 at 0x00010f36a008 by thread T1:
    #0 allocAligned0_OOZloonyZmemalloc_52 <null>:2 (test:x86_64+0x10001ca1e)
    #1 allocNode_OOZloonyZnode_85 <null>:2 (test:x86_64+0x10001caed)
    #2 advTail_test_311 <null>:2 (test:x86_64+0x10001da4a)
    #3 push_test_279 <null>:2 (test:x86_64+0x10001dc55)
    #4 doContinualThings486540038_test_1082 <null>:2 (test:x86_64+0x10001e336)
    #5 trampoline_test_255 <null>:2 (test:x86_64+0x10001e8fa)
    #6 runThings_test_38 <null>:2 (test:x86_64+0x10001eab0)
    #7 threadProcWrapDispatch_system_3651 <null>:2 (test:x86_64+0x10000bea5)
    #8 threadProcWrapper_system_3627 <null>:2 (test:x86_64+0x10000c0a5)

  Location is global 'stats_system_1762' at 0x00010f36a008 (test+0x000100043008)

  Thread T1 (tid=367504, running) created by main thread at:
    #0 pthread_create <null>:3 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x2cd8d)
    #1 createThread_system_3602 <null>:2 (test:x86_64+0x10000c596)
    #2 createThread_system_3593 <null>:2 (test:x86_64+0x10000c990)
    #3 NimMainModule <null>:2 (test:x86_64+0x100021b0f)
    #4 NimMainInner <null>:2 (test:x86_64+0x10001f9e1)
    #5 main <null>:2 (test:x86_64+0x100033f17)

SUMMARY: ThreadSanitizer: data race (test:x86_64+0x10000c354) in createThread_system_3602+0x44
==================
created 4 threads
==================
WARNING: ThreadSanitizer: data race (pid=14686)
  Write of size 8 at 0x00010f36a010 by thread T4:
    #0 deallocShared <null>:2 (test:x86_64+0x100008c4b)
    #1 deallocAligned <null>:2 (test:x86_64+0x10001cab3)
    #2 eqdestroy__OOZloonyZnode_74 <null>:2 (test:x86_64+0x10001cb1e)
    #3 advTail_test_311 <null>:2 (test:x86_64+0x10001da81)
    #4 push_test_279 <null>:2 (test:x86_64+0x10001dc55)
    #5 PostX20Call__test_1145 <null>:2 (test:x86_64+0x10001dedf)
    #6 trampoline_test_255 <null>:2 (test:x86_64+0x10001e8fa)
    #7 runThings_test_38 <null>:2 (test:x86_64+0x10001eab0)
    #8 threadProcWrapDispatch_system_3651 <null>:2 (test:x86_64+0x10000bea5)
    #9 threadProcWrapper_system_3627 <null>:2 (test:x86_64+0x10000c0a5)

  Previous write of size 8 at 0x00010f36a010 by thread T2:
    #0 deallocShared <null>:2 (test:x86_64+0x100008c4b)
    #1 deallocAligned <null>:2 (test:x86_64+0x10001cab3)
    #2 eqdestroy__OOZloonyZnode_74 <null>:2 (test:x86_64+0x10001cb1e)
    #3 advTail_test_311 <null>:2 (test:x86_64+0x10001da81)
    #4 push_test_279 <null>:2 (test:x86_64+0x10001dc55)
    #5 PostX20Call__test_1145 <null>:2 (test:x86_64+0x10001dedf)
    #6 trampoline_test_255 <null>:2 (test:x86_64+0x10001e8fa)
    #7 runThings_test_38 <null>:2 (test:x86_64+0x10001eab0)
    #8 threadProcWrapDispatch_system_3651 <null>:2 (test:x86_64+0x10000bea5)
    #9 threadProcWrapper_system_3627 <null>:2 (test:x86_64+0x10000c0a5)

  Location is global 'stats_system_1762' at 0x00010f36a008 (test+0x000100043010)

  Thread T4 (tid=367543, running) created by main thread at:
    #0 pthread_create <null>:3 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x2cd8d)
    #1 createThread_system_3602 <null>:2 (test:x86_64+0x10000c596)
    #2 createThread_system_3593 <null>:2 (test:x86_64+0x10000c990)
    #3 NimMainModule <null>:2 (test:x86_64+0x100021b0f)
    #4 NimMainInner <null>:2 (test:x86_64+0x10001f9e1)
    #5 main <null>:2 (test:x86_64+0x100033f17)

  Thread T2 (tid=367539, running) created by main thread at:
    #0 pthread_create <null>:3 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x2cd8d)
    #1 createThread_system_3602 <null>:2 (test:x86_64+0x10000c596)
    #2 createThread_system_3593 <null>:2 (test:x86_64+0x10000c990)
    #3 NimMainModule <null>:2 (test:x86_64+0x100021b0f)
    #4 NimMainInner <null>:2 (test:x86_64+0x10001f9e1)
    #5 main <null>:2 (test:x86_64+0x100033f17)

SUMMARY: ThreadSanitizer: data race (test:x86_64+0x100008c4b) in deallocShared+0x2b
==================
joined 4 threads
(allocCount: 10046, deallocCount: 10013)
🟢 run some continuations through the queue in many threads
##     3                        +348kb             552.7ms
## 3 tests            🟢2  ❔1        
ThreadSanitizer: reported 2 warnings
SIGABRT: Abnormal termination.
Error: execution of an external program failed

AFAICT from my current testing and debugging, those last warnings are unavoidably due to the actual thread objects deallocation which is beyond the scope of this package and hinders any further analysis of thread safety.

Edit: Sigh. The issue is (as usual) more than just that; either way we need actual work to be done with these tests to see if computations run into undefined behaviour or not. See #14

shayanhabibi commented 2 years ago

The newest implementation covers all the safety concerns that can be managed outside of the nim std/system lib.

See #14

Else; love for you to do some tests on this baby