rizinorg / rizin

UNIX-like reverse engineering framework and command-line toolset.
https://rizin.re
GNU Lesser General Public License v3.0
2.51k stars 341 forks source link

subprocess: Protect proc->ret with a semaphore #4545

Closed kazarmy closed 1 week ago

kazarmy commented 2 weeks ago

Your checklist for this pull request

Detailed description

This pr fixes the following proc->ret data race, as mentioned on Mattermost:

WARNING: ThreadSanitizer: data race (pid=18332)
  Read of size 4 at 0x7b2800014fb8 by thread T3:
    #0 rz_subprocess_ret ../librz/util/subprocess.c:1551 (librz_util.so.0.8+0xa4d10)
    #1 rz_test_run_asm_test ../binrz/rz-test/run.c:355 (rz-test+0xda11)
    #2 rz_test_run_test ../binrz/rz-test/run.c:511 (rz-test+0xe34e)
    #3 worker_th ../binrz/rz-test/rz-test.c:686 (rz-test+0x5796)
    #4 thread_main_function ../librz/util/thread.c:21 (librz_util.so.0.8+0xaeb37)

  Previous write of size 4 at 0x7b2800014fb8 by thread T1 (mutexes: write M0):
    #0 sigchld_th ../librz/util/subprocess.c:807 (librz_util.so.0.8+0xa2467)
    #1 thread_main_function ../librz/util/thread.c:21 (librz_util.so.0.8+0xaeb37)

  Location is heap block of size 152 at 0x7b2800014fa0 allocated by thread T3:
    #0 calloc ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:672 (libtsan.so.0+0x31edc)
    #1 rz_subprocess_start_opt ../librz/util/subprocess.c:1047 (librz_util.so.0.8+0xa37bc)
    #2 rz_subprocess_start ../librz/util/subprocess.c:1614 (librz_util.so.0.8+0xa50e4)
    #3 rz_test_run_asm_test ../binrz/rz-test/run.c:349 (rz-test+0xd9e8)
    #4 rz_test_run_test ../binrz/rz-test/run.c:511 (rz-test+0xe34e)
    #5 worker_th ../binrz/rz-test/rz-test.c:686 (rz-test+0x5796)
    #6 thread_main_function ../librz/util/thread.c:21 (librz_util.so.0.8+0xaeb37)

  Mutex M0 (0x7b0c00000060) created at:
    #0 pthread_mutex_init ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1227 (libtsan.so.0+0x4bee1)
    #1 rz_th_lock_new ../librz/util/thread_lock.c:28 (librz_util.so.0.8+0xb1f87)
    #2 rz_subprocess_init ../librz/util/subprocess.c:819 (librz_util.so.0.8+0xa2df9)
    #3 rz_test_main ../binrz/rz-test/rz-test.c:357 (rz-test+0x7885)
    #4 main ../binrz/rz-test/rz-test.c:1406 (rz-test+0x52d5)

  Thread T3 (tid=18338, running) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:969 (libtsan.so.0+0x605b8)
    #1 rz_th_new ../librz/util/thread.c:211 (librz_util.so.0.8+0xaee4c)
    #2 rz_test_main ../binrz/rz-test/rz-test.c:536 (rz-test+0x8482)
    #3 main ../binrz/rz-test/rz-test.c:1406 (rz-test+0x52d5)

  Thread T1 (tid=18334, running) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:969 (libtsan.so.0+0x605b8)
    #1 rz_th_new ../librz/util/thread.c:211 (librz_util.so.0.8+0xaee4c)
    #2 rz_subprocess_init ../librz/util/subprocess.c:828 (librz_util.so.0.8+0xa2e3b)
    #3 rz_test_main ../binrz/rz-test/rz-test.c:357 (rz-test+0x7885)
    #4 main ../binrz/rz-test/rz-test.c:1406 (rz-test+0x52d5)

SUMMARY: ThreadSanitizer: data race ../librz/util/subprocess.c:1551 in rz_subprocess_ret

(The above was obtained using gcc TSan on Ubuntu 22.04 -- this build is broken regarding subprocesses in places not related to this pr, such as in !)

Hopefully, this fixes the problem with macos asm tests as well.

Test plan

All builds are green.

Closing issues

...

kazarmy commented 2 weeks ago

298605f8677c008c36bb469b865745b25525d4e1 98f728c7185303b786df25d1919da187dd0d3f07 tested via rz-test -t 1 db/archos/linux-x64/dbg_gdbserver_rebase ... 298605f8677c008c36bb469b865745b25525d4e1 caused the data race warning to reappear

XVilka commented 2 weeks ago

Doesn't seem to help the assembly test on macOS:

[XX] db/asm/java <asm> newarray int
-- <asm> newarray int <--> bc0a
-- assembly
kazarmy commented 2 weeks ago

Just noticed this killpipe pairing: https://github.com/rizinorg/rizin/blob/a3ffd6e38104fccc2504d371cd383765274e015c/librz/util/subprocess.c#L810 https://github.com/rizinorg/rizin/blob/a3ffd6e38104fccc2504d371cd383765274e015c/librz/util/subprocess.c#L1328-L1331

so there does appear to be read/write synchronization on proc->ret despite the protestations of TSan, so closing this.

kazarmy commented 2 weeks ago
[XX] db/asm/tricore <asm> add.f d1, d7, d12
-- <asm> add.f d1, d7, d12 <--- 6b0c2117 ---> <IL>
-- IL
--- expected
+++ actual
kazarmy commented 1 week ago

I think that's it for now ... 85089d08a6119f23c8f2b058093f68961c271135 is inelegant but the CI failure of 7b8ed9e4b8c974a6d505bf16f60ec3c11d3c5e5b suggests that there's something wrong with the original way of doing things (i think). Even if this pr doesn't fix the macos asm test problem, it does have 2 things going for it:

  1. The warning in the pr description goes away.
  2. Isn't a semaphore more lightweight than a pipe?
wargio commented 1 week ago

I think that's it for now ... 85089d0 is inelegant but the CI failure of 7b8ed9e suggests that there's something wrong with the original way of doing things (i think). Even if this pr doesn't fix the macos asm test problem, it does have 2 things going for it:

1. The warning in the [pr description](https://github.com/rizinorg/rizin/pull/4545#issue-2342038472) goes away.

2. Isn't a semaphore more lightweight than a pipe?

i strongly think the main issue is how we fork and execute. the whole thing in sys.c is very nasty

kazarmy commented 1 week ago

i did consider whether pipes were being closed a bit too soon, but that's a quest for another day i suppose