nshy opened 1 week ago
It hides memory leaks, unexpected panics, crashes and failed assertions.
Repro. Let's make the following changes:
```diff
diff --git a/src/main.cc b/src/main.cc
index 7dd4f537d..c9de9163f 100644
--- a/src/main.cc
+++ b/src/main.cc
@@ -729,6 +730,8 @@ print_help(FILE *stream)
 	fprintf(stream, help_msg, tarantool_version());
 }
 
+int tarantool_do_panic = 0;
+
 int
 main(int argc, char **argv)
 {
@@ -1101,5 +1104,8 @@ main(int argc, char **argv)
 	free((void *)instance.name);
 	free((void *)instance.config);
 	tarantool_free();
+	if (tarantool_do_panic)
+		panic("AAAAAAA!!!!");
+
 	return exit_code;
 }
diff --git a/extra/exports b/extra/exports
index 1dbfedc5c..78db2513b 100644
--- a/extra/exports
+++ b/extra/exports
@@ -679,3 +679,5 @@ luaM_sysprof_stop
 luaopen_misc
 
 # }}} LuaJIT API
+
+tarantool_do_panic
diff --git a/test/box/tiny.lua b/test/box/tiny.lua
index 608d48366..2b30f6dfd 100644
--- a/test/box/tiny.lua
+++ b/test/box/tiny.lua
@@ -13,3 +13,7 @@ require('console').listen(os.getenv('ADMIN'))
 box.once('init', function()
     box.schema.user.grant('guest', 'read,write,execute', 'universe')
 end)
+
+local ffi = require('ffi')
+ffi.cdef('int tarantool_do_panic;')
+ffi.C.tarantool_do_panic = 1
```
The test passes:
```
$ test/test-run.py --force --builddir ../build-dev box-tap/session.test
======================================================================================
WORKR TEST                                            PARAMS          RESULT
--------------------------------------------------------------------------------------
[001] box-tap/session.test.lua                                        [ pass ]
--------------------------------------------------------------------------------------
```
Yet the helper server panics as we planned:
```
$ grep AAAA /tmp/t/001_box-tap/tiny.log
2024-07-01 11:04:13.672 [50651] main F> AAAAAAA!!!!
2024-07-01 11:04:13.672 [50651] main F> AAAAAAA!!!!
```
In particular, such behaviour may hide memory leaks reported by ASAN in CI.
I tried to prepare a patch for the issue and found the cause. The problem is that `gevent` does not work properly together with the `subprocess` and `multiprocess` packages. In particular, when the helper server exits with an error code, `test-run` actually gets 0 from `subprocess.Popen.returncode` in https://github.com/tarantool/test-run/blob/240cdeadf736a96a41c3d98a5a10dad2015f5135/lib/tarantool_server.py#L969-L972
In more detail, we run the main server with `gevent.Popen`: https://github.com/tarantool/test-run/blob/240cdeadf736a96a41c3d98a5a10dad2015f5135/lib/app_server.py#L34-L44 Unlike `subprocess.Popen`, it tracks the `SIGCHLD` signal and reaps ALL child processes when it receives it, including the helper server process. So when we later poll for the helper's exit code, `waitpid()` can no longer find the process, and `subprocess` simply reports success (exit code 0).
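The effect can be reproduced without gevent at all. Below is a minimal, POSIX-only Python 3 sketch that relies only on the standard library; it assumes the essential ingredient is that *someone else* collects the child's exit status before `subprocess` does. `reap_all_children()` is a hypothetical stand-in for gevent's SIGCHLD-driven reaper and `run_child()` is a hypothetical helper; neither is part of `test-run`. The sketch relies on CPython's `subprocess` treating a `waitpid()` failure with `ECHILD` as exit code 0.

```python
import os
import subprocess
import sys


def reap_all_children():
    """Hypothetical stand-in for a SIGCHLD-driven reaper (like the one
    gevent installs): collect the exit status of *every* child process,
    no matter who spawned it, and discard it."""
    while True:
        try:
            os.waitpid(-1, 0)
        except ChildProcessError:
            # ECHILD: no children left to wait for.
            return


def run_child(exit_code, steal_status):
    """Start a child that exits with `exit_code` and return the exit
    code that subprocess reports for it (hypothetical helper)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.exit(%d)" % exit_code])
    if steal_status:
        # Somebody else reaps the child before subprocess gets to it.
        reap_all_children()
    # If the child was already reaped, waitpid(proc.pid) fails with
    # ECHILD and subprocess falls back to a return code of 0.
    return proc.wait()


if __name__ == "__main__":
    print("status kept:  ", run_child(3, steal_status=False))  # 3
    print("status stolen:", run_child(3, steal_status=True))   # 0
```

Both children exit with status 3, but only the first one is reported correctly; the second is reported as a success, which matches what `test-run` sees for the panicking helper server.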
See also #416
There are some findings around this problem in #252.