tarantool / test-run

Tarantool functional testing framework

Helper server error return code does not fail the test for `core = app` #440

Open nshy opened 1 week ago

nshy commented 1 week ago

It hides memory leaks, unexpected panics, crashes and failed assertions in the helper server.

Repro. Let's make the following changes:

diff --git a/src/main.cc b/src/main.cc
index 7dd4f537d..c9de9163f 100644
--- a/src/main.cc
+++ b/src/main.cc
@@ -729,6 +730,8 @@ print_help(FILE *stream)
        fprintf(stream, help_msg, tarantool_version());
 }

+int tarantool_do_panic = 0;
+
 int
 main(int argc, char **argv)
 {
@@ -1101,5 +1104,8 @@ main(int argc, char **argv)
        free((void *)instance.name);
        free((void *)instance.config);
        tarantool_free();
+       if (tarantool_do_panic)
+               panic("AAAAAAA!!!!");
+
        return exit_code;
 }
diff --git a/extra/exports b/extra/exports
index 1dbfedc5c..78db2513b 100644
--- a/extra/exports
+++ b/extra/exports
@@ -679,3 +679,5 @@ luaM_sysprof_stop
 luaopen_misc

 # }}} LuaJIT API
+
+tarantool_do_panic
diff --git a/test/box/tiny.lua b/test/box/tiny.lua
index 608d48366..2b30f6dfd 100644
--- a/test/box/tiny.lua
+++ b/test/box/tiny.lua
@@ -13,3 +13,7 @@ require('console').listen(os.getenv('ADMIN'))
 box.once('init', function()
     box.schema.user.grant('guest', 'read,write,execute', 'universe')
 end)
+
+local ffi = require('ffi')
+ffi.cdef('int tarantool_do_panic;')
+ffi.C.tarantool_do_panic = 1

The test passes:

test/test-run.py --force --builddir ../build-dev box-tap/session.test
======================================================================================
WORKR TEST                                            PARAMS              RESULT
--------------------------------------------------------------------------------------
[001] box-tap/session.test.lua                                            [ pass ]
--------------------------------------------------------------------------------------

Yet the helper server panics, as planned:

$ grep AAAA /tmp/t/001_box-tap/tiny.log
2024-07-01 11:04:13.672 [50651] main F> AAAAAAA!!!!
2024-07-01 11:04:13.672 [50651] main F> AAAAAAA!!!!

In particular, such behaviour may hide memory leaks detected by ASAN in CI.

I tried to prepare a patch for the issue and found the cause: gevent does not work properly together with the subprocess and multiprocessing packages. In particular, when the helper server exits with an error code, test-run actually gets 0 from subprocess.Popen.returncode in https://github.com/tarantool/test-run/blob/240cdeadf736a96a41c3d98a5a10dad2015f5135/lib/tarantool_server.py#L969-L972

In more detail, we run the main server (https://github.com/tarantool/test-run/blob/240cdeadf736a96a41c3d98a5a10dad2015f5135/lib/app_server.py#L34-L44) with gevent.Popen. Unlike subprocess.Popen, it tracks the SIGCHLD signal and reaps ALL child processes when it receives one, including the helper server process. So when we later poll for the helper's exit code, waitpid() can no longer find the process (it fails with ECHILD), and subprocess just reports success.
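The lost exit status can be illustrated without gevent or Tarantool. This is a minimal sketch, assuming CPython on a POSIX system: the function name `exit_status_after_foreign_reap` is hypothetical, a short-lived Python child stands in for the crashing helper server, and a direct `os.waitpid()` call stands in for gevent's SIGCHLD handler reaping the child before `subprocess` gets a chance to.

```python
import os
import subprocess
import sys


def exit_status_after_foreign_reap(code=3):
    """Return (real exit code, exit code reported by subprocess).

    Hypothetical helper for illustration only.
    """
    # Spawn a child that fails, standing in for a crashing helper server.
    proc = subprocess.Popen(
        [sys.executable, '-c', 'import sys; sys.exit(%d)' % code])

    # Someone other than `proc` reaps the child first. Under gevent this is
    # the SIGCHLD handler; here we call waitpid() directly so the sketch is
    # deterministic and does not touch any other children.
    pid, status = os.waitpid(proc.pid, 0)
    real_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None

    # subprocess now gets ECHILD from its own waitpid() call, cannot recover
    # the status and falls back to returncode = 0, i.e. "success".
    reported_code = proc.wait()
    return real_code, reported_code


if __name__ == '__main__':
    real, reported = exit_status_after_foreign_reap()
    print('real exit code: %s, reported by subprocess: %s' % (real, reported))
```

Run as a script, it prints `real exit code: 3, reported by subprocess: 0`: the failure is only visible to whoever reaped the child, while `Popen.wait()` (and `Popen.returncode`) report success.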

See also #416

Totktonada commented 1 week ago

There are some findings around this problem in #252.