tarantool / tarantool-qa

QA related issues of Tarantool

Tests hang and fail in a certain reproduce file #230

Open Gerold103 opened 4 years ago

Gerold103 commented 4 years ago

Tarantool version: Master, 2.2, maybe older.

Reproduce:

- [box/space_bsize.test.lua, null]
- [box/sql.test.lua, null]
- [box/rtree_errinj.test.lua, null]
- [box/rtree_array.test.lua, null]
- [box/update.test.lua, null]
- [box/cfg.test.lua, null]
- [box/net_msg_max.test.lua, null]
- [box/access_misc.test.lua, null]
- [box/access_escalation.test.lua, null]
- [box/iproto_stress.test.lua, null]
- [box/role.test.lua, null]
- [box/blackhole.test.lua, null]
- [box/misc.test.lua, null]
- [box/tree_pk.test.lua, null]
- [box/transaction.test.lua, null]

Sometimes it passes, sometimes box/iproto_stress hangs, and sometimes box/role fails.
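Since the outcome varies from run to run, it may help to repeat the reproduce run and tally how often each outcome occurs. The following is only a minimal sketch: `run_once` and `tally` are hypothetical helper names, the timeout is an arbitrary cutoff for calling a run a hang, and it assumes that test-run.py exits with a non-zero status when a test fails.

```python
import subprocess
import sys
from collections import Counter


def run_once(cmd, timeout_sec):
    """Run cmd once and classify the outcome as 'pass', 'fail' or 'hang'."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_sec,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the direct child when the timeout expires.
        # Grandchildren (for example tarantool instances) may survive and
        # need a separate cleanup.
        return "hang"
    # Assumption: a zero exit status means that all tests passed.
    return "pass" if proc.returncode == 0 else "fail"


def tally(cmd, runs, timeout_sec):
    """Repeat cmd `runs` times and count the outcomes."""
    return Counter(run_once(cmd, timeout_sec) for _ in range(runs))


# Hypothetical usage from the test/ directory, with the list above saved
# as reproduce.yml (the file name is an assumption):
#   print(tally(["./test-run.py", "--reproduce", "reproduce.yml"], 20, 600))
```

A run that exceeds the timeout is counted as a hang, so the timeout should be well above the normal duration of the reproduce run.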

ligurio commented 4 years ago

Executed the box suite 100 times and all tests passed.

Tarantool 2.4.0-99-g7ec7ced60
Target: Linux-x86_64-Debug
Build options: cmake . -DCMAKE_INSTALL_PREFIX=/usr/local -DENABLE_BACKTRACE=ON
Compiler: /usr/bin/cc /usr/bin/c++
C_FLAGS: -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -std=c11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -fno-gnu89-inline -Wno-cast-function-type -Werror
CXX_FLAGS: -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -std=c++11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-invalid-offsetof -Wno-cast-function-type -Werror

Are there any missing details or other hints on how to reproduce it?

ligurio commented 4 years ago

Finally reproduced a box/net.box.test.lua hang in five runs out of 100 (runs 48, 77, 82, 89, 90).

Tarantool 2.4.0-99-g7ec7ced60
Target: Linux-x86_64-Debug
Build options: cmake . -DCMAKE_INSTALL_PREFIX=/usr/local -DENABLE_BACKTRACE=ON
Compiler: /usr/bin/cc /usr/bin/c++
C_FLAGS: -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -std=c11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -fno-gnu89-inline -Wno-cast-function-type -Werror
CXX_FLAGS: -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -std=c++11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-invalid-offsetof -Wno-cast-function-type -Werror

Command line:

for i in `seq 1 1 100`; do echo "$i XXXXXXXXXXXXXXXXX"; ../../test/test-run.py --builddir=/home/s.bronnikov/tarantool/build --vardir=/home/s.bronnikov/tarantool/build/test/var --suite box; done 2>&1 | tee ../../box-suite-100times.log

Hardware: mcs1, CentOS Linux release 8.0.1905 (Core)
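To find the failing iterations in the collected log, something like the sketch below could be used. It assumes that each iteration starts with the `<i> XXXXXXXXXXXXXXXXX` marker printed by the loop above and that a failed test is reported on a line containing `[ fail ]`. The function name `failed_iterations` is hypothetical. A hang produces no `[ fail ]` line, so hung iterations still have to be found separately.

```python
import re

# Marker line printed by the loop: an iteration number followed by a run of X.
MARKER = re.compile(r"^(\d+) X+$")


def failed_iterations(log_lines):
    """Map iteration number -> list of '[ fail ]' lines seen in that iteration."""
    failures = {}
    current = None
    for raw in log_lines:
        line = raw.rstrip("\n")
        match = MARKER.match(line)
        if match:
            current = int(match.group(1))
            continue
        # Lines before the first marker are ignored.
        if current is not None and "[ fail ]" in line:
            failures.setdefault(current, []).append(line.strip())
    return failures


# Hypothetical usage with the log written by tee:
#   with open("box-suite-100times.log") as log:
#       for iteration, lines in sorted(failed_iterations(log).items()):
#           print(iteration, lines)
```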

I propose waiting for a patch from @avtikhon that splits net.box.test.lua into a set of independent tests, and then investigating the bug.

Totktonada commented 4 years ago

I guess Vlad tested it on Mac OS. I tried it with 2.5.0-173-gd1b3fbe9c (on tt-mac):

$ sw_vers
ProductName:    Mac OS X
ProductVersion: 10.12.6
BuildVersion:   16G29
$ cmake . -DCMAKE_BUILD_TYPE=Debug -DENABLE_BACKTRACE=ON -DENABLE_DIST=ON -DENABLE_BUNDLED_LIBCURL=OFF -DOPENSSL_ROOT_DIR=/usr/local/Cellar/openssl/1.0.2q/ && make -j
$ . ~/env-2.7/bin/activate # environment with test-run/requirements.txt libraries installed
$ git diff # remove the fragile list so that it does not confuse --reproduce
diff --git a/test/box/suite.ini b/test/box/suite.ini
index de8f5a70e..98c5c7cb0 100644
--- a/test/box/suite.ini
+++ b/test/box/suite.ini
@@ -9,16 +9,3 @@ lua_libs = lua/fifo.lua lua/utils.lua lua/bitset.lua lua/index_random_test.lua l
 use_unix_sockets = True
 use_unix_sockets_iproto = True
 is_parallel = True
 pretest_clean = True
-fragile = bitset.test.lua      ; tarantool/tarantool-qa#235
-          func_reload.test.lua ; tarantool/tarantool-qa#15
-          function1.test.lua   ; tarantool/tarantool#4199
-          net.box.test.lua     ; tarantool/tarantool#3851 tarantool/tarantool#4383
-          alter_limits.test.lua ; tarantool/tarantool#4926
-          misc.test.lua        ; tarantool/tarantool-qa#223
-          tuple.test.lua       ; tarantool/tarantool-qa#219
-          transaction.test.lua ; tarantool/tarantool-qa#217
-          rtree_rect.test.lua  ; tarantool/tarantool-qa#214
-          sequence.test.lua    ; tarantool/tarantool-qa#213
-          on_replace.test.lua  ; tarantool/tarantool-qa#212
-          role.test.lua        ; tarantool/tarantool-qa#211
$ cat r.yml
- [box/space_bsize.test.lua, null]
- [box/sql.test.lua, null]
- [box/rtree_errinj.test.lua, null]
- [box/rtree_array.test.lua, null]
- [box/update.test.lua, null]
- [box/cfg.test.lua, null]
- [box/net_msg_max.test.lua, null]
- [box/access_misc.test.lua, null]
- [box/access_escalation.test.lua, null]
- [box/iproto_stress.test.lua, null]
- [box/role.test.lua, null]
- [box/blackhole.test.lua, null]
- [box/misc.test.lua, null]
- [box/tree_pk.test.lua, null]
- [box/transaction.test.lua, null]
$ (cd test && ./test-run.py --reproduce ../r.yml)

Caught the following failure:

[001] box/iproto_stress.test.lua                                      [ fail ]
[001] 
[001] Test failed! Result content mismatch:
[001] --- box/iproto_stress.result  Wed Apr  3 06:25:04 2019
[001] +++ box/iproto_stress.reject  Thu Jul  2 23:30:20 2020
[001] @@ -74,11 +74,11 @@
[001]  ...
[001]  test_run:wait_cond(function() return n_workers == 0 end, 60)
[001]  ---
[001] -- true
[001] +- false
[001]  ...
[001]  n_workers -- 0
[001]  ---
[001] -- 0
[001] +- 100
[001]  ...
[001]  n_errors -- 0
[001]  ---
[001] 
[001] Last 15 lines of Tarantool Log file [Instance "box"][/Users/a.turenko/tarantool/test/var/001_box/box.log]:
[001] 2020-07-02 23:29:20.049 [74501] main/9053/lua utils.c:1005 E> LuajitError: [string "function worker(i)     n_workers = n_workers ..."]:1: attempt to index field 'test' (a nil value)
[001] 2020-07-02 23:29:20.051 [74501] main/9055/lua utils.c:1005 E> LuajitError: [string "function worker(i)     n_workers = n_workers ..."]:1: attempt to index field 'test' (a nil value)
[001] 2020-07-02 23:29:20.054 [74501] main/9039/lua utils.c:1005 E> LuajitError: [string "function worker(i)     n_workers = n_workers ..."]:1: attempt to index field 'test' (a nil value)
[001] 2020-07-02 23:29:20.054 [74501] main/9041/lua utils.c:1005 E> LuajitError: [string "function worker(i)     n_workers = n_workers ..."]:1: attempt to index field 'test' (a nil value)
[001] 2020-07-02 23:29:20.054 [74501] main/9049/lua utils.c:1005 E> LuajitError: [string "function worker(i)     n_workers = n_workers ..."]:1: attempt to index field 'test' (a nil value)
[001] 2020-07-02 23:29:20.056 [74501] main/9047/lua utils.c:1005 E> LuajitError: [string "function worker(i)     n_workers = n_workers ..."]:1: attempt to index field 'test' (a nil value)
[001] 2020-07-02 23:29:20.057 [74501] main/9043/lua utils.c:1005 E> LuajitError: [string "function worker(i)     n_workers = n_workers ..."]:1: attempt to index field 'test' (a nil value)
[001] 2020-07-02 23:29:20.057 [74501] main/9045/lua utils.c:1005 E> LuajitError: [string "function worker(i)     n_workers = n_workers ..."]:1: attempt to index field 'test' (a nil value)
[001] 2020-07-02 23:29:20.205 [74501] main/8957/lua utils.c:1005 E> LuajitError: [string "function worker(i)     n_workers = n_workers ..."]:1: attempt to index field 'test' (a nil value)
[001] 2020-07-02 23:29:20.206 [74501] main/8955/lua utils.c:1005 E> LuajitError: [string "function worker(i)     n_workers = n_workers ..."]:1: attempt to index field 'test' (a nil value)
[001] 2020-07-02 23:29:20.206 [74501] main/8953/lua utils.c:1005 E> LuajitError: [string "function worker(i)     n_workers = n_workers ..."]:1: attempt to index field 'test' (a nil value)
[001] 2020-07-02 23:29:20.207 [74501] main/8949/lua utils.c:1005 E> LuajitError: [string "function worker(i)     n_workers = n_workers ..."]:1: attempt to index field 'test' (a nil value)
[001] 2020-07-02 23:29:20.209 [74501] main/8947/lua utils.c:1005 E> LuajitError: [string "function worker(i)     n_workers = n_workers ..."]:1: attempt to index field 'test' (a nil value)
[001] 2020-07-02 23:29:20.209 [74501] main/8951/lua utils.c:1005 E> LuajitError: [string "function worker(i)     n_workers = n_workers ..."]:1: attempt to index field 'test' (a nil value)
[001] 2020-07-02 23:30:20.098 [74501] main/159/console/unix/: I> set 'net_msg_max' configuration option to 768
[Main process] Got failed test; gently terminate all workers...
[001] Worker "001_box" got failed test; stopping the server...
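The log tail above repeats one error from many fibers, which is easier to see once identical records are collapsed. Here is a rough sketch that counts error-level messages in such a log. It assumes that error records carry the `E>` level marker, as in the lines above, and the function name `count_errors` is hypothetical.

```python
from collections import Counter


def count_errors(log_lines):
    """Count identical error messages, ignoring timestamps and fiber names."""
    counts = Counter()
    for line in log_lines:
        # Keep only the text after the 'E>' level marker; other levels,
        # such as 'I>', are skipped.
        _, marker, message = line.partition(" E> ")
        if marker:
            counts[message.strip()] += 1
    return counts


# Hypothetical usage with the instance log named in the report:
#   with open("test/var/001_box/box.log") as log:
#       for message, n in count_errors(log).most_common():
#           print(n, message)
```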

However, when I removed the pretest_clean option from suite.ini (by mistake), I got various miscompares in box/access_misc.test.lua and box/role.test.lua. I also once caught a segfault in mp_tuple_assert(), like in tarantool/tarantool-qa#235.

However, I once caught the following miscompare with pretest_clean enabled:

[001] Test failed! Result content mismatch:
[001] --- box/role.result   Thu Aug 30 01:10:07 2018
[001] +++ box/role.reject   Thu Jul  2 23:44:02 2020
[001] @@ -605,27 +605,35 @@
[001]  ...
[001]  box.schema.role.drop("role1")
[001]  ---
[001] +- error: Unsupported role privilege 'execute'
[001]  ...
[001]  box.schema.role.drop("role2")
[001]  ---
[001] +- error: Unsupported role privilege 'execute'
[001]  ...
<...>

So it seems that the box/ test suite is quite unstable on Mac OS now. Maybe all those failures are due to LuaJIT in GC64 mode, as in tarantool/tarantool-qa#235?