openzim / libzim

Reference implementation of the ZIM specification
https://download.openzim.org/release/libzim/
GNU General Public License v2.0

The reader test seems to be stuck in a deadlock (zim-testing-suite) for some reason. #670

Open Animeshz opened 2 years ago

Animeshz commented 2 years ago

Running meson test fails: the first test in reader.cpp hits the 120s timeout.

Logs:

27/27 reader              TIMEOUT        120.06s   killed by signal 15 SIGTERM
>>> ZIM_TEST_DATA_DIR=/builddir/libzim-7.2.0/build/test/data MALLOC_PERTURB_=48 /builddir/libzim-7.2.0/build/test/reader
――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
Running main() from ../googletest/src/gtest_main.cc
[==========] Running 3 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 3 tests from FileReader
[ RUN      ] FileReader.shouldJustWork
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

Summary of Failures:

27/27 reader              TIMEOUT        120.06s   killed by signal 15 SIGTERM

Ok:                 26  
Expected Fail:      0   
Fail:               0   
Unexpected Pass:    0   
Skipped:            0   
Timeout:            1   

Full log written to /builddir/libzim-7.2.0/build/meson-logs/testlog.txt
FAILED: meson-test 
/usr/bin/meson test --no-rebuild --print-errorlogs
ninja: build stopped: subcommand failed.
=> ERROR: libzim-7.2.0_1: do_check: '${make_cmd} -C ${meson_builddir} ${makejobs} ${make_check_args} ${make_check_target}' exited with 1
=> ERROR:   in do_check() at common/build-style/meson.sh:141
kelson42 commented 2 years ago

Have you tried SKIP_BIG_MEMORY_TEST=1 meson test?

Animeshz commented 2 years ago

@kelson42 It still seems to be stuck there.

[animesh@/home/animesh/Projects/void-packages/masterdir build]$ SKIP_BIG_MEMORY_TEST=1 meson test
ninja: Entering directory `/builddir/libzim-7.2.0/build'
ninja: no work to do.
 1/27 lrucache                    OK               0.02s
 2/27 dirent                      OK               0.02s
 3/27 header                      OK               0.01s
 4/27 template                    OK               0.01s
 5/27 iterator                    OK               0.02s
 6/27 dirent_lookup               OK               0.01s
 7/27 istreamreader               OK               0.01s
 8/27 find                        OK               0.16s
 9/27 rawstreamreader             OK               0.01s
10/27 bufferstreamer              OK               0.01s
11/27 parseLongPath               OK               0.01s
12/27 random                      OK               0.04s
13/27 tooltesting                 OK               0.01s
14/27 creator                     OK               0.29s
15/27 tinyString                  OK               0.01s
16/27 cluster                     OK               0.53s
17/27 indexing_criteria           OK               0.66s
18/27 decoderstreamreader         OK               0.90s
19/27 defaultIndexdata            OK               0.03s
20/27 uuid                        OK               1.01s
21/27 search                      OK               0.93s
22/27 suggestion_iterator         OK               1.25s
23/27 search_iterator             OK               0.72s
24/27 archive                     OK               2.20s
25/27 suggestion                  OK               2.18s
26/27 compression                 OK               3.76s
27/27 reader                      TIMEOUT        120.01s   killed by signal 15 SIGTERM
>>> ZIM_TEST_DATA_DIR=/builddir/libzim-7.2.0/build/test/data MALLOC_PERTURB_=247 /builddir/libzim-7.2.0/build/test/reader

Ok:                 26  
Expected Fail:      0   
Fail:               0   
Unexpected Pass:    0   
Skipped:            0   
Timeout:            1   

Full log written to /builddir/libzim-7.2.0/build/meson-logs/testlog.txt
kelson42 commented 2 years ago

@mgautierfr @veloman-yunkan Any idea?

veloman-yunkan commented 2 years ago

@Animeshz

Please run

gdb -ex run -args /builddir/libzim-7.2.0/build/test/reader

wait for a few seconds, hit CTRL-C, enter the bt command in the gdb prompt and paste its output here.

Animeshz commented 2 years ago

@veloman-yunkan

[animesh@/home/animesh/Projects/void-packages/masterdir /]$ gdb -ex run -args /builddir/libzim-7.2.0/build/test/reader
GNU gdb (GDB) 11.1
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /builddir/libzim-7.2.0/build/test/reader...
Starting program: /builddir/libzim-7.2.0/build/test/reader 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
Running main() from ../googletest/src/gtest_main.cc
[==========] Running 3 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 3 tests from FileReader
[ RUN      ] FileReader.shouldJustWork
^C
Program received signal SIGINT, Interrupt.
0x00007ffff7b5b366 in __libc_pread64 (fd=3, buf=0x7fffffffe62f, count=1, offset=26) at ../sysdeps/unix/sysv/linux/pread64.c:25
25  ../sysdeps/unix/sysv/linux/pread64.c: No such file or directory.
(gdb) bt
#0  0x00007ffff7b5b366 in __libc_pread64 (fd=3, buf=0x7fffffffe62f, count=1, offset=26) at ../sysdeps/unix/sysv/linux/pread64.c:25
#1  0x00007ffff7f7726c in zim::unix::FD::readAt(char*, zim::zsize_t, zim::offset_t) const () from /builddir/libzim-7.2.0/build/test/../src/libzim.so.7
#2  0x00007ffff7f62150 in zim::FileReader::read(zim::offset_t) const () from /builddir/libzim-7.2.0/build/test/../src/libzim.so.7
#3  0x000055555555c717 in (anonymous namespace)::FileReader_shouldJustWork_Test::TestBody() ()
#4  0x00007ffff7f07d97 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) () from /usr/lib64/libgtest.so.1.11.0
#5  0x00007ffff7efc8ee in testing::Test::Run() () from /usr/lib64/libgtest.so.1.11.0
#6  0x00007ffff7efca65 in testing::TestInfo::Run() () from /usr/lib64/libgtest.so.1.11.0
#7  0x00007ffff7efcfe9 in testing::TestSuite::Run() () from /usr/lib64/libgtest.so.1.11.0
#8  0x00007ffff7efd71a in testing::internal::UnitTestImpl::RunAllTests() () from /usr/lib64/libgtest.so.1.11.0
#9  0x00007ffff7f08307 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) () from /usr/lib64/libgtest.so.1.11.0
#10 0x00007ffff7efcb28 in testing::UnitTest::Run() () from /usr/lib64/libgtest.so.1.11.0
#11 0x00007ffff7f220e0 in main () from /usr/lib64/libgtest_main.so.1.11.0
#12 0x00007ffff7a95e0a in __libc_start_main (main=0x7ffff7f220a0 <main>, argc=1, argv=0x7fffffffeca8, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7fffffffec98) at ../csu/libc-start.c:314
#13 0x00005555555595aa in _start () at ../sysdeps/x86_64/start.S:120
(gdb) 
veloman-yunkan commented 2 years ago

The test at https://github.com/openzim/libzim/blob/a12d14ae51812938212d11b4f8720036c8e593cf/test/reader.cpp#L93 works in a debug build because the expected exception is generated by an assertion at https://github.com/openzim/libzim/blob/a12d14ae51812938212d11b4f8720036c8e593cf/src/file_reader.cpp#L65

In a release build, the assertion is disabled and the execution proceeds past that point leading to an infinite loop. @mgautierfr as the author of that test should decide how this should be fixed.
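
The failure mode described above can be sketched in isolation. This is a minimal illustration, not libzim's code: `fake_pread` and `read_exact` are hypothetical names, and an in-memory string stands in for the file. It assumes a read loop that retries until the requested byte count has been collected, which would be consistent with the backtrace stopping inside pread64 with count=1. If the bounds check in front of such a loop is compiled out, a read at or past the end of the file yields 0 bytes on every iteration, and the loop never finishes unless that case is handled explicitly.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for pread(2) over an in-memory "file":
// returns the number of bytes copied, or 0 at or after end of file.
std::size_t fake_pread(const std::string& file, char* dest,
                       std::size_t count, std::size_t offset) {
  if (offset >= file.size()) return 0;  // EOF, as pread(2) reports it
  const std::size_t n = std::min(count, file.size() - offset);
  std::memcpy(dest, file.data() + offset, n);
  return n;
}

// Reads exactly `count` bytes or throws. The explicit EOF check keeps the
// loop finite even when no assertion guards the caller.
void read_exact(const std::string& file, char* dest,
                std::size_t count, std::size_t offset) {
  while (count > 0) {
    const std::size_t n = fake_pread(file, dest, count, offset);
    if (n == 0) {
      // Without this branch, `count` never shrinks and the loop spins forever.
      throw std::runtime_error("read past end of file");
    }
    dest += n;
    offset += n;
    count -= n;
  }
}
```

With a 26-byte buffer, `read_exact(file, &c, 1, 25)` succeeds, while `read_exact(file, &c, 1, 26)` throws instead of spinning.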

veloman-yunkan commented 2 years ago

@kelson42 @mgautierfr I think we should have a release build configuration (for at least one platform) in our CI.

kelson42 commented 2 years ago

@veloman-yunkan Supportive of this idea.

kelson42 commented 2 years ago

@mgautierfr Your feedback about a solution approach is expected here.

mgautierfr commented 2 years ago

Sorry, I've totally missed this issue.

"In a release build, the assertion is disabled and the execution proceeds past that point leading to an infinite loop"

I'm surprised by that. We build libzim in the release build configuration in kiwix-build each time we do a release. If building in release mode were enough to trigger the bug, we would have seen it long ago. => After verification: asserts are not removed by default in release mode. We have to explicitly set b_ndebug to true or if-release in meson to remove the assertions.
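
To illustrate the mechanism behind that option: the standard `assert` macro expands to nothing when `NDEBUG` is defined, so whether an assertion runs depends on that macro rather than on the optimization level. The sketch below is only an illustration; `assert_is_active` and `ndebug_defined` are hypothetical helper names, not part of libzim.

```cpp
#include <cassert>

// Returns true only if the expression inside assert() was evaluated,
// i.e. if assertions are compiled in for this translation unit.
bool assert_is_active() {
  bool evaluated = false;
  assert((evaluated = true));  // expands to nothing when NDEBUG is defined
  return evaluated;
}

// Reports whether NDEBUG was defined when this file was compiled.
bool ndebug_defined() {
#ifdef NDEBUG
  return true;
#else
  return false;
#endif
}
```

Compiled with `-DNDEBUG`, `assert_is_active()` returns false; compiled without it, the function returns true.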

"@mgautierfr as the author of that test should decide how this should be fixed."

From a global perspective, ASSERTs are a last resort for detecting that something has gone wrong. They should not be considered a valid way to check that everything is ok. (This is consistent with the fact that we should remove them in release mode.) ASSERTs here are mainly used to check that we are not reading data out of bounds. That can happen for two reasons:

In this context, low-level ASSERTs should not be considered normative behavior and we should not test them. Reading out of range (at this level in libzim) can be considered undefined behavior (as with the std::vector [] operator).
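
The std::vector comparison can be made concrete with a small sketch. `ByteBuffer` is a hypothetical class for illustration, not a libzim type: the unchecked accessor mirrors `operator[]`, where an out-of-range offset is undefined behavior and cannot be tested, while the checked accessor mirrors `at()`, whose out-of-range behavior is specified and therefore testable.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical buffer type used only to contrast the two contracts.
class ByteBuffer {
 public:
  explicit ByteBuffer(std::vector<char> data) : data_(std::move(data)) {}

  // Unchecked access: the caller must guarantee offset < size().
  // An out-of-range offset is undefined behavior, like vector::operator[].
  char read_unchecked(std::size_t offset) const { return data_[offset]; }

  // Checked access: an out-of-range offset throws std::out_of_range,
  // like vector::at(), so a test may rely on the exception.
  char read_checked(std::size_t offset) const { return data_.at(offset); }

  std::size_t size() const { return data_.size(); }

 private:
  std::vector<char> data_;
};
```

A test can legitimately expect `read_checked(size())` to throw, but it should never call `read_unchecked` out of range and expect any particular outcome.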

We have several things to do:

kelson42 commented 2 years ago

@mgautierfr @veloman-yunkan What is the status/next step on this? FYI this is currently the only known bug in libzim!

mgautierfr commented 2 years ago

I have an old branch where I started working on this. I've just created a WIP PR: https://github.com/openzim/libzim/pull/723