Closed Pavithra1602 closed 2 years ago
@Pavithra1602 Could you give us more details, please?
cd power-gzip/
make check
I tried to execute these commands and I was not able to reproduce it.
How are you running configure?
Which Linux distribution are you using?
Which commit ID did you use?
What happens when you run make valgrind-check?
@Pavithra1602 Could you give us more details, please?
cd power-gzip/
make check
I tried to execute these commands and I was not able to reproduce it.
How are you running configure?
The issue is not consistent; I observed it once in 3 trials. I am running it with the "make check" command.
Which Linux distribution are you using?
I have seen this issue on RHEL 8.6 and RHEL 9.
Which commit ID did you use?
I am using the master branch.
What happens when you run make valgrind-check?
[root@ltcden13-lp3 power-gzip]# make valgrind-check
make -C test check \
LOG_COMPILER="valgrind" LOG_FLAGS="--leak-check=full --suppressions=/root/power-gzip/test/valgrind.supp --error-exitcode=1"
make[1]: Entering directory '/root/power-gzip/test'
make test_adler32 test_buf_error test_crc32 test_inflatesyncpoint test_multithread_stress test_pid_reuse test_stress test_deflate test_inflate test_dict test_zeroinput test_abi
make[2]: Entering directory '/root/power-gzip/test'
make[2]: 'test_adler32' is up to date.
make[2]: 'test_buf_error' is up to date.
make[2]: 'test_crc32' is up to date.
make[2]: 'test_inflatesyncpoint' is up to date.
make[2]: 'test_multithread_stress' is up to date.
make[2]: 'test_pid_reuse' is up to date.
make[2]: 'test_stress' is up to date.
make[2]: 'test_deflate' is up to date.
make[2]: 'test_inflate' is up to date.
make[2]: 'test_dict' is up to date.
make[2]: 'test_zeroinput' is up to date.
make[2]: Nothing to be done for 'test_abi'.
make[2]: Leaving directory '/root/power-gzip/test'
make check-TESTS
make[2]: Entering directory '/root/power-gzip/test'
make[3]: Entering directory '/root/power-gzip/test'
FAIL: test_adler32
FAIL: test_buf_error
FAIL: test_crc32
FAIL: test_inflatesyncpoint
FAIL: test_multithread_stress
FAIL: test_pid_reuse
FAIL: test_abi
FAIL: test_stress.auto
FAIL: test_deflate.auto
FAIL: test_inflate.auto
FAIL: test_dict.auto
FAIL: test_stress.sw
FAIL: test_deflate.sw
FAIL: test_inflate.sw
FAIL: test_dict.sw
FAIL: test_stress.nx
FAIL: test_deflate.nx
FAIL: test_inflate.nx
FAIL: test_dict.nx
FAIL: test_stress.mix
FAIL: test_deflate.mix
FAIL: test_inflate.mix
FAIL: test_dict.mix
FAIL: test_stress.mix2
FAIL: test_deflate.mix2
FAIL: test_inflate.mix2
FAIL: test_dict.mix2
FAIL: test_zeroinput.nx
make[3]: [Makefile:910: test-suite.log] Error 1
make[3]: Leaving directory '/root/power-gzip/test'
make[2]: [Makefile:1018: check-TESTS] Error 2
make[2]: Leaving directory '/root/power-gzip/test'
make[1]: [Makefile:1281: check-am] Error 2
make[1]: Leaving directory '/root/power-gzip/test'
make: [Makefile:1007: valgrind-check] Error 2
[root@ltcden13-lp3 power-gzip]#
The issue is observed only on Denali.
The issue is not consistent; I observed it once in 3 trials, running the "make check" command.
@Pavithra1602, make check won't execute without running configure first, unless you have some leftovers from a previous build.
Could you test if the following fails too, please?
mkdir test-libnxz && cd test-libnxz
git clone https://github.com/libnxz/power-gzip.git
mkdir build && cd build
../power-gzip/configure
make -j$(nproc)
make -j$(nproc) check
make -j$(nproc) check
The issue is observed even with the above steps.
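Since the failure was reported as intermittent (roughly once in 3 trials), repeating the check until it fails may make it easier to reproduce. The following is only a minimal sketch: run_until_fail is a hypothetical helper, not part of power-gzip, and the attempt count is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper (not part of power-gzip): repeat a command until it
# fails or the attempt budget is exhausted.
# Usage: run_until_fail MAX_ATTEMPTS COMMAND [ARGS...]
# Returns 0 if a failure was reproduced, 1 if every attempt passed.
run_until_fail() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        # Run the command exactly as given; stop at the first failure.
        if ! "$@"; then
            echo "failed on attempt $i of $max"
            return 0
        fi
        i=$((i + 1))
    done
    echo "no failure in $max attempts"
    return 1
}
```

From the build directory, something like `run_until_fail 10 make check` would stop at the first failing run, leaving its logs in place for inspection.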
With @Pavithra1602's help, I was able to reproduce this issue.
AFAICS, one of the threads from test_multithread_stress fails in nx_wait_for_csb() with:
CSB still not valid after 60 seconds, giving up
Then, it's unclear if the same thread or another one segfaults while running the paste instruction in nxu_run_job():
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fffb7865e94 in vas_paste (offset=0, paste_address=0x7fffb6260400)
at ../../power-gzip/inc_nx/copy-paste.h:81
81 asm volatile(PPC_PASTE(%1, %2)";"
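One way to tell whether the faulting thread is the same one that timed out is to list the stack of every thread in the core file. This is a minimal sketch, assuming gdb is installed and a core file was saved. all_thread_backtraces is a hypothetical helper, and both paths are placeholders that depend on the build directory and on the system's core dump settings.

```shell
#!/bin/sh
# Hypothetical helper: print a backtrace for every thread recorded in a
# core file, so the thread that hit SIGSEGV can be compared with the others.
# Usage: all_thread_backtraces BINARY CORE
all_thread_backtraces() {
    binary=$1
    core=$2
    # Refuse to start gdb when either file is missing.
    if [ ! -f "$binary" ] || [ ! -f "$core" ]; then
        echo "usage: all_thread_backtraces BINARY CORE" >&2
        return 2
    fi
    # -batch makes gdb exit after running the commands passed with -ex.
    gdb -batch -ex "thread apply all bt" "$binary" "$core"
}
```

A thread whose stack still shows nx_wait_for_csb() next to one stopped in vas_paste() would suggest two different threads are involved, though that reading is an assumption until checked against a real core file.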
After the first error, nx_wait_for_csb() returns -ETIMEDOUT. It's necessary to investigate what happens after this value is returned.
This was fixed by #159.