Open foursixnine opened 2 years ago
I run it several times. Didn't found anything. Also from the code I don't see anything. There are a lot of sleeps in 02_parallel.t which are all not necessary, So I run 2 versions one with sleep and one without
COUNT: 62338 t/02_parallel_fast.t => Successful
COUNT: 3079 t/02_parallel.t => Successful
so I don't know what it was on that CI run. My guess is, it came in connection with codecov, as also no test step failed but the script was executed != 0
COUNT: 361 t/01_run.t => FAILED!! 9ba29d8
And I hit this also on osd https://build.opensuse.org/project/monitor/home:cfconrad:branches:devel:languages:perl?arch_aarch64=1&defaults=0&failed=1&repo_openSUSE_Tumbleweed=1 https://build.opensuse.org/project/monitor/home:cfconrad:branches:devel:languages:perl?arch_x86_64=1&defaults=0&failed=1&repo_openSUSE_Leap_15_0=1
When running in such error with verbose=1
and only that single test:
cover: running ./Build test "--extra_compiler_flags=-O0 -fprofile-arcs -ftest-coverage" "--extra_linker_flags=-fprofile-arcs -ftest-coverage"
t/02_parallel.t ..
# Subtest: parallel
ok 1 - An object of class 'Mojo::IOLoop::ReadWriteProcess::Pool' isa 'Mojo::IOLoop::ReadWriteProcess::Pool'
ok 2
ok 3
ok 4
ok 5
ok 6
ok 7
ok 8
ok 9
1..9
ok 1 - parallel
# Subtest: batch
ok 1 - An object of class 'Mojo::IOLoop::ReadWriteProcess::Pool' isa 'Mojo::IOLoop::ReadWriteProcess::Pool'
ok 2
ok 3
ok 4
ok 5
ok 6
All 1 subtests passed
Test Summary Report
-------------------
t/02_parallel.t (Wstat: 139 Tests: 1 Failed: 0)
Non-zero wait status: 139
Parse errors: No plan found in TAP output
Files=1, Tests=1, 9 wallclock secs ( 0.02 usr 0.01 sys + 0.63 cusr 0.27 csys = 0.93 CPU)
Result: FAIL
Failed 1/1 test programs. 0/1 subtests failed.
Reading database from /home/clemix/sandbox/Mojo-IOLoop-ReadWriteProcess/cover_db
-----------------------------------------------------------------------------
File
-----------------------------------------------------------------------------
Total
-----------------------------------------------------------------------------
A other error discovered during the WE:
Deleting database /home/clemix/sandbox/Mojo-IOLoop-ReadWriteProcess_2/cover_db
cover: running ./Build test "--extra_compiler_flags=-O0 -fprofile-arcs -ftest-coverage" "--extra_linker_flags=-fprofile-arcs -ftest-coverage"
t/02_parallel_fast.t ..
# Subtest: parallel
ok 1 - 'Is RWP::Pool' isa 'Mojo::IOLoop::ReadWriteProcess::Pool'
ok 2 - Size is 4
Start
each read
ok 3 - getline() success!
ok 4 - getline() success!
ok 5 - getline() success!
ok 6 - getline() success!
ok 7 - Fired process is 4
DO restart
double free or corruption (out)
No subtests run
Test Summary Report
-------------------
t/02_parallel_fast.t (Wstat: 134 Tests: 0 Failed: 0)
Non-zero wait status: 134
Parse errors: No plan found in TAP output
Files=1, Tests=0, 2 wallclock secs ( 0.01 usr 0.01 sys + 0.55 cusr 0.14 csys = 0.71 CPU)
Result: FAIL
Failed 1/1 test programs. 0/0 subtests failed.
Reading database from /home/clemix/sandbox/Mojo-IOLoop-ReadWriteProcess_2/cover_db
I'm able to reproduce this (or almost this) locally now.
I guess we hit something like this https://github.com/cfconrad/Mojo-IOLoop-ReadWriteProcess/commit/1c27c4352e5a100987c22260539f2e7fbc6fa6ea , that our signal handler interrupt our code in unconditional state and s**t happens...
I was able to reproduce the error quite good with: https://github.com/cfconrad/Mojo-IOLoop-ReadWriteProcess/commit/d84d4eed29a25ea82183c90dbd8a44b7da992dbb I get the error always with less then 200 attempts and mostly around ~50.
But if I set emit_from_sigchld => 0
I do not hit the error within 25867 attempts. Maybe we should reconsider the default of that variable. (https://github.com/cfconrad/Mojo-IOLoop-ReadWriteProcess/commit/cdba6076f51835a21a5767159a1c69cf1d349f5e)
I didn't found the root cause of the error. But getting a double free or corruption (out)
sounds really like some interruption glitch here and there...
@Martchus would you like to verify my observations ? Would be nice, so I do not see ghosts!
So that it doesn't go under the table:
https://github.com/openSUSE/Mojo-IOLoop-ReadWriteProcess/runs/4472303650?check_suite_focus=true
Failed 1/15 test programs. 0/70 subtests failed. t/02_parallel.t (Wstat: 139 Tests: 1 Failed: 0) Non-zero wait status: 139 Parse errors: No plan found in TAP output Files=15, Tests=70, 256 wallclock secs ( 0.08 usr 0.04 sys + 135.23 cusr 10.99 csys = 146.34 CPU) Result: FAIL