nodejs / build

Better build and test infra for Node.

AIX 7.2 CI job is very unstable #2621

Closed targos closed 3 years ago

targos commented 3 years ago

See https://ci.nodejs.org/job/node-test-commit-aix/buildTimeTrend. The job fails more than 50% of the time.

targos commented 3 years ago

Example: https://ci.nodejs.org/job/node-test-commit-aix/36221/#showFailuresLink

Many tests using child process fail with the EAGAIN error code.

targos commented 3 years ago

/cc @nodejs/platform-aix

richardlau commented 3 years ago

There are at least two issues here:

richardlau commented 3 years ago

I've logged into test-osuosl-aix72-ppc64_be-3 and, while Jenkins believes that host is idle, I can see a lot of running processes, e.g.:

root@test-osuosl-aix72-ppc64_be-3:[/root]ps -ef | grep bash         
    iojs 11731220 15532534   0   Apr 12      -  0:00 /usr/bin/bash -xe /tmp/jenkins8012660964621729646.sh
    iojs 13304150 15532534   0   Apr 08      -  0:00 /usr/bin/bash -xe /tmp/jenkins1832409300680795143.sh
    iojs 14942628 15532534   0   Apr 12      -  0:00 /usr/bin/bash -xe /tmp/jenkins1431301206453705270.sh
    iojs 15859984 15532534   0   Apr 12      -  0:00 /usr/bin/bash -xe /tmp/jenkins7930803442644380686.sh
    iojs 16056804 15532534   0   Apr 12      -  0:00 /usr/bin/bash -xe /tmp/jenkins4968422098257556735.sh
    iojs 16974290 15532534   0   Apr 11      -  0:00 /usr/bin/bash -xe /tmp/jenkins5270902707046841306.sh
root@test-osuosl-aix72-ppc64_be-3:[/root]ps -ef | grep gmake        
    iojs  5964088 13631954   0   Apr 12      -  0:00 gmake
    iojs 11141604 15729004   0   Apr 08      -  0:00 gmake
    iojs 12779806 11731220   0   Apr 12      -  0:00 gmake run-ci -j 6 JOBS=6
    iojs 12976528 16974290   0   Apr 11      -  0:00 gmake run-ci -j 6 JOBS=6
    iojs 13173104  5964088   0   Apr 12      -  0:41 gmake -C out BUILDTYPE=Release V=0
    iojs 13631954 14942628   0   Apr 12      -  0:00 gmake run-ci -j 6 JOBS=6
    root 14025110 13435388   0 01:00:20  pts/0  0:00 grep gmake
    iojs 14090512 15139134   0   Apr 12      -  0:26 gmake -C out BUILDTYPE=Release V=0
    iojs 14549464 16318966   0   Apr 12      -  0:00 gmake
    iojs 15139134 12779806   0   Apr 12      -  0:00 gmake
    iojs 15729004 13304150   0   Apr 08      -  0:00 gmake run-ci -j 6 JOBS=6
    iojs 15925648 11141604   0   Apr 08      -  0:27 gmake -C out BUILDTYPE=Release V=0
    iojs 15991290 12976528   0   Apr 11      -  0:00 gmake
    iojs 16253262 15859984   0   Apr 12      -  0:00 gmake run-ci -j 6 JOBS=6
    iojs 16318966 16056804   0   Apr 12      -  0:00 gmake run-ci -j 6 JOBS=6
    iojs 16581100 16253262   0   Apr 12      -  0:00 gmake
    iojs 17432864 14549464   0   Apr 12      -  0:39 gmake -C out BUILDTYPE=Release V=0
    iojs 17563920 15991290   0   Apr 11      -  0:40 gmake -C out BUILDTYPE=Release V=0
    iojs 18088318 16581100   0   Apr 12      -  0:49 gmake -C out BUILDTYPE=Release V=0
root@test-osuosl-aix72-ppc64_be-3:[/root]ps -ef | grep gmake | wc -l
      18
root@test-osuosl-aix72-ppc64_be-3:[/root]ps -ef | grep g++ | wc -l
      24
root@test-osuosl-aix72-ppc64_be-3:[/root]ps -ef | grep gcc | wc -l
      12
root@test-osuosl-aix72-ppc64_be-3:[/root]
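For anyone repeating this check on another host, here is a minimal sketch of the same inspection as shell functions. It assumes `ps -ef` output in the column layout shown above (user in column 1, PID in column 2). The function names `jenkins_script_pids` and `count_procs` are made up for illustration. Unlike a bare `ps -ef | grep gmake | wc -l`, the count skips the `grep` process itself, which appears in the listing above.

```shell
# Print the PIDs of leftover Jenkins build scripts owned by a given user.
# Reads `ps -ef` style text on stdin. $1 = user name (e.g. iojs).
jenkins_script_pids() {
  awk -v user="$1" '$1 == user && /\/tmp\/jenkins[0-9]+\.sh/ { print $2 }'
}

# Count lines that contain a fixed string, ignoring any grep process.
# Reads `ps -ef` style text on stdin. $1 = string to look for (e.g. gmake).
count_procs() {
  awk -v pat="$1" 'index($0, pat) > 0 && index($0, "grep") == 0 { n++ } END { print n + 0 }'
}

# Example on a live host:
#   ps -ef | jenkins_script_pids iojs
#   ps -ef | count_procs gmake
```

Both functions only read text, so they are safe to run before deciding whether anything needs to be killed.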

richardlau commented 3 years ago

I've killed all the iojs-owned bash processes on test-osuosl-aix72-ppc64_be-3, which has terminated the child gmake and gcc/g++ processes as well. I restarted the Jenkins agent for good measure. I then ran the parallel and sequential tests using the current workspace, and all of those tests passed:

iojs@test-osuosl-aix72-ppc64_be-3:[/home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64]./node --version
v16.0.0-pre
iojs@test-osuosl-aix72-ppc64_be-3:[/home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64]tools/test.py -J parallel
[02:14|% 100|+ 2758|-   0]: Done                                              
iojs@test-osuosl-aix72-ppc64_be-3:[/home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64]tools/test.py -J sequential
[01:50|% 100|+ 120|-   0]: Done                                               
iojs@test-osuosl-aix72-ppc64_be-3:[/home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64]

Started a CI build that's running on test-osuosl-aix72-ppc64_be-3: https://ci.nodejs.org/job/node-test-commit-aix/36226/nodes=aix72-ppc64/
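The earlier `ps -ef` listing shows each `/tmp/jenkins*.sh` bash process sitting at the top of a chain of gmake processes, linked through the PID and PPID columns. As a rough sketch (the function name `descendants` is hypothetical, and the input is assumed to be `ps -ef` text with PID in column 2 and PPID in column 3), the chain under one bash process could be listed like this before cleaning up:

```shell
# Print every PID that has the given root PID somewhere in its chain of parents.
# Reads `ps -ef` style text on stdin. $1 = root PID.
descendants() {
  awk -v root="$1" '
    $2 ~ /^[0-9]+$/ { parent[$2] = $3; order[++n] = $2 }   # skips the header line
    END {
      for (i = 1; i <= n; i++) {
        p = order[i]
        # Follow parent links upward; the depth cap guards against loops.
        for (depth = 0; depth < 64 && (p in parent); depth++) {
          p = parent[p]
          if (p == root) { print order[i]; break }
        }
      }
    }'
}

# Example on a live host (prints PIDs only, kills nothing):
#   ps -ef | descendants 11731220
```

This only reports PIDs. Whether killing the parent also takes the children down depends on how the processes were started, so it is worth checking with `ps` again afterwards.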

richardlau commented 3 years ago

That build and subsequent builds on test-osuosl-aix72-ppc64_be-3 have been passing 🎉: https://ci.nodejs.org/computer/test-osuosl-aix72-ppc64_be-3/builds

For test-osuosl-aix72-ppc64_be-2 the failing builds show:

Build timed out (after 10 minutes). Marking the build as failed.

The 10 minutes is something we've set in the job config.

I've looked at the jobs for the other platforms and we seem to be using a range of timeout values, e.g. on LinuxONE we use 5 mins (300 seconds), on arm64 macOS an hour (3600 seconds) and on the x64 Linux job 2 hours (7200 seconds). I'm going to bump the timeout for the AIX job to an hour.

richardlau commented 3 years ago

Timeout has been increased to 1 hour: https://github.com/nodejs/jenkins-config-test/commit/1f025a50cebdc3d2075389f80b94128af256ba32

richardlau commented 3 years ago

https://ci.nodejs.org/job/node-test-commit-aix/nodes=aix72-ppc64/36241/ has passed on test-osuosl-aix72-ppc64_be-2. FWIW there was a 12-minute "no activity" period in the log that would have timed the build out with the previous 10-minute timeout:

11:33:03   g++ -Wl,-bnoerrmsg -pthread -Wl,-bbigtoc -maix64 -Wl,-blibpath:/usr/lib:/lib:/opt/freeware/lib/pthread/ppc64 -Wl,-bE:/home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/mkcodecache.exp -Wl,-brtl -pthread  -o /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/mkcodecache /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/mkcodecache/src/node_snapshot_stub.o /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/mkcodecache/src/node_code_cache_stub.o /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/mkcodecache/tools/code_cache/mkcodecache.o /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/mkcodecache/tools/code_cache/cache_builder.o /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/libnode.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/deps/histogram/libhistogram.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/deps/uvwasi/libuvwasi.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/tools/v8_gypfiles/libv8_snapshot.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/tools/v8_gypfiles/libv8_libplatform.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/tools/icu/libicui18n.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/deps/zlib/libzlib.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/deps/llhttp/libllhttp.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/deps/cares/libcares.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/deps/uv/libuv.a 
/home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/deps/nghttp2/libnghttp2.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/deps/brotli/libbrotli.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/deps/openssl/libopenssl.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/deps/ngtcp2/libngtcp2.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/deps/ngtcp2/libnghttp3.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/tools/icu/libicuucx.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/tools/icu/libicudata.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/tools/v8_gypfiles/libv8_base_without_compiler.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/tools/v8_gypfiles/libv8_libbase.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/tools/v8_gypfiles/libv8_libsampler.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/tools/v8_gypfiles/libv8_zlib.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/tools/v8_gypfiles/libv8_compiler.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/tools/v8_gypfiles/libv8_initializers.a -lm -lperfstat -ldl -lrt
11:45:41   g++ -Wl,-bnoerrmsg -pthread -Wl,-bbigtoc -maix64 -Wl,-blibpath:/usr/lib:/lib:/opt/freeware/lib/pthread/ppc64 -Wl,-bE:/home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/node_mksnapshot.exp -Wl,-brtl -pthread  -o /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/node_mksnapshot /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/node_mksnapshot/src/node_snapshot_stub.o /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/node_mksnapshot/src/node_code_cache_stub.o /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/node_mksnapshot/tools/snapshot/node_mksnapshot.o /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/node_mksnapshot/tools/snapshot/snapshot_builder.o /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/libnode.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/deps/histogram/libhistogram.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/deps/uvwasi/libuvwasi.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/tools/v8_gypfiles/libv8_snapshot.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/tools/v8_gypfiles/libv8_libplatform.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/tools/icu/libicui18n.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/deps/zlib/libzlib.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/deps/llhttp/libllhttp.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/deps/cares/libcares.a 
/home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/deps/uv/libuv.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/deps/nghttp2/libnghttp2.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/deps/brotli/libbrotli.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/deps/openssl/libopenssl.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/deps/ngtcp2/libngtcp2.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/deps/ngtcp2/libnghttp3.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/tools/icu/libicuucx.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/tools/icu/libicudata.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/tools/v8_gypfiles/libv8_base_without_compiler.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/tools/v8_gypfiles/libv8_libbase.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/tools/v8_gypfiles/libv8_libsampler.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/tools/v8_gypfiles/libv8_zlib.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/tools/v8_gypfiles/libv8_compiler.a /home/iojs/build/workspace/node-test-commit-aix/nodes/aix72-ppc64/out/Release/obj.target/tools/v8_gypfiles/libv8_initializers.a -lm -lperfstat -ldl -lrt
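To make the arithmetic explicit, here is a small sketch that computes the gap between the two log timestamps above and compares it with the previous 10-minute (600-second) limit. The helper names `to_seconds` and `gap_seconds` are illustrative only, and the sketch assumes both timestamps fall on the same day.

```shell
# Convert HH:MM:SS to seconds since midnight (awk reads "08" as decimal 8).
to_seconds() {
  printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Seconds elapsed between two same-day timestamps: $1 = earlier, $2 = later.
gap_seconds() {
  echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

gap=$(gap_seconds 11:33:03 11:45:41)
timeout=600   # the previous 10-minute limit
if [ "$gap" -gt "$timeout" ]; then
  echo "gap of ${gap}s exceeds the ${timeout}s timeout"
fi
# prints: gap of 758s exceeds the 600s timeout
```

The 758 seconds (12 minutes 38 seconds) is well within the new one-hour limit.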

richardlau commented 3 years ago

The 21 most recent AIX 7.2 builds have all passed, so I think we can mark this as resolved. Feel free to reopen if symptoms reappear.

targos commented 3 years ago

Thanks!