nodejs / reliability

Make the CI green again
33 stars 9 forks source link

Potential flakes: 20180813 #12

Closed joyeecheung closed 5 years ago

joyeecheung commented 5 years ago

Produced with ncu-ci walk pr --stats, counting tests that failed more than 1 PR

Potential flaky tests

https://github.com/nodejs/node/issues/22317

Reason     addons/stringbytes-external-exceed-max/test-stringbytes-external-exceed-max-by-1-binary
Type       JS_TEST_FAILURE
Failed PR  17 (#22211, #21571, #22020, #22274, #21559, #21840, #22022, #22291, #22289, #22286, #22285, #22284, #22084, #22266, #22295, #22074, #22106)
Appeared   test-digitalocean-freebsd11-x64-1, test-digitalocean-freebsd11-x64-2
First CI   https://ci.nodejs.org/job/node-test-pull-request/16386/
Last CI    https://ci.nodejs.org/job/node-test-pull-request/16422/

https://github.com/nodejs/node/issues/15985

Reason     async-hooks/test-callback-error
Type       JS_TEST_FAILURE
Failed PR  6 (#22266, #22211, #22274, #22289, #22284, #22293)
Appeared   test-digitalocean-fedora27-x64-1
First CI   https://ci.nodejs.org/job/node-test-pull-request/16383/
Last CI    https://ci.nodejs.org/job/node-test-pull-request/16413/

https://github.com/nodejs/node/issues/21425

Reason     async-hooks/test-statwatcher
Type       JS_TEST_FAILURE
Failed PR  6 (#21975, #22242, #22274, #22290, #22288, #22293)
Appeared   test-azure_msft-win2016-x64-4, test-joyent-ubuntu1604_sharedlibs_container-x64-1, test-softlayer-ubuntu1604_sharedlibs_container-x64-4, test-digitalocean-ubuntu1604_sharedlibs_container-x64-10
First CI   https://ci.nodejs.org/job/node-test-pull-request/16326/
Last CI    https://ci.nodejs.org/job/node-test-pull-request/16413/

???

Reason     parallel/test-gc-http-client
Type       JS_TEST_FAILURE
Failed PR  4 (#22291, #22289, #22286, #22074)
Appeared   test-digitalocean-freebsd11-x64-1, test-digitalocean-ubuntu1604_sharedlibs_container-x64-9, test-digitalocean-ubuntu1604_sharedlibs_container-x64-8, test-digitalocean-ubuntu1604_sharedlibs_container-x64-7
First CI   https://ci.nodejs.org/job/node-test-pull-request/16401/
Last CI    https://ci.nodejs.org/job/node-test-pull-request/16418/
--------------------------------------------------------------------------------
Reason     parallel/test-cli-node-print-help
Type       JS_TEST_FAILURE
Failed PR  2 (#22104, #22271)
Appeared   test-digitalocean-ubuntu1604_sharedlibs_container-x64-10
First CI   https://ci.nodejs.org/job/node-test-pull-request/16332/
Last CI    https://ci.nodejs.org/job/node-test-pull-request/16371/
--------------------------------------------------------------------------------
Reason     parallel/test-trace-events-fs-sync
Type       JS_TEST_FAILURE
Failed PR  2 (#22128, #22243)
Appeared   test-linuxonecc-rhel72-s390x-3
First CI   https://ci.nodejs.org/job/node-test-pull-request/16340/
Last CI    https://ci.nodejs.org/job/node-test-pull-request/16349/
--------------------------------------------------------------------------------
Reason     parallel/test-worker-memory
Type       JS_TEST_FAILURE
Failed PR  2 (#22222, #22074)
Appeared   test-digitalocean-ubuntu1604_sharedlibs_container-x64-5, test-packetnet-ubuntu1604-arm64-2
First CI   https://ci.nodejs.org/job/node-test-pull-request/16350/
Last CI    https://ci.nodejs.org/job/node-test-pull-request/16418/

Build issues:

git EOL autofix issues, see https://github.com/nodejs/reliability/issues/12#issuecomment-412683325

https://github.com/nodejs/build/issues/1443

Reason     Cannot rebase: You have unstaged changes.
Type       BUILD_FAILURE
Failed PR  9 (#22020, #22267, #22242, #22022, #22266, #22074, #22294, #22232, #22106)
Appeared   test-packetnet-centos7-arm64-2, test-packetnet-centos7-arm64-1, test-digitalocean-ubuntu1604_sharedlibs_container-x64-7, test-digitalocean-ubuntu1604_sharedlibs_container-x64-10
First CI   https://ci.nodejs.org/job/node-test-pull-request/16374/
Last CI    https://ci.nodejs.org/job/node-test-pull-request/16422/

libicui18n.a compilation errors, e.g. https://ci.nodejs.org/job/node-test-commit-smartos/nodes=smartos17-64/19373/console should be fixed already

Reason     ERROR: Step ?Publish JUnit test result report? failed: No test report files were found. Configuration error?
Type       BUILD_FAILURE
Failed PR  8 (#22207, #22152, #22241, #22242, #22239, #22186, #22274, #21571)
Appeared   test-joyent-smartos16-x64-2, test-joyent-smartos17-x64-2, test-osuosl-aix61-ppc64_be-1, test-joyent-smartos17-x64-1
First CI   https://ci.nodejs.org/job/node-test-pull-request/16333/
Last CI    https://ci.nodejs.org/job/node-test-pull-request/16389/

Errors during compiling libopenssl.a e.g. https://ci.nodejs.org/job/node-test-commit-custom-suites-freestyle/15/console should be fixed already

Reason     Makefile:87: recipe for target 'node' failed
Type       BUILD_FAILURE
Failed PR  6 (#22104, #22152, #22209, #21684, #22242, #22239)
Appeared   , test-packetnet-ubuntu1604-x64-2
First CI   https://ci.nodejs.org/job/node-test-pull-request/16332/
Last CI    https://ci.nodejs.org/job/node-test-pull-request/16344/

Jenkins script errors, should be fixed

Reason     bash: line 1: 502:: command not found
Type       BUILD_FAILURE
Failed PR  5 (#21906, #22074, #22211, #22274, #21571)
Appeared   test-digitalocean-ubuntu1604-x86-2, test-packetnet-centos7-arm64-1, test-packetnet-ubuntu1604-arm64-2, test-softlayer-alpine36_container-x64-1, test-digitalocean-alpine37_container-x64-2, test-softlayer-centos7-x64-1, test-rackspace-fedora27-x64-1, test-softlayer-ubuntu1404-x86-1, test-softlayer-ubuntu1404-x64-1, test-rackspace-ubuntu1604-x64-1, test-joyent-ubuntu1710-x64-2, test-digitalocean-ubuntu1804_container-x64-2, test-macstadium-macos10.11-x64-2, test-osuosl-ubuntu1404-ppc64_le-1, test-scaleway-ubuntu1604-armv7l-3, , test-rackspace-centos7-x64-1, test-softlayer-ubuntu1804_container-x64-1, test-scaleway-ubuntu1604-armv7l-1
First CI   https://ci.nodejs.org/job/node-test-pull-request/16361/
Last CI    https://ci.nodejs.org/job/node-test-pull-request/16389/
--------------------------------------------------------------------------------
Reason     bash: line 2: syntax error near unexpected token `newline'
Type       BUILD_FAILURE
Failed PR  3 (#22242, #21571, #22206)
Appeared   test-rackspace-ubuntu1604-x64-2, test-softlayer-debian9-x64-1, test-digitalocean-fedora26-x64-1, test-joyent-freebsd10-x64-2, test-rackspace-ubuntu1604-x64-1
First CI   https://ci.nodejs.org/job/node-test-pull-request/16380/
Last CI    https://ci.nodejs.org/job/node-test-pull-request/16390/

???


--------------------------------------------------------------------------------
Reason     Failed to trigger node-test-commit
Type       BUILD_FAILURE
Failed PR  2 (#21975, #22274)
Appeared   test-softlayer-ubuntu1604-x64-1
First CI   https://ci.nodejs.org/job/node-test-pull-request/16325/
Last CI    https://ci.nodejs.org/job/node-test-pull-request/16392/
--------------------------------------------------------------------------------
Reason     error: Resource temporarily unavailable
Type       BUILD_FAILURE
Failed PR  2 (#22106, #22212)
Appeared   test-packetnet-ubuntu1604-x64-2
First CI   https://ci.nodejs.org/job/node-test-pull-request/16334/
Last CI    https://ci.nodejs.org/job/node-test-pull-request/16337/
--------------------------------------------------------------------------------
Reason     error: tool 'xcodebuild' requires Xcode, but active developer directory '/Library/Developer/CommandLineTools' is a command line tools instance
Type       BUILD_FAILURE
Failed PR  2 (#19780, #22211)
Appeared   test-macstadium-macos10.11-x64-1, test-requireio-osx1010-x64-1
First CI   https://ci.nodejs.org/job/node-test-pull-request/16372/
Last CI    https://ci.nodejs.org/job/node-test-pull-request/16386/
--------------------------------------------------------------------------------
Reason     fatal: Needed a single revision
Type       BUILD_FAILURE
Failed PR  2 (#22078, #22128)
Appeared
First CI   https://ci.nodejs.org/job/node-test-pull-request/16339/
Last CI    https://ci.nodejs.org/job/node-test-pull-request/16340/
refack commented 5 years ago

Renamed issue, since we're still in 2018-08 (AFAICT, I did take a nap, both I don't feel like it was that long 😴)

refack commented 5 years ago

RE: addons/stringbytes-external-exceed-max/test-stringbytes-external-exceed-max-by-1-binary

[iojs@test-digitalocean-freebsd11-x64-2 /usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64]$ ./node -p "require('os').totalmem()"
2107195392

So is seems like node knows how to check memory size, and those machines have 2GB which is more than the cutoff > 0x70000000; /* 1.75 Gb */ . . . Just before hitting submit I run this:

[iojs@test-digitalocean-freebsd11-x64-2 /usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64]$ ./node -p "require('os').freemem()"
19468288

So that might be a problem.

Trott commented 5 years ago

So is seems like node knows how to check memory size, and those machines have 2GB which is more than the cutoff > 0x70000000; /* 1.75 Gb */

That cut-off was arrived at through experimentation and is a little bit like when we come across seemingly-arbitrary values in setTimeout().

If the test didn't require an addon (and it didn't used to, but we switched to an addon for reliability, I think?), then maybe it could be moved to pummel. Is it possible to move this test and the other stringbytes tests to pummel somehow? That should also please the "our tests take too long to run" folks because...these tests can take a while to run on some platforms.

refack commented 5 years ago

AFAICT it not supposed to be a pummel test. It's there to assert kStringMaxLength behaviour. We could move it to v8-updates tough...

Trott commented 5 years ago

AFAICT it not supposed to be a pummel test.

I figured pummel is for tests that require a lot of resources but I see there's a bit of a more specific description in the test README. So if v8-updates seems a better place for it, that's fine by me too, of course.

joyeecheung commented 5 years ago

The problem with addons/stringbytes-external-exceed-max/test-stringbytes-external-exceed-max-by-1-binary is that it crashed on digital ocean freebsd11 nodes with signal 9, even though if the memory is not enough it should've thrown proper errors - also the line maxString = undefined there does not work as a GC hint, @refack discovered that calling the v8 gc helper explicitly could make the crash go away (but then even if GC does not work it should not have crashed).

refack commented 5 years ago

gc trace when failing: https://www.irccloud.com/pastebin/FI3zgCwR/

gc trace when passing: https://www.irccloud.com/pastebin/059SqSBt/

gc trace with explicit global.gc() calls: https://www.irccloud.com/pastebin/zMF1YpIa/

joyeecheung commented 5 years ago

The Cannot rebase: You have unstaged changes. seems to be caused by EOL autofixes in git gone wrong. deps/v8/third_party/jinja2/LICENSE uses CRLF and looks like git clean -fdx run by the git plugin in jenkins caused inconsistency in the working directory.

refack commented 5 years ago

and looks like git clean -fdx run by the git plugin in jenkins caused inconsistency in the working directory.

It's more like the default git behaviour (in conjunction with recursive application of .gitattributes) is to mark those files as Modified so that they will be committed with LF line endings.

Best solution I've found is to set the CI worker to:

git config --global --bool core.autocrlf false
git config --global --bool core.safecrlf false

Which makes sense, in that CI workers should not change files in any way

rvagg commented 5 years ago

FYI re FreeBSD 11, we upgraded from 11.0 to 11.2 which may explain the difference in system inspection

joyeecheung commented 5 years ago

BTW, looks like jinja2 is only committed by us, it's in v8's DEPS file but they don't commit that directory (it's probably fetched by gclient)

joaocgreis commented 5 years ago

@refack @joyeecheung autocrlf should not be changed in the Windows machines, we want to ensure new contributors can compile with a default installation of Git (we already ask for the Unix tools, but the less of those little details the better and some people might not want to change it). If possible, please use https://github.com/nodejs/node/blob/master/.gitattributes to specify that that file is always CRLF.