nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org

CI test runner timeout on some machines #20652

Closed mscdex closed 6 years ago

mscdex commented 6 years ago

https://ci.nodejs.org/job/node-test-commit-linux/18629/nodes=ubuntu1710-x64/console

Something strange happened here, and the output seems to have been mangled somehow. For example, this appears much earlier in the log, well before the last test output:

06:39:22 ok 1357 paNotifying upstream projects of job completion
06:39:22 rallel/test-process-kill-pid
06:39:22   ---
06:39:22   duration_ms: 0.213
06:39:22   ...

It seems that not all of the tests ran; perhaps something killed the test runner early? However, it's not clear why or how.

Perhaps related to https://github.com/nodejs/node/issues/20651 and https://github.com/nodejs/node/issues/20650 ?

devsnek commented 6 years ago

The message at the top says it ran for over 10 minutes, and that's considered a failure.
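In other words, the job's wall-clock limit, not a failing test, decided the result. As a rough illustration only, here is a sketch of how an absolute timeout wraps a test command: it kills the command once the budget is spent and marks the run failed regardless of what the output said. `run_with_budget` is a hypothetical Python helper, not how Jenkins or its build-timeout plugin is implemented, and the budgets are arbitrary.

```python
import subprocess
import sys
from typing import Sequence, Tuple


def run_with_budget(cmd: Sequence[str], budget_s: float) -> Tuple[str, str]:
    """Run cmd with a wall-clock budget; return (verdict, captured stdout).

    Hypothetical helper for illustration only.
    """
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=budget_s)
    except subprocess.TimeoutExpired as exc:
        # run() has already killed the child. Output printed before the
        # deadline is kept on the exception as bytes (or None), even with text=True.
        partial = exc.stdout or b""
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return "FAILED (timed out)", partial
    # Finished in time: the exit code alone decides the verdict.
    verdict = "PASSED" if done.returncode == 0 else "FAILED"
    return verdict, done.stdout


if __name__ == "__main__":
    # A child that reports a passing test and then hangs: killed after 2 s.
    hang = [sys.executable, "-c",
            "import time; print('ok 1', flush=True); time.sleep(30)"]
    print(run_with_budget(hang, 2))
```

The second element keeps whatever the command printed before it was killed. That is one way a console can show a complete-looking TAP run and still end with a timeout failure.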

mscdex commented 6 years ago

I see that now. I wonder why all the test output was buffered; I would have expected that kind of information (the timeout message) to appear towards the bottom. It's also strange that it took 10 minutes, whereas the same tests ran successfully on other machines.

BridgeAR commented 6 years ago

@nodejs/build PTAL

Trott commented 6 years ago

This appears to have been resolved, although I'm not sure if it's because someone did something or because it self-resolved.

devsnek commented 6 years ago

@Trott the Linux job is still consistently failing with these timeouts.

Trott commented 6 years ago

@devsnek Can you point me to a recent one? I'm seeing different failures, or at least I think they're different failures.

devsnek commented 6 years ago

https://ci.nodejs.org/job/node-test-commit-linux/18653/nodes=ubuntu1604-32/consoleFull
https://ci.nodejs.org/job/node-test-commit-linux/nodes=debian9-64/18653/consoleFull
https://ci.nodejs.org/job/node-test-commit-linux/18652/nodes=ubuntu1604-32/consoleFull
(list continues)

They all print "Build timed out (after 20 minutes). Marking the build as failed.", produce a successful TAP run, and then exit as failed.

They all also show signs of garbled text in the console output:

19:31:19 ok 96 parallel/test-beNot all test cases were executed according to the test set plan. Marking build as UNSTABLE
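That UNSTABLE message suggests the TAP plan line (`1..N`) was compared against the result lines that actually arrived. Below is a minimal sketch of that kind of check. `check_plan` is a hypothetical helper, not the Jenkins TAP plugin's code, and the optional `HH:MM:SS` prefix it skips is an assumption based on the timestamped console lines quoted in this thread. A spliced line such as `ok 1357 paNotifying upstream...` still counts as an executed test, because only the number after `ok` is read.

```python
import re
from typing import Iterable, Optional, Set, Tuple

# Assumed log format: an optional "HH:MM:SS " timestamp before each TAP line.
_TS = r"(?:\d{2}:\d{2}:\d{2}\s+)?"
# "1..N" announces how many tests the run intends to execute.
PLAN_RE = re.compile(r"^" + _TS + r"1\.\.(\d+)\s*$")
# Result lines start with "ok N" or "not ok N"; the description is ignored,
# so a description spliced with other console output still counts.
RESULT_RE = re.compile(r"^" + _TS + r"(?:not )?ok\s+(\d+)\b")


def check_plan(lines: Iterable[str]) -> Tuple[Optional[int], int, bool]:
    """Return (planned, executed, complete) for a TAP stream (hypothetical helper)."""
    planned: Optional[int] = None
    executed: Set[int] = set()
    for raw in lines:
        line = raw.strip()
        plan = PLAN_RE.match(line)
        if plan:
            planned = int(plan.group(1))
            continue
        result = RESULT_RE.match(line)
        if result:
            executed.add(int(result.group(1)))
    # Without a plan line, completeness cannot be confirmed.
    complete = planned is not None and len(executed) >= planned
    return planned, len(executed), complete


if __name__ == "__main__":
    log = [
        "1..3",
        "ok 1 parallel/test-a",
        "06:39:22 ok 2 paNotifying upstream projects of job completion",
        "06:39:22 rallel/test-b",
    ]
    print(check_plan(log))  # (3, 2, False): the run stopped short of its plan
```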
Trott commented 6 years ago

@devsnek Those are all nodes I've taken offline (because they fail consistently while other nodes in Jenkins that run the same tasks succeed), so tests should be passing again. But yeah, someone will need to find the source of that 20-minute timeout.

Trott commented 6 years ago

Relevant build issue: https://github.com/nodejs/build/issues/1267

maclover7 commented 6 years ago

This seems to have largely stopped, so I'm going to close this out for now. The Build team has been doing a lot of work on our machines and their provisioning scripts, so hopefully bugs like this will occur less frequently.