nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
106.63k stars 29.07k forks source link

Investigate flaky test-watch-mode (no. 3) #44898

Open tniessen opened 1 year ago

tniessen commented 1 year ago

Test

test-watch-mode

Platform

Linux x64, FreeBSD

Console output

13:17:49 not ok 3811 sequential/test-watch-mode
13:17:55   ---
13:17:55   duration_ms: 5.291
13:17:55   severity: fail
13:17:55   exitcode: 1
13:17:55   stack: |-
13:17:55     TAP version 13
13:17:55     # Subtest: watch mode
13:17:55         # Subtest: should watch changes to a file - event loop ended
13:17:55         ok 1 - should watch changes to a file - event loop ended
13:17:55           ---
13:17:55           duration_ms: 2739.793821
13:17:55           ...
13:17:55         # Subtest: should watch changes to a failing file
13:17:55         ok 2 - should watch changes to a failing file
13:17:55           ---
13:17:55           duration_ms: 2477.028169
13:17:55           ...
13:17:55         # Subtest: should not watch when running an non-existing file
13:17:55         ok 3 - should not watch when running an non-existing file
13:17:55           ---
13:17:55           duration_ms: 2285.642733
13:17:55           ...
13:17:55         # Subtest: should watch when running an non-existing file - when specified under --watch-path
13:17:55         ok 4 - should watch when running an non-existing file - when specified under --watch-path # SKIP
13:17:55           ---
13:17:55           duration_ms: 209.591623
13:17:55           ...
13:17:55         # Subtest: should watch changes to a file - event loop blocked
13:17:55         not ok 5 - should watch changes to a file - event loop blocked
13:17:55           ---
13:17:55           duration_ms: 2870.03467
13:17:55           failureType: 'testCodeFailure'
13:17:55           error: |-
13:17:55             Expected values to be strictly deep-equal:
13:17:55             + actual - expected
13:17:55             
13:17:55               [
13:17:55             +   'running',
13:17:55             +   "Restarting '/home/iojs/build/workspace/node-test-commit-linux-containered/test/fixtures/watch-mode/event_loop_blocked.js'"
13:17:55             -   "Restarting '/home/iojs/build/workspace/node-test-commit-linux-containered/test/fixtures/watch-mode/event_loop_blocked.js'",
13:17:55             -   'running'
13:17:55               ]
13:17:55           code: 'ERR_ASSERTION'
13:17:55           stack: |-
13:17:55             assertRestartedCorrectly (file:///home/iojs/build/workspace/node-test-commit-linux-containered/test/sequential/test-watch-mode.mjs:74:10)
13:17:55             Object.<anonymous> (file:///home/iojs/build/workspace/node-test-commit-linux-containered/test/sequential/test-watch-mode.mjs:141:5)
13:17:55             process.processTicksAndRejections (node:internal/process/task_queues:95:5)
13:17:55             async ItTest.run (node:internal/test_runner/test:512:9)
13:17:55             async Promise.all (index 4)
13:17:55             async Suite.run (node:internal/test_runner/test:739:7)
13:17:55           ...
13:17:55         # Subtest: should watch changes to dependencies - cjs
13:17:55         ok 6 - should watch changes to dependencies - cjs
13:17:55           ---
13:17:55           duration_ms: 2782.278082
13:17:55           ...
13:17:55         # Subtest: should watch changes to dependencies - esm
13:17:55         ok 7 - should watch changes to dependencies - esm
13:17:55           ---
13:17:55           duration_ms: 2598.258289
13:17:55           ...
13:17:55         # Subtest: should restart multiple times
13:17:55         ok 8 - should restart multiple times
13:17:55           ---
13:17:55           duration_ms: 4515.95009
13:17:55           ...
13:17:55         # Subtest: should pass arguments to file
13:17:55         ok 9 - should pass arguments to file
13:17:55           ---
13:17:55           duration_ms: 2784.227409
13:17:55           ...
13:17:55         # Subtest: should not load --require modules in main process
13:17:55         ok 10 - should not load --require modules in main process
13:17:55           ---
13:17:55           duration_ms: 2822.566444
13:17:55           ...
13:17:55         # Subtest: should not load --import modules in main process
13:17:55         ok 11 - should not load --import modules in main process
13:17:55           ---
13:17:55           duration_ms: 2968.502896
13:17:55           ...
13:17:55         1..11
13:17:55     not ok 1 - watch mode
13:17:55       ---
13:17:55       duration_ms: 4638.010352
13:17:55       failureType: 'subtestsFailed'
13:17:55       error: '1 subtest failed'
13:17:55       code: 'ERR_TEST_FAILURE'
13:17:55       ...
13:17:55     1..1
13:17:55     # tests 1
13:17:55     # pass 0
13:17:55     # fail 1
13:17:55     # cancelled 0
13:17:55     # skipped 0
13:17:55     # todo 0
13:17:55     # duration_ms 4649.416376
13:17:55   ...

Build links

Additional information

Refs: https://github.com/nodejs/node/issues/44655 Refs: https://github.com/nodejs/node/issues/44735 Refs: https://github.com/nodejs/node/issues/44805

tniessen commented 1 year ago

Similar output:

16:12:03 not ok 3793 sequential/test-watch-mode
16:12:06   ---
16:12:06   duration_ms: 3.677
16:12:06   severity: fail
16:12:06   exitcode: 1
16:12:06   stack: |-
16:12:06     TAP version 13
16:12:06     # Subtest: watch mode
16:12:06         # Subtest: should watch changes to a file - event loop ended
16:12:06         ok 1 - should watch changes to a file - event loop ended
16:12:06           ---
16:12:06           duration_ms: 1657.422205
16:12:06           ...
16:12:06         # Subtest: should watch changes to a failing file
16:12:06         ok 2 - should watch changes to a failing file
16:12:06           ---
16:12:06           duration_ms: 2144.675239
16:12:06           ...
16:12:06         # Subtest: should not watch when running an non-existing file
16:12:06         ok 3 - should not watch when running an non-existing file
16:12:06           ---
16:12:06           duration_ms: 848.075766
16:12:06           ...
16:12:06         # Subtest: should watch when running an non-existing file - when specified under --watch-path
16:12:06         ok 4 - should watch when running an non-existing file - when specified under --watch-path # SKIP
16:12:06           ---
16:12:06           duration_ms: 281.828198
16:12:06           ...
16:12:06         # Subtest: should watch changes to a file - event loop blocked
16:12:06         ok 5 - should watch changes to a file - event loop blocked
16:12:06           ---
16:12:06           duration_ms: 1094.820609
16:12:06           ...
16:12:06         # Subtest: should watch changes to dependencies - cjs
16:12:06         ok 6 - should watch changes to dependencies - cjs
16:12:06           ---
16:12:06           duration_ms: 2049.295256
16:12:06           ...
16:12:06         # Subtest: should watch changes to dependencies - esm
16:12:06         ok 7 - should watch changes to dependencies - esm
16:12:06           ---
16:12:06           duration_ms: 2216.116407
16:12:06           ...
16:12:06         # Subtest: should restart multiple times
16:12:06         ok 8 - should restart multiple times
16:12:06           ---
16:12:06           duration_ms: 3098.411375
16:12:06           ...
16:12:06         # Subtest: should pass arguments to file
16:12:06         ok 9 - should pass arguments to file
16:12:06           ---
16:12:06           duration_ms: 1908.309201
16:12:06           ...
16:12:06         # Subtest: should not load --require modules in main process
16:12:06         ok 10 - should not load --require modules in main process
16:12:06           ---
16:12:06           duration_ms: 1360.515837
16:12:06           ...
16:12:06         # Subtest: should not load --import modules in main process
16:12:06         not ok 11 - should not load --import modules in main process
16:12:06           ---
16:12:06           duration_ms: 2044.114142
16:12:06           failureType: 'testCodeFailure'
16:12:06           error: |-
16:12:06             Expected values to be strictly deep-equal:
16:12:06             + actual - expected
16:12:06             
16:12:06               [
16:12:06             +   "Completed running '/usr/home/iojs/node-tmp/.tmp.3792/3.js'",
16:12:06             +   "Restarting '/usr/home/iojs/node-tmp/.tmp.3792/3.js'"
16:12:06             -   "Restarting '/usr/home/iojs/node-tmp/.tmp.3792/3.js'",
16:12:06             -   "Completed running '/usr/home/iojs/node-tmp/.tmp.3792/3.js'"
16:12:06               ]
16:12:06           code: 'ERR_ASSERTION'
16:12:06           stack: |-
16:12:06             assertRestartedCorrectly (file:///usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd12-x64/test/sequential/test-watch-mode.mjs:74:10)
16:12:06             Object.<anonymous> (file:///usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd12-x64/test/sequential/test-watch-mode.mjs:218:5)
16:12:06             process.processTicksAndRejections (node:internal/process/task_queues:95:5)
16:12:06             async ItTest.run (node:internal/test_runner/test:512:9)
16:12:06             async Promise.all (index 10)
16:12:06             async Suite.run (node:internal/test_runner/test:739:7)
16:12:06           ...
16:12:06         1..11
16:12:06     not ok 1 - watch mode
16:12:06       ---
16:12:06       duration_ms: 3336.446593
16:12:06       failureType: 'subtestsFailed'
16:12:06       error: '1 subtest failed'
16:12:06       code: 'ERR_TEST_FAILURE'
16:12:06       ...
16:12:06     1..1
16:12:06     # tests 1
16:12:06     # pass 0
16:12:06     # fail 1
16:12:06     # cancelled 0
16:12:06     # skipped 0
16:12:06     # todo 0
16:12:06     # duration_ms 3344.389888
16:12:06   ...

@MoLow Looks like the order within some arrays is non-deterministic. Not sure if that's a bug in the implementation or in the test.

MoLow commented 1 year ago

On vacation, will have to wait unless someone else wants to give it a look

pbo-linaro commented 1 year ago

This issue was observed on Windows on Arm platform too.

@tniessen @MoLow @richardlau Would you accept to mark those tests as flaky, waiting for a proper fix?

kapouer commented 1 year ago

That test is failing in a variety of ways on our debian builds too (https://buildd.debian.org/status/package.php?p=nodejs&suite=experimental)

pbo-linaro commented 1 year ago

Created a PR to mark those test-watch-mode as flaky, so it can unlock the current situation.

kapouer commented 1 year ago

IMO they're more than flaky, several times they just hang for a long time.

pbo-linaro commented 1 year ago

Is there a timeout of stuff like this we can set for nodejs tests? Or deletion is the only alternative?

kapouer commented 1 year ago

In theory there is already a timeout for tests, but in the case of that test it didn't work (as usual, take what I say with precautions because it's with the debian build of nodejs) and the whole test suite just hang.

pbo-linaro commented 1 year ago

@richardlau @MoLow Should we delete those tests for now, as they seem to create some difficult problems? I know it's not ideal, but hanging a testsuite and report random failures is really a huge difficulty.

MoLow commented 1 year ago

the timeouts happen in the test-watch-mode-inspect tests, and flakyness in test-watch-mode, I am planning to take a deeper look but I am with a temporary computer for a while - I think for the time it is ok to mark as flaky without deleting

tniessen commented 1 year ago

Another occurrence on macOS:

=== release test-watch-mode-files_watcher ===
Path: parallel/test-watch-mode-files_watcher
Error: --- stdout ---
TAP version 13
# Subtest: watch mode file watcher
    # Subtest: should watch changed files
    ok 1 - should watch changed files
      ---
      duration_ms: 20.78446
      ...
    # Subtest: should throttle changes
    ok 2 - should throttle changes
      ---
      duration_ms: 230.579784
      ...
    # Subtest: should ignore files in watched directory if they are not filtered
    ok 3 - should ignore files in watched directory if they are not filtered
      ---
      duration_ms: 1004.011198
      ...
    # Subtest: should allow clearing filters
    ok 4 - should allow clearing filters
      ---
      duration_ms: 1219.288977
      ...
Command: out/Release/node --expose-internals /Users/runner/work/node/node/test/parallel/test-watch-mode-files_watcher.mjs
--- TIMEOUT ---

===
=== 1 tests failed
===
StefanStojanovic commented 1 year ago

@tniessen @MoLow I've opened 2 PRs fixing both test-watch-mode-inspect and test-watch-mode. As it turned out the issue was not syncing process execution and its restart. However, even after fixing that I got rare repros of logs between starting node in watch mode and the first triggered restart missing (more details on that in the PR descriptions). Did you encounter something similar before and do you have some ideas on why that could happen?

jasnell commented 1 year ago

I'm assuming this is fixed since #45585 landed?

Please re-open if that proves not to be the case.

anonrig commented 1 year ago

This test is inside the sequential.status file and marked as PASS, FLAKY but today one of the CI builds failed because of this. Do we have a bug?

Build: https://ci.nodejs.org/job/node-test-binary-armv7l/5443/

targos commented 1 year ago

I'm not sure why this CI run failed, but it's not because of the test timeout.

richardlau commented 1 year ago

It failed because of leftover Node.js processes at the end of the test run:

16:59:49 All tests passed.
16:59:49 ps awwx | grep Release/node | grep -v grep | cat
16:59:49  643411 ?        Sl     0:00 /home/iojs/build/workspace/node-test-binary-armv7l/out/Release/node --inspect=0 --watch /home/iojs/build/workspace/node-test-binary-armv7l/test/fixtures/watch-mode/inspect.js
16:59:49  643417 ?        Sl     0:00 /home/iojs/build/workspace/node-test-binary-armv7l/out/Release/node --inspect=0 /home/iojs/build/workspace/node-test-binary-armv7l/test/fixtures/watch-mode/inspect.js
16:59:49 make: *** [Makefile:538: test-ci-js] Error 1

ps awwx | grep Release/node | grep -v grep | cat is expected to not return any processes. https://github.com/nodejs/node/blob/5fa84e88661ac505ade404551bb5594e2201926e/Makefile#L538-L543

tniessen commented 1 year ago

Another failure of test-watch-mode-files_watcher:

01:20:12 not ok 3455 parallel/test-watch-mode-files_watcher
01:20:12   ---
01:20:12   duration_ms: 120068.19300
01:20:12   severity: fail
01:20:12   exitcode: -15
01:20:12   stack: |-
01:20:12     timeout
01:20:12     TAP version 13
01:20:12     # Subtest: watch mode file watcher
01:20:12         # Subtest: should watch changed files
01:20:12         ok 1 - should watch changed files
01:20:12           ---
01:20:12           duration_ms: 179.681248
01:20:12           ...
01:20:12   ...
juanarbol commented 1 month ago

This happened again in https://ci.nodejs.org/job/node-test-commit-smartos/55674/nodes=smartos20-64/

Re-opening this.