nodejs / node

Node.js JavaScript runtime โœจ๐Ÿข๐Ÿš€โœจ
https://nodejs.org
Other
106.56k stars 29.05k forks source link

`test-async-context-frame` is flaky #54808

Open anonrig opened 1 week ago

anonrig commented 1 week ago

Test

test-async-context-frame

Platform

Windows

Console output

duration_ms: 6718.227
exitcode: 1
severity: fail
stack: "\u25B6 AsyncContextFrame\n  \u2714 async-hooks\\test-async-local-storage-args.js\
  \ (4955.4808ms)\n  \u2714 async-hooks\\test-async-local-storage-async-await.js (4400.0139ms)\n\
  \  \u2714 async-hooks\\test-async-local-storage-async-functions.js (5106.203ms)\n\
  \  \u2714 async-hooks\\test-async-local-storage-dgram.js (4764.8517ms)\n  \u2714\
  \ async-hooks\\test-async-local-storage-enable-disable.js (5294.6636ms)\n  \u2714\
  \ async-hooks\\test-async-local-storage-enter-with.js (4803.8479ms)\n  \u2714 async-hooks\\\
  test-async-local-storage-errors.js (5115.4623ms)\n  \u2714 async-hooks\\test-async-local-storage-gcable.js\
  \ (4954.9835ms)\n  \u2714 async-hooks\\test-async-local-storage-http-agent.js (5161.0138ms)\n\
  \  \u2714 async-hooks\\test-async-local-storage-http.js (4577.1697ms)\n  \u2714\
  \ async-hooks\\test-async-local-storage-misc-stores.js (5124.7882ms)\n  \u2714 async-hooks\\\
  test-async-local-storage-nested.js (5031.4255ms)\n  \u2714 async-hooks\\test-async-local-storage-no-mix-contexts.js\
  \ (4960.6226ms)\n  \u2714 async-hooks\\test-async-local-storage-promises.js (5029.5982ms)\n\
  \  \u2716 async-hooks\\test-async-local-storage-socket.js (5107.2542ms)\n    AssertionError\
  \ [ERR_ASSERTION]: Test async-hooks\\test-async-local-storage-socket.js failed with\
  \ exit code 3221225477\n        at TestContext.<anonymous> (file:///C:/workspace/node-test-binary-windows-js-suites/node/test/parallel/test-async-context-frame.mjs:58:7)\n\
  \        at process.processTicksAndRejections (node:internal/process/task_queues:105:5)\n\
  \        at async Test.run (node:internal/test_runner/test:888:9)\n        at async\
  \ Promise.all (index 14)\n        at async Suite.run (node:internal/test_runner/test:1268:7)\n\
  \        at async startSubtestAfterBootstrap (node:internal/test_runner/harness:280:3)\
  \ {\n      generatedMessage: false,\n      code: 'ERR_ASSERTION',\n      actual:\
  \ 3221225477,\n      expected: 0,\n      operator: 'strictEqual'\n    }\n\n  \u2714\
  \ async-hooks\\test-async-local-storage-thenable.js (4896.5713ms)\n  \u2714 async-hooks\\\
  test-async-local-storage-tlssocket.js (5602.1337ms)\n  \u2714 parallel\\test-async-local-storage-bind.js\
  \ (4868.4376ms)\n  \u2714 parallel\\test-async-local-storage-contexts.js (4311.7402ms)\n\
  \  \u2714 parallel\\test-async-local-storage-deep-stack.js (4832.0565ms)\n  \u2714\
  \ parallel\\test-async-local-storage-exit-does-not-leak.js (4463.4821ms)\n  \u2714\
  \ parallel\\test-async-local-storage-http-multiclients.js (4983.6606ms)\n  \u2714\
  \ parallel\\test-async-local-storage-snapshot.js (4266.4503ms)\n\u25B6 AsyncContextFrame\
  \ (6170.0028ms)\n\u2139 tests 23\n\u2139 suites 1\n\u2139 pass 22\n\u2139 fail 1\n\
  \u2139 cancelled 0\n\u2139 skipped 0\n\u2139 todo 0\n\u2139 duration_ms 6186.8915\n\
  \n\u2716 failing tests:\n\ntest at test\\parallel\\test-async-context-frame.mjs:48:5\n\
  \u2716 async-hooks\\test-async-local-storage-socket.js (5107.2542ms)\n  AssertionError\
  \ [ERR_ASSERTION]: Test async-hooks\\test-async-local-storage-socket.js failed with\
  \ exit code 3221225477\n      at TestContext.<anonymous> (file:///C:/workspace/node-test-binary-windows-js-suites/node/test/parallel/test-async-context-frame.mjs:58:7)\n\
  \      at process.processTicksAndRejections (node:internal/process/task_queues:105:5)\n\
  \      at async Test.run (node:internal/test_runner/test:888:9)\n      at async\
  \ Promise.all (index 14)\n      at async Suite.run (node:internal/test_runner/test:1268:7)\n\
  \      at async startSubtestAfterBootstrap (node:internal/test_runner/harness:280:3)\
  \ {\n    generatedMessage: false,\n    code: 'ERR_ASSERTION',\n    actual: 3221225477,\n\
  \    expected: 0,\n    operator: 'strictEqual'\n  }"
...

Build links

Additional information

No response

jakecastelli commented 1 week ago

I have raised this one in openjs flaky ci slack channel and @Qard has responded positively ๐Ÿ™ I think he would take a look soon.

jasnell commented 1 week ago

Yeah, it appears to be a segfault (access violation) that appears to only happen on windows as far as I can determine. Will need a windows machine to investigate further. I was planning to dig in more next week on this one as I need to break out the Windows machine and update it ;-)

FWIW, this appears to be a test-runner issue, possibly with the interaction with the subtests? It seems to always happen with the local-storage-nested subtest but that test does not exercise the processTicksAndRejections path where the access violation appears to be happening. misread that... actually, I definitely think it's an issue with the test runner and the way this test is structured. Going to try an experiment that splits these out into individual tests

cjihrig commented 1 week ago

FWIW, this appears to be a test-runner issue

Are you referring to the Python runner or the JS one? I'm assuming you mean Python in this case, but just wanted to verify.

jasnell commented 1 week ago

Not sure yet. I think maybe the python runner but can't be certain. I'm running an experiment in #54818 that removes both the node:test runner and the python test runner from the mix to see if we still see flakiness on these. If that looks good, I want to next try with the python runner but no node-test runner, following the pattern of the original test in spawning a bunch of these processes in parallel to see if we still have flakiness. Unfortunately I was not able to reproduce the failure locally at all on a quick test on my old windows machine so I'm going a bit by trial-an-error using CI.

cjihrig commented 1 week ago
16:17:18     test at test\parallel\test-async-context-frame.mjs:48:5
16:17:18     รขล“โ€“ async-hooks\test-async-local-storage-socket.js (5107.2542ms)
16:17:18       AssertionError [ERR_ASSERTION]: Test async-hooks\test-async-local-storage-socket.js failed with exit code 3221225477
16:17:18           at TestContext.<anonymous> (file:///C:/workspace/node-test-binary-windows-js-suites/node/test/parallel/test-async-context-frame.mjs:58:7)
16:17:18           at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
16:17:18           at async Test.run (node:internal/test_runner/test:888:9)
16:17:18           at async Promise.all (index 14)
16:17:18           at async Suite.run (node:internal/test_runner/test:1268:7)
16:17:18           at async startSubtestAfterBootstrap (node:internal/test_runner/harness:280:3) {
16:17:18         generatedMessage: false,
16:17:18         code: 'ERR_ASSERTION',
16:17:18         actual: 3221225477,
16:17:18         expected: 0,
16:17:18         operator: 'strictEqual'
16:17:18       }

The line throwing that is:

strictEqual(code, 0, `Test ${test} failed with exit code ${code}`);

The code is an exit code of a child process running the Python runner. I'd be surprised if that was related to node:test.

jasnell commented 1 week ago

I would be too! But just going through due diligence! It's also possible this isn't a problem with the tests at all and something wrong with the windows host running the tests! Just need to go through the steps to eliminate possible causes and narrow it down. node:test is my least likely candidate tho. Yeah, sorry I wasn't clear, when I said test runner I meant the python test runner along with possibly the spawning of multiple instances running in parallel.

Qard commented 1 week ago

I've been exploring all the results of the reported failures and they seem to fail and many different points in the run. It has a slight bias toward some particular tests but that could be a timing thing and not actually indicative of any failure of the code itself. I think I agree with @jasnell here that something seems off with running multiple python test runners in parallel. Might be worth just disabling (or reducing?) the parallelism and see how that goes?