nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org

test-http2-large-file is flaky #47409

Open · MoLow opened this issue 1 year ago

MoLow commented 1 year ago

Test

test-http2-large-file

Platform

AIX

Console output

22:30:07 not ok 3904 sequential/test-http2-large-file
22:30:07   ---
22:30:07   duration_ms: 120107.72300
22:30:07   severity: fail
22:30:07   exitcode: -15
22:30:07   stack: |-
22:30:07     timeout
22:30:07   ...

Build links

Additional information

No response

MoLow commented 1 year ago

I took https://ci.nodejs.org/computer/test-osuosl-aix72-ppc64_be-2/ offline to investigate

MoLow commented 1 year ago

On AIX, tools/test.py sequential/test-http2-large-file --timeout 30 --repeat 50 reproduces the failure easily. However, this change makes the test very stable, and I have no idea why:

diff --git a/test/sequential/test-http2-large-file.js b/test/sequential/test-http2-large-file.js
index 2f2cc2c80d..3afcec73f4 100644
--- a/test/sequential/test-http2-large-file.js
+++ b/test/sequential/test-http2-large-file.js
@@ -24,6 +24,7 @@ server.listen(0, common.mustCall(() => {
                                { settings: { initialWindowSize: 6553500 } });
   const request = client.request({ ':method': 'POST' });
   function writeChunk() {
+    console.log({ remaining });
     if (remaining > 0) {
       remaining -= chunkLength;
       request.write(chunk, writeChunk);
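
For context, here is a minimal, self-contained sketch (not the actual test file, and with made-up small sizes) of the pattern the test exercises: an HTTP/2 POST whose body is pushed by a recursive request.write(chunk, callback) loop, so progress depends on http2 flow control draining each chunk.

```js
'use strict';
const http2 = require('node:http2');

const chunkLength = 64 * 1024;          // hypothetical chunk size
let remaining = 16 * chunkLength;       // hypothetical total body size
const chunk = Buffer.alloc(chunkLength, 'x');

const server = http2.createServer();
server.on('stream', (stream) => {
  stream.resume();                      // drain the request body
  stream.on('end', () => {
    stream.respond();
    stream.end('done');
  });
});

server.listen(0, () => {
  const client = http2.connect(`http://localhost:${server.address().port}`,
                               { settings: { initialWindowSize: 6553500 } });
  const request = client.request({ ':method': 'POST' });

  function writeChunk() {
    if (remaining > 0) {
      remaining -= chunkLength;
      // The next chunk is queued only from the previous write's callback, so
      // progress depends on flow control draining each chunk in time.
      request.write(chunk, writeChunk);
    } else {
      request.end();
    }
  }
  writeChunk();

  request.resume();                     // discard the response body
  request.on('close', () => {
    client.close();
    server.close();
  });
});
```
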
Trott commented 1 year ago

> On AIX, tools/test.py sequential/test-http2-large-file --timeout 30 --repeat 50 reproduces the failure easily. However, this change makes the test very stable, and I have no idea why: [diff quoted above]

There's a long history of console.log() statements causing tests to succeed that otherwise fail, because they block or otherwise introduce an I/O delay that "fixes" a race condition. I can probably find an issue from 2015 about this!
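
To make that concrete, here is a minimal sketch (not from the thread; run it manually and redirect stdout to different targets) showing that console.log() is not timing-neutral: depending on the platform and on whether stdout is a TTY, a file, or a pipe, each call may flush synchronously, adding per-iteration delay that can mask a race.

```js
'use strict';

const start = process.hrtime.bigint();
for (let i = 0; i < 1000; i++) {
  console.log({ i });   // may block until the chunk is handed to the OS
}
const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;

// Report on stderr so the measurement is not mixed into the stream being timed.
// Compare `node sketch.js` (TTY), `node sketch.js > out.txt` (file), and
// `node sketch.js | cat` (pipe) across platforms.
process.stderr.write(`1000 console.log calls took ${elapsedMs.toFixed(1)} ms\n`);
```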

MoLow commented 1 year ago

It does not seem to be a race condition; this did not fix the issue:

diff --git a/test/sequential/test-http2-large-file.js b/test/sequential/test-http2-large-file.js
index 2f2cc2c80d..300e0642ab 100644
--- a/test/sequential/test-http2-large-file.js
+++ b/test/sequential/test-http2-large-file.js
@@ -24,12 +24,14 @@ server.listen(0, common.mustCall(() => {
                                { settings: { initialWindowSize: 6553500 } });
   const request = client.request({ ':method': 'POST' });
   function writeChunk() {
-    if (remaining > 0) {
+   setTimeout(() => {
+   if (remaining > 0) {
       remaining -= chunkLength;
       request.write(chunk, writeChunk);
     } else {
       request.end();
     }
+   }, 10);
Trott commented 1 year ago

I forget (or maybe never knew!) the details, but console.log() has weird side effects, such as I/O flushing, and those side effects vary from platform to platform. Maybe someone on @nodejs/console or @nodejs/libuv would know more. If I recall correctly, it blocks on some platforms but not on others.

MoLow commented 1 year ago

process.stdout.write(''); does not make the issue go away either, so it is definitely very specific to console.log
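
One hedged observation on why an empty write is not a fair stand-in: console.log formats its arguments and writes a non-empty, newline-terminated chunk on every call, so it does real formatting work before anything reaches stdout. A rough approximation (the helper name below is illustrative, not a Node internal):

```js
'use strict';
const util = require('node:util');

// Roughly what console.log does: format the arguments, then write one
// newline-terminated chunk to stdout.
function logLikeConsole(...args) {
  process.stdout.write(util.format(...args) + '\n');
}

logLikeConsole({ remaining: 10485760 }); // inspects the object and flushes a real chunk
process.stdout.write('');                // no formatting, zero-length write
```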

richardlau commented 1 year ago

@miladfarca has been investigating this as an upstream V8 issue: https://bugs.chromium.org/p/v8/issues/detail?id=13902

MoLow commented 1 year ago

https://ci.nodejs.org/computer/test-osuosl-aix72-ppc64_be-2/ is back online

targos commented 5 months ago

Happened on Windows and Linux with https://ci.nodejs.org/job/node-test-pull-request/56726/

targos commented 5 months ago

The V8 issue talks about disabling concurrent Sparkplug as a temporary fix, but we already do that: https://github.com/nodejs/node/commit/7071c1dafd70dfaa8f219da8f53153b63047d1cc

Related:
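
As an aside, a hedged way to double-check which Sparkplug-related V8 flags a particular node binary exposes (the flag names are whatever --v8-options prints; nothing specific is assumed here):

```js
'use strict';
const { execFileSync } = require('node:child_process');

// `node --v8-options` dumps V8's flag list; filter it for sparkplug-related
// entries to see what the current binary supports and its defaults.
const out = execFileSync(process.execPath, ['--v8-options'], { encoding: 'utf8' });
const lines = out.split('\n').filter((line) => line.toLowerCase().includes('sparkplug'));
console.log(lines.join('\n') || 'no sparkplug-related flags found');
```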

targos commented 5 months ago

This has really been happening a lot lately. I opened https://github.com/nodejs/node/pull/51549

From https://github.com/nodejs/reliability/issues/772:

Reason: sequential/test-http2-large-file
Type: JS_TEST_FAILURE
Failed PR: 17 (https://github.com/nodejs/node/pull/45427/, https://github.com/nodejs/node/pull/51401/, https://github.com/nodejs/node/pull/51410/, https://github.com/nodejs/node/pull/50960/, https://github.com/nodejs/node/pull/51422/, https://github.com/nodejs/node/pull/51431/, https://github.com/nodejs/node/pull/51412/, https://github.com/nodejs/node/pull/51352/, https://github.com/nodejs/node/pull/51172/, https://github.com/nodejs/node/pull/51446/, https://github.com/nodejs/node/pull/51459/, https://github.com/nodejs/node/pull/51044/, https://github.com/nodejs/node/pull/51437/, https://github.com/nodejs/node/pull/51461/, https://github.com/nodejs/node/pull/51456/, https://github.com/nodejs/node/pull/51455/, https://github.com/nodejs/node/pull/51472/)
Appeared: test-rackspace-win2022_vs2022-x64-1, test-azure_msft-win11_vs2022-x64-3, test-rackspace-win2019_vs2019-x64-1, test-rackspace-win2019_vs2019-x64-2, test-azure_msft-win10_vs2019-x64-1, test-rackspace-win2022_vs2022-x64-4, test-azure_msft-win11_vs2022-x64-1, test-rackspace-win2019_vs2019-x64-3, test-rackspace-win2022_vs2022-x64-6, test-azure_msft-win11_vs2022-x64-2, test-digitalocean-fedora38-x64-1, test-digitalocean-ubuntu2204-x64-2, test-rackspace-win2019_vs2019-x64-4
First CI: https://ci.nodejs.org/job/node-test-pull-request/56712/
Last CI: https://ci.nodejs.org/job/node-test-pull-request/56791/
Example:

```
not ok 950 sequential/test-http2-large-file
  ---
  duration_ms: 120192.99600
  severity: fail
  exitcode: 1
  stack: |-
    timeout
  ...
```
shnooshnoo commented 5 months ago

The first time it occurred was in this build https://ci.nodejs.org/job/node-daily-master/3286/, so I guess something that landed around the 7th of January might've caused this. From what I can see, the difference is this commit: https://github.com/nodejs/node/commit/1a5acd0638579e687dde128cc6d4effe3ab070d1

mhdawson commented 3 months ago

Again on Linux - https://ci.nodejs.org/job/node-test-commit-linux/nodes=fedora-latest-x64/56789/

mhdawson commented 2 months ago

Again - https://ci.nodejs.org/job/node-test-commit-linux/nodes=fedora-latest-x64/57416/