nextstrain / nextstrain.org

The Nextstrain website
https://nextstrain.org
GNU Affero General Public License v3.0
87 stars 49 forks source link

`cryptography.test.js` fails on recent node versions #814

Open jameshadfield opened 2 months ago

jameshadfield commented 2 months ago

During development of #810, which didn't modify any code specifically relating to cryptography.test.js, test failures were consistenly observed when run on nodejs 20.12.0 and 20.11.1 in GitHub Actions. The tests passed on nodejs 20.9.0 both locally and via GitHub actions. PR #810 pinned the version to 20.9.0.

I haven't explored why, as 20.9.0 is the latest available on conda, but the node.js changelog has a few crypto-related changes between those versions.

In case those action logs are gone by the time someone reads this, here's the error log: ```console FAIL test/cryptography.test.js (15.536 s) ● encrypt/decrypt roundtrip TypeError: Cannot read properties of null (reading 'length') at getStateLength (node_modules/stream-shift/index.js:16:28) at shift (node_modules/stream-shift/index.js:6:99) at Duplexify.Object..Duplexify._forward (node_modules/@aws-crypto/encrypt-node/node_modules/duplexify/index.js:170:35) at SignatureStream.onreadable (node_modules/@aws-crypto/encrypt-node/node_modules/duplexify/index.js:136:10) Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test." 87 | 88 | > 89 | test("session id must match", async () => { | ^ 90 | const {getTokens, setTokens} = await import("../src/authn/session"); 91 | 92 | const session = { at test/authn.test.js:89:1 ● benchmark TypeError: Cannot read properties of null (reading 'length') at getStateLength (node_modules/stream-shift/index.js:16:28) at shift (node_modules/stream-shift/index.js:6:99) at Duplexify.Object..Duplexify._forward (node_modules/@aws-crypto/encrypt-node/node_modules/duplexify/index.js:170:35) at SignatureStream.onreadable (node_modules/@aws-crypto/encrypt-node/node_modules/duplexify/index.js:136:10) ● benchmark thrown: "Exceeded timeout of 5000 ms for a test. Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test." 112 | 113 | > 114 | test("benchmark", async () => { | ^ 115 | const {getTokens, setTokens} = await import("../src/authn/session"); 116 | 117 | const session = { at test/authn.test.js:114:1 Test Suites: 2 failed, 10 passed, 12 total Tests: 6 failed, 28 skipped, 547 passed, 581 total Snapshots: 0 total Time: 20.525 s Ran all test suites. Error: Command failed with exit code 1: npm run test at makeError (/home/runner/work/nextstrain.org/nextstrain.org/node_modules/start-server-and-test/node_modules/execa/lib/error.js:56:11) at handlePromise (/home/runner/work/nextstrain.org/nextstrain.org/node_modules/start-server-and-test/node_modules/execa/index.js:114:26) at process.processTicksAndRejections (node:internal/process/task_queues:95:5) { command: 'npm run test', exitCode: 1, signal: undefined, signalDescription: undefined, stdout: undefined, stderr: undefined, failed: true, timedOut: false, isCanceled: false, killed: false } ```
tsibley commented 2 months ago

The error suggests it's no longer valid to pass a null context object thru our encrypt/decrypt functions. If so, passing an empty context object for these tests would be fine (or extending our encrypt/decrypt functions to convert null → empty for us).

This could be a Node.js change or an @aws-crypto library change that we only pick up on newer Node.js versions.

I'm submitting a conda-forge update for Node.js 20.12.2 so I can test it out.

tsibley commented 2 months ago

Well the conda-forge update had a few issues, so I switched to Docker instead to reproduce:

nextstrain shell --image node:20.12 .
npm ci
NODE_OPTIONS='--experimental-vm-modules' npx jest --run-tests-by-path test/cryptography.test.js

This let me see it's not related to the context object. Also, I realized I'd focused on the first error for the first failed test:

TypeError: Cannot read properties of null (reading 'length')

but the second error (?) for the same test is more intriguing:

thrown: "Exceeded timeout of 5000 ms for a test.    
Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

Something's hanging in the test, and the TypeError might just be a red herring resulting from that. The TypeError stack is incomplete, or at least, doesn't track across async-boundaries which makes it not very useful as-is. More digging required.

tsibley commented 2 months ago

Ahh, this is an internal issue in the @aws-crypto libraries. https://github.com/aws/aws-encryption-sdk-javascript/issues/1374