open-telemetry / opentelemetry-js-contrib

OpenTelemetry instrumentation for JavaScript modules
https://opentelemetry.io
Apache License 2.0
703 stars 523 forks source link

[ci] test-all-versions failing for `@cucumber/cucumber@9.1.2` and `@aws-sdk/client-s3@3.377.0` #1828

Closed pichlermarc closed 11 months ago

pichlermarc commented 11 months ago

What happened?

The TAV script currently fails in CI only (tested locally but could not reproduce) for @opentelemetry/instrumentation-cucumber with @cucumber/cucumber@9.1.2.

It looks like it cannot resolve the package @cucumber/messages for some reason - when I run npm run test-all-versions locally, everything works fine.

Logs:

cc @Ugzuzg (component-owner)

Ugzuzg commented 11 months ago

Same. Works locally.

pichlermarc commented 11 months ago

Looks like this affects more than just the cucumber instrumentation, having a similar problem over at an instrumentation-aws-sdk PR. I'll re-label the issue accordingly.

Sorry for the noise

trentm commented 11 months ago

I can reproduce the @cucumber/cucumber@9.1.2 issue locally. I'll add some details in a bit. I had a frustrating afternoon yesterday trying to understand the failing vs. not failing case.

trentm commented 11 months ago

Repro for the cucumber TAV (test-all-versions) failure.

tl;dr:

npm ci
npx lerna run --scope @opentelemetry/instrumentation-cucumber test-all-versions

With details:

$ cd .../opentelemetry-js-contrib  # top dir of your clone

$ npm ci   # start with a clean layout of node_modules/... dirs

# Observe the current install location of all the '@cucumber/...' packages. These aren't all 
# the deps of '@cucumber/cucumber', but enough to see the issue.
$ ls **/node_modules/@cucumber/*/package.json
node_modules/@cucumber/ci-environment/package.json
node_modules/@cucumber/cucumber-expressions/package.json
node_modules/@cucumber/cucumber/package.json
node_modules/@cucumber/gherkin-streams/package.json
node_modules/@cucumber/gherkin-utils/node_modules/@cucumber/gherkin/package.json
node_modules/@cucumber/gherkin-utils/node_modules/@cucumber/messages/package.json
node_modules/@cucumber/gherkin-utils/package.json
node_modules/@cucumber/gherkin/package.json
node_modules/@cucumber/html-formatter/package.json
node_modules/@cucumber/message-streams/package.json
node_modules/@cucumber/messages/package.json
node_modules/@cucumber/tag-expressions/package.json

# Run the TAV tests for cucumber rougly the way the 'test-all-versions.yml' action does.
$ npx lerna run --scope @opentelemetry/instrumentation-cucumber test-all-versions
...
-- required packages ["@cucumber/cucumber@9.1.2"]
-- installing ["@cucumber/cucumber@9.1.2"]
-- running test "npm test" with @cucumber/cucumber (env: {})
> @opentelemetry/instrumentation-cucumber@0.1.2 test
> nyc ts-mocha -p tsconfig.json 'test/**/*.test.ts'
Error: Cannot find module '@cucumber/messages'
Require stack:
- /Users/trentm/tm/opentelemetry-js-contrib4/node_modules/@cucumber/gherkin-streams/dist/src/makeGherkinOptions.js
...

# Observer the install location changes now.
$ ls **/node_modules/@cucumber/*/package.json
node_modules/@cucumber/gherkin-streams/package.json
node_modules/@cucumber/gherkin-utils/node_modules/@cucumber/gherkin/package.json
node_modules/@cucumber/gherkin-utils/node_modules/@cucumber/messages/package.json
node_modules/@cucumber/gherkin-utils/package.json
node_modules/@cucumber/message-streams/package.json
node_modules/@cucumber/tag-expressions/package.json
plugins/node/instrumentation-cucumber/node_modules/@cucumber/ci-environment/package.json
plugins/node/instrumentation-cucumber/node_modules/@cucumber/cucumber-expressions/package.json
plugins/node/instrumentation-cucumber/node_modules/@cucumber/cucumber/package.json
plugins/node/instrumentation-cucumber/node_modules/@cucumber/gherkin/package.json
plugins/node/instrumentation-cucumber/node_modules/@cucumber/html-formatter/package.json
plugins/node/instrumentation-cucumber/node_modules/@cucumber/messages/package.json

# Those install locations mean import those deps are broken.
$ (cd plugins/node/instrumentation-cucumber; node -e "require('@cucumber/cucumber')")
node:internal/modules/cjs/loader:1031
  throw err;
  ^
Error: Cannot find module '@cucumber/messages'
...

Note, in case it matters: I'm doing my tests with node v16.20.2 (npm v8.19.4) -- for the unrelated reason that npm ci is MUCH faster for me with Node.js 16 than with Node.js 18 and I haven't had a chance to look into why.

when I run npm run test-all-versions locally, everything works fine.

Yup, it works this way.

$ npm ci
$ cd plugins/node/instrumentation-cucumber
$ npm run test-all-versions
...

$ cd ../../../
$ ls **/node_modules/@cucumber/*/package.json
node_modules/@cucumber/gherkin-streams/package.json
node_modules/@cucumber/gherkin-utils/node_modules/@cucumber/gherkin/package.json
node_modules/@cucumber/gherkin-utils/node_modules/@cucumber/messages/package.json
node_modules/@cucumber/gherkin-utils/package.json
node_modules/@cucumber/gherkin/package.json
node_modules/@cucumber/message-streams/package.json
node_modules/@cucumber/messages/package.json
node_modules/@cucumber/tag-expressions/package.json
plugins/node/instrumentation-cucumber/node_modules/@cucumber/ci-environment/package.json
plugins/node/instrumentation-cucumber/node_modules/@cucumber/cucumber-expressions/package.json
plugins/node/instrumentation-cucumber/node_modules/@cucumber/cucumber/package.json
plugins/node/instrumentation-cucumber/node_modules/@cucumber/gherkin/package.json
plugins/node/instrumentation-cucumber/node_modules/@cucumber/html-formatter/package.json
plugins/node/instrumentation-cucumber/node_modules/@cucumber/messages/package.json

In this case, the layout is slightly different. The main difference is that @cucumber/messages is in the top-level node_modules dir.

a slightly quicker repro

The failure reproduces with just that single cucumber version tested, so this change means the repro is faster, in case someone else is trying to debug this as well.

diff --git a/plugins/node/instrumentation-cucumber/.tav.yml b/plugins/node/instrumentation-cucumber/.tav.yml
index c1930c8c..bf4f19f3 100644
--- a/plugins/node/instrumentation-cucumber/.tav.yml
+++ b/plugins/node/instrumentation-cucumber/.tav.yml
@@ -1,3 +1,4 @@
 '@cucumber/cucumber':
-  versions: '^8.0.0 || ^9.0.0'
+  # versions: '^8.0.0 || ^9.0.0'
+  versions: '9.1.2'  # XXX failure reproduces with just this version tested
   commands: npm test
trentm commented 11 months ago

I was getting ready to blame this on npm install being broken -- for this npm workspaces case and when called in the way that lerna/nx are calling it. However, the import failure is because @cucumber/messages can't be found by the @cucumber/gherkin-streams package ... and @cucumber/gherkin-streams doesn't actually have a dependency on @cucumber/messages. It has a peerDep and a devDep on it.

Only these 3 packages have a "dependencies" entry for @cucumber/messages:

% cat  **/node_modules/@cucumber/*/package.json | json -ga -c 'this?.dependencies && this.dependencies["@cucumber/messages"]' name
@cucumber/gherkin
@cucumber/gherkin-utils
@cucumber/cucumber
@cucumber/gherkin

So, I think ultimately I'd argue that @cucumber/gherkin-streams (and I think at least one other cucumber package) have a bug that they use @cucumber/messages but don't have an "dependencies" entry for it. At least that's for my old-timey understanding of peerDependencies entries being completely advisory. I need to learn a bit more about peerDeps.

Note that our CI sets legacy-peer-deps=true (https://docs.npmjs.com/cli/v8/using-npm/config#legacy-peer-deps):

+      - name: Legacy Peer Dependencies for npm 7
+        if: matrix.container == 'node:16'
+        run: npm config set legacy-peer-deps=true

which was added back in https://github.com/open-telemetry/opentelemetry-js-contrib/pull/614 when using hoisting with lerna to speed up npm installs. This could be related, but I don't think so: because IIRC we get this failure in CI with node v14 (which has npm@6 which ignores peerDependencies) as well. Also, locally I've been testing with legacy-peer-deps unset (it defaults to false).

(Aside: I wonder if that means that legacy-peer-deps might not be unnecessary for our node v16 installs.)

trentm commented 11 months ago

GAH! It is totally because of legacy-peer-deps=true.

I ran with this diff:

diff --git a/plugins/node/instrumentation-cucumber/package.json b/plugins/node/instrumentation-cucumber/package.json
index c44c00f9..22312787 100644
--- a/plugins/node/instrumentation-cucumber/package.json
+++ b/plugins/node/instrumentation-cucumber/package.json
@@ -7,7 +7,7 @@
   "repository": "open-telemetry/opentelemetry-js-contrib",
   "scripts": {
     "test": "nyc ts-mocha -p tsconfig.json 'test/**/*.test.ts'",
-    "test-all-versions": "tav",
+    "test-all-versions": "npm config ls && tav #XXX",
     "tdd": "npm run test -- --watch-extensions ts --watch",
     "clean": "rimraf build/*",
     "lint": "eslint . --ext .ts",

A full failing repro run: https://gist.github.com/trentm/efb0a8586351878a74f24256ced15af4 A full passing run (running npm run test-all-versions in the subpackage dir): https://gist.github.com/trentm/f02b71fe09b91bcb6c8d1e693ad22fb1

Note this in the failing run:

> npm config ls && tav #XXX
...
; "env" config from environment
...
legacy-peer-deps = true
...

The passing run does not have legacy-peer-deps set, and signs are pointing to that being the relevant diff. But what is setting legacy-peer-deps? I had earlier grepped for legacy-peer-deps and legacyPeerDeps and didn't find anything setting it, but...

% rg -g '*' legacy_peer_deps
node_modules/nx/src/utils/package-manager.js
76:            (_a = (_b = process.env).npm_config_legacy_peer_deps) !== null && _a !== void 0 ? _a : (_b.npm_config_legacy_peer_deps = 'true');

Gah. I haven't traced this yet, but I suspect nx's code path for running commands in sub-packages is setting that config var and that breaks our install.

trentm commented 11 months ago

This passes:

npm_config_legacy_peer_deps=false npx lerna run --scope @opentelemetry/instrumentation-cucumber test-all-versions

TODO:

trentm commented 11 months ago

Some nx history:

commit 72ce35264f1dd5f5029e7ac463a83efb252622a7
Date:   2021-05-04T16:02:12-04:00 (2 years, 7 months ago)

    fix(core): change legacy_peer_deps to npm env variable (#5541)

...

commit 2604711f5fc08c7a80d7900094e477510f120ee7
Date:   2021-01-27T21:32:55-05:00 (2 years, 10 months ago)

    fix(core): add --legacy-peer-deps to npm installs to handle npm 7 related issues

The PR and issue do not provide much detail.

I am inclined to avoid nx (and hence lerna) for running scripts in subpackages, just to avoid the bulk of code involved in doing this -- but that is longer term. I'll work on a fix/workaround, probably using npm run --ws .... npm run --ws ... does not support running in parallel, but that's fine for the TAV tests which CI serializes anyway:

      - name: Run test-all-versions
        run: npx lerna run test-all-versions ${{ inputs.lerna-args }} ${{ matrix.lerna-extra-args }} --stream --concurrency 1
trentm commented 11 months ago

@aws-sdk/client-s3@3.377.0 repro

To make for a faster repro, first change "plugins/node/opentelemetry-instrumentation-aws-sdk/.tav.yml" to be just this:

"@aws-sdk/client-s3":
  # versions: ">=3.223.0 || 3.218.0 || 3.216.0 || 3.154.0 || 3.107.0 || 3.54.0 || 3.6.1"
  versions: "3.377.0"
  commands:
    - npm run test

Then run:

npm ci
npx lerna run test-all-versions --scope @opentelemetry/instrumentation-aws-sdk

The result:

% npx lerna run test-all-versions --scope @opentelemetry/instrumentation-aws-sdk
lerna notice cli v6.6.2
lerna info versioning independent
lerna notice filter including "@opentelemetry/instrumentation-aws-sdk"
lerna info filter [ '@opentelemetry/instrumentation-aws-sdk' ]

> @opentelemetry/instrumentation-aws-sdk:test-all-versions

> @opentelemetry/instrumentation-aws-sdk@0.37.0 test-all-versions
> tav
-- required packages ["@aws-sdk/client-s3@3.377.0"]
-- installing ["@aws-sdk/client-s3@3.377.0"]
-- running test "npm run test" with @aws-sdk/client-s3 (env: {})
> @opentelemetry/instrumentation-aws-sdk@0.37.0 test
> nyc ts-mocha -p tsconfig.json --require '@opentelemetry/contrib-test-utils' 'test/**/*.test.ts'
Error: Cannot find module '@smithy/hash-stream-node'
Require stack:
- /Users/trentm/tm/opentelemetry-js-contrib4/node_modules/@aws-sdk/client-s3/dist-cjs/runtimeConfig.js
- /Users/trentm/tm/opentelemetry-js-contrib4/node_modules/@aws-sdk/client-s3/dist-cjs/S3Client.js
- /Users/trentm/tm/opentelemetry-js-contrib4/node_modules/@aws-sdk/client-s3/dist-cjs/index.js
- /Users/trentm/tm/opentelemetry-js-contrib4/plugins/node/opentelemetry-instrumentation-aws-sdk/test/aws-sdk-v3.test.ts
- /Users/trentm/tm/opentelemetry-js-contrib4/node_modules/mocha/lib/esm-utils.js
- /Users/trentm/tm/opentelemetry-js-contrib4/node_modules/mocha/lib/mocha.js
- /Users/trentm/tm/opentelemetry-js-contrib4/node_modules/mocha/lib/cli/one-and-dones.js
- /Users/trentm/tm/opentelemetry-js-contrib4/node_modules/mocha/lib/cli/options.js
- /Users/trentm/tm/opentelemetry-js-contrib4/node_modules/mocha/bin/mocha
    at Function.Module._resolveFilename (node:internal/modules/cjs/loader:1028:15)
    at Function.Module._load (node:internal/modules/cjs/loader:873:27)
    at Module.require (node:internal/modules/cjs/loader:1100:19)
...

the issue

This is a different problem. tl;dr: @aws-sdk/client-s3@3.377.0 was just a bad release.

If we look for usages of */hash-stream-node in an install of @aws-sdk/client-s3@3.377.0:

[~/tmp/asdf.badclients3/node_modules/@aws-sdk/client-s3]
% rg hash-stream-node
package.json
30:    "@aws-sdk/hash-stream-node": "*",

dist-es/runtimeConfig.js
9:import { readableStreamHasher as streamHasher } from "@smithy/hash-stream-node";

dist-cjs/runtimeConfig.js
13:const hash_stream_node_1 = require("@smithy/hash-stream-node");

We can see that the code changed to using the @smithy/hash-stream-node module, but the package.json "dependencies" still point to the old @aws-sdk/hash-stream-node package. Neither the release notes for 3.377.0 nor 3.378.0 say anything about it. However, 3.378.0 was released the next day:

% npm info @aws-sdk/client-s3 time
...
  '3.377.0': '2023-07-25T21:15:57.826Z',
  '3.378.0': '2023-07-26T19:35:54.575Z',
...

And installing 3.377.0 yields a manually added npm deprecation warning:

% npm install @aws-sdk/client-s3@3.377.0
npm WARN deprecated @aws-sdk/hash-stream-node@3.374.0: This package has moved to @smithy/hash-stream-node
...

the fix

I think we just need to make sure to avoid testing this version in the TAV tests. I'll update the .tav.yml. It could also use an update because currently it is testing ~144 versions of @aws-sdk/client-s3.