nrwl / nx

Smart Monorepos · Fast CI
https://nx.dev
MIT License

`npm install` in a project with 2 `nx` packages randomly fails with `npm ERR! code 135` #26517

Status: Open. Den-dp opened this issue 4 weeks ago

Den-dp commented 4 weeks ago

Current Behavior

`npm install` in my CI job sometimes fails to complete because a postinstall script fails at random.

I suspect it is related to the multiple versions of the nx package pulled in by nx-dotnet:

> npm ls nx

+-- @nx-dotnet/core@2.2.0
| +-- @nx-dotnet/utils@2.2.0
| | `-- @nx/devkit@17.0.2
| |   `-- nx@18.3.5
| |     `-- @nrwl/tao@18.3.5
| |       `-- nx@18.3.5 deduped
| `-- @nx/devkit@17.0.2
|   `-- nx@18.3.5
|     `-- @nrwl/tao@18.3.5
|       `-- nx@18.3.5 deduped
+-- @nx/devkit@19.2.3
| `-- nx@19.2.3 deduped
+-- @nx/js@19.2.3
| `-- @nx/workspace@19.2.3
|   `-- nx@19.2.3 deduped
`-- nx@19.2.3
  `-- @nrwl/tao@19.2.3
    `-- nx@19.2.3 deduped
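The tree above contains three nx nodes that are not marked `deduped`: nx@18.3.5 under each of the two @nx/devkit@17.0.2 copies, plus the top-level nx@19.2.3. That matches the three nx postinstall runs in the failure log below. To check whether another workspace is in the same situation, a minimal POSIX shell sketch could filter `npm ls nx` output down to physical nx copies. `nx_copies` is a hypothetical helper, and it assumes npm's default text tree, where each package is a whitespace-separated `name@version` token (ASCII or Unicode tree lines both split the same way):

```shell
#!/bin/sh
# Sketch: read `npm ls nx` output on stdin and print one line per physical
# nx copy. npm marks repeated references to an already-listed package with
# "deduped", so those entries are skipped.
nx_copies() {
  awk '{
    for (i = 1; i <= NF; i++) {
      # $(i + 1) is the empty string when nx@... is the last field on the line.
      if ($i ~ /^nx@/ && $(i + 1) != "deduped") print $i
    }
  }'
}
```

For the tree above, `npm ls nx | nx_copies` should print `nx@18.3.5` twice and `nx@19.2.3` once. Appending `| sort | uniq -c` groups the copies by version. More than one line means more than one nx postinstall script will run during the install.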

Also, I found that it never fails when I run:

npm install --foreground-scripts
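npm's documentation describes `foreground-scripts` as running install lifecycle scripts (preinstall, install, postinstall) in the foreground process, sharing stdio with npm. The reporter found this avoids the failure. To apply the workaround in CI without editing every install command, the same setting can be persisted in the project's `.npmrc`. The following is a minimal POSIX shell sketch. It assumes it runs from the workspace root next to `package.json`. The file handling is illustrative and was not part of the original report:

```shell
#!/bin/sh
# Sketch: persist npm's foreground-scripts setting in the project .npmrc so
# every `npm install` behaves as if --foreground-scripts had been passed.
# Assumption: the current directory is the workspace root.
set -eu

npmrc=".npmrc"
setting="foreground-scripts=true"

touch "$npmrc"

# If the file's last line lacks a trailing newline, add one so the new
# setting doesn't get glued onto the existing last line.
if [ -s "$npmrc" ] && [ -n "$(tail -c 1 "$npmrc")" ]; then
  printf '\n' >> "$npmrc"
fi

# Append the setting only once, so the step is safe to run on every build.
if ! grep -qxF "$setting" "$npmrc"; then
  printf '%s\n' "$setting" >> "$npmrc"
fi
```

npm also reads settings from `npm_config_*` environment variables, so `npm_config_foreground_scripts=true npm install` should have the same effect for a single CI step. npm's docs note that foreground scripts generally make installs slower and noisier.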

Expected Behavior

If two different nx versions can indeed conflict during installation, it would be helpful if Nx handled that case.

GitHub Repo

No response

Steps to Reproduce

  1. Generate a workspace with nx@19.2.3 and @nx-dotnet/core@2.2.0.
  2. Run `npm install`. In my case, I use an Ubuntu-based Jenkins job without dotnet (which might matter for nx-dotnet but shouldn't break a plain npm install in an Nx workspace):
    pipeline {
      agent {
        docker { image 'mcr.microsoft.com/playwright:jammy' }
      }
      stages {
        stage('Install dependencies') {
          steps {
            sh 'npm install --verbose'
          }
        }
      }
    }

Nx Report

Node   : 20.13.1
OS     : linux-x64
npm    : 10.5.2

nx                 : 19.2.3
@nx/js             : 19.2.3
@nx/jest           : 19.2.3
@nx/linter         : 19.2.3
@nx/eslint         : 19.2.3
@nx/workspace      : 19.2.3
@nx/devkit         : 19.2.3
@nx/eslint-plugin  : 19.2.3
@nx/playwright     : 19.2.3
@nrwl/tao          : 19.2.3
typescript         : 5.4.5
---------------------------------------
Registered Plugins:
@nx-dotnet/core
@nx/eslint/plugin
@nx/jest/plugin
---------------------------------------
Community plugins:
@nx-dotnet/core : 2.2.0

Failure Logs

npm info run @nx-dotnet/core@2.2.0 postinstall node_modules/@nx-dotnet/core node ./src/tasks/post-install
npm info run @swc/core@1.5.7 postinstall node_modules/@swc/core node postinstall.js
npm info run nx@19.2.3 postinstall node_modules/nx node ./bin/post-install
npm info run @nx-dotnet/core@2.2.0 postinstall { code: 0, signal: null }
npm info run nx@18.3.5 postinstall node_modules/@nx-dotnet/core/node_modules/nx node ./bin/post-install
npm info run @swc/core@1.5.7 postinstall { code: 0, signal: null }
npm info run nx@18.3.5 postinstall node_modules/@nx-dotnet/utils/node_modules/nx node ./bin/post-install
npm info run nx@18.3.5 postinstall { code: 0, signal: null }
npm info run nx@18.3.5 postinstall { code: 135, signal: null }
npm info run nx@19.2.3 postinstall { code: 0, signal: null }
npm verb stack Error: command failed
npm verb stack     at ChildProcess.<anonymous> (/usr/lib/node_modules/npm/node_modules/@npmcli/promise-spawn/lib/index.js:53:27)
npm verb stack     at ChildProcess.emit (node:events:519:28)
npm verb stack     at maybeClose (node:internal/child_process:1105:16)
npm verb stack     at Socket.<anonymous> (node:internal/child_process:457:11)
npm verb stack     at Socket.emit (node:events:519:28)
npm verb stack     at Pipe.<anonymous> (node:net:338:12)
npm verb pkgid nx@18.3.5
npm verb cwd /home/jenkins/agent/workspace/acme-main
npm verb Linux 5.15.146+
npm verb node v20.13.1
npm verb npm  v10.5.2
npm ERR! code 135
npm ERR! path /home/jenkins/agent/workspace/acme-main/node_modules/@nx-dotnet/core/node_modules/nx
npm ERR! command failed
npm ERR! command sh -c node ./bin/post-install
npm ERR! Bus error (core dumped)
npm verb exit 135
npm verb unfinished npm timer reify 1718135852618
npm verb unfinished npm timer reify:build 1718135868493
npm verb unfinished npm timer build 1718135868494
npm verb unfinished npm timer build:deps 1718135868495
npm verb unfinished npm timer build:run:postinstall 1718135868529
npm verb unfinished npm timer build:run:postinstall:node_modules/@nx-dotnet/core/node_modules/nx 1718135868568
npm verb code 135

Package Manager Version

No response

Operating System

Additional Information

/cc @AgentEnder

whygee-dev commented 3 weeks ago

We're getting the same error at random in our pipeline.

Daniel-Griffiths commented 3 weeks ago

I'm having a similar issue, but it usually fails with exit code 129. `yarn install` works fine locally but sometimes fails on CI.

nbalu2 commented 1 week ago

The 129 exit code should be tracked separately from this issue. We've also run into the same issue on GitHub Actions (GHA) runners.

The failure isn't predictable, but 10 to 25% of our builds fail in the postinstall step (`node ./bin/post-install`).

The problem is that SIGBUS indicates the error is actually a native memory-access issue.
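The exit codes in this thread follow the usual shell convention. When a child process is killed by signal N, `sh` reports exit status 128 + N. On Linux, SIGBUS is signal 7, so 135 = 128 + 7, which matches the `Bus error (core dumped)` line in the npm log. If yarn's 129 follows the same convention, it is 128 + 1, or SIGHUP, which is a different signal entirely. Here is a small POSIX shell sketch for decoding such statuses. `describe_status` is a hypothetical helper, and the signal numbers assume Linux:

```shell
#!/bin/sh
# Sketch: translate an exit status into the signal that ended the process.
# POSIX shells report a child killed by signal N as exit status 128 + N.
# Signal numbers are platform-specific; Linux has signals 1 to 64
# (SIGHUP = 1, SIGBUS = 7).
describe_status() {
  if [ "$1" -gt 128 ] && [ "$1" -le 192 ]; then
    # `kill -l N` prints the name of signal N without the SIG prefix.
    echo "SIG$(kill -l "$(( $1 - 128 ))")"
  else
    echo "exit $1"
  fi
}

describe_status 135   # status from the npm failure log
describe_status 129   # status reported with yarn earlier in the thread
```

For example, `npm install; describe_status $?` in a CI script run without `set -e` would print `SIGBUS` for a failure like the one in the npm log, because npm exits with the script's status.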

------------------------ LOCAL NX report ------------------------------
NX   Report complete - copy this into the issue template

Node   : 18.18.0
OS     : win32-x64
pnpm   : 9.4.0

nx                 : 19.2.3
@nx/js             : 19.2.3
@nx/jest           : 19.2.3
@nx/linter         : 19.2.3
@nx/eslint         : 19.2.3
@nx/workspace      : 19.2.3
@nx/angular        : 19.2.3
@nx/eslint-plugin  : 19.2.3
@nx/storybook      : 19.2.3
@nx/web            : 19.2.3
typescript         : 5.4.5
---------------------------------------
Registered Plugins:
some-workspace-plugin
---------------------------------------
Community plugins:
@ngneat/spectator        : 18.0.2
@storybook/angular       : 8.1.6
angular-auth-oidc-client : 17.1.0
nx-stylelint             : 17.1.5
---------------------------------------
Local workspace plugins:
         some-workspace-plugin

On CI we use GitHub-hosted runners with ubuntu-latest, so the OS there differs from the local one above.

Current runner version: '2.317.0'
Operating System
  Ubuntu
  LTS
Runner Image
  Image: ubuntu-22.04
  Version: 20240616.1.0
  Included Software: https://github.com/actions/runner-images/blob/ubuntu22/20240616.1/images/ubuntu/Ubuntu2204-Readme.md
  Image Release: https://github.com/actions/runner-images/releases/tag/ubuntu22%2F20240616.1

Error: [screenshot not preserved]

We use pnpm for package management, so symlinks are probably involved (this might be relevant later).

After seeing the issue, I captured the core dumps. [screenshot not preserved]

Unfortunately, the backtrace isn't much help without debug symbols (at least not to me), but it does show two modules the core dump was trying to map: node and nx-native-file-cache.

Turning off caching by setting `NX_SKIP_NX_CACHE: true` and `NX_CACHE_PROJECT_GRAPH: false` in the workflow didn't help; the core dump was almost identical, at least as far as its backtrace goes.

Is there any way we can skip nx-native-file-cache?

Daniel-Griffiths commented 1 week ago

As a temporary fix, I disabled the nx postinstall script by patching the package with yarn:

https://yarnpkg.com/cli/patch

[screenshot not preserved]

Den-dp commented 1 week ago

As I mentioned in the issue, I was able to work around it by opting into sequential postinstall script execution via:

npm install --foreground-scripts

nbalu2 commented 1 week ago

Both are great, though we'll have to look into a pnpm equivalent. 😄