npm / cli

the package manager for JavaScript
https://docs.npmjs.com/cli/

[BUG] npm ci gives timeout error on gitlab runner #7076

Open ghostx31 opened 6 months ago

ghostx31 commented 6 months ago

Is there an existing issue for this?

This issue exists in the latest npm version

Current Behavior

We are using the node:18 Docker image in our CI pipeline, which runs on a self-hosted GitLab runner. The runner is hosted on a VM in GCP, and our GitLab instance runs on GKE v1.27.

We started having this error a week ago. This is not a runner issue since we are able to install packages using pip and mix deps.get for elixir and only npm fails. We are not using any proxy and internet is reachable from docker containers.

The runner is configured with the docker executor. The exact error is:

$ npm ci
npm ERR! code ETIMEDOUT
npm ERR! syscall connect
npm ERR! errno ETIMEDOUT
npm ERR! network request to https://registry.npmjs.org/yargs-parser/-/yargs-parser-20.2.9.tgz failed, reason: connect ETIMEDOUT 104.16.24.34:443
npm ERR! network This is a problem related to network connectivity.
npm ERR! network In most cases you are behind a proxy or have bad network settings.
npm ERR! network 
npm ERR! network If you are behind a proxy, please make sure that the
npm ERR! network 'proxy' config is set properly.  See: 'npm help config'
npm ERR! A complete log of this run can be found in: /root/.npm/_logs/2023-12-12T04_38_02_576Z-debug-0.log
Cleaning up project directory and file based variables
00:01
ERROR: Job failed: exit code 1

We were able to work around this issue by setting network_mode="host" in the GitLab runner's config, but this breaks communication with the service containers, which our builds need.

Expected Behavior

Running npm ci should fetch the packages from npm registry correctly.

Steps To Reproduce

Environment

zheng1716148634 commented 6 months ago

Yes, me too. I also experience timeout errors; I tried many times, but it didn't work.

Larsjep commented 6 months ago

Hi, this might be related to #7072. We saw a similar problem on our internal proxy cache due to the high number of connections opened by npm.

lxfu1 commented 6 months ago

The same problem.

ghostx31 commented 6 months ago

Hi @Larsjep, thanks for pointing out your issue. Just wanted to ask: how did you check how many connections npm install was opening?

Larsjep commented 6 months ago

I'm using Wireshark and watching when and how many SYN packets it sends.
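For a command-line equivalent of that Wireshark check, here is a minimal sketch that swaps in tcpdump (not used in the thread), assuming it is installed on the runner host and you are allowed to capture there. It counts SYN-only packets, since each one marks a new outbound connection attempt. The helper name `count_new_connections` and the file `npm.pcap` are hypothetical.

```shell
# count_new_connections (hypothetical helper): read `tcpdump -nn` text output on
# stdin and count SYN-only packets ("Flags [S],"), i.e. new TCP connection attempts.
# SYN-ACK replies print as "Flags [S.]" and are therefore not counted.
count_new_connections() {
  grep -c 'Flags \[S\],'
}

# Usage sketch (assumes tcpdump is installed and capture is permitted):
#   tcpdump -nn -i any -w npm.pcap 'tcp dst port 443' &   # start the capture
#   pid=$!
#   npm ci                                                 # run the install
#   kill -INT "$pid"                                       # stop the capture
#   tcpdump -nn -r npm.pcap | count_new_connections
```

Filtering on `dst port 443` keeps only traffic towards the registry, so the count reflects connections npm opened rather than replies.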

ghostx31 commented 6 months ago

I see. Using the node:20 image seems to fix this issue for us, but I only tested this on a single project. I don't know yet whether this breaks any of our dependencies, so I need to test more on other projects as well. I'd be interested to hear from the npm maintainers what the root issue is here, if they get around to this.

heath-freenome commented 6 months ago

We were seeing the same issue on our GitLab runners and were forced to pin Node to 18.18.x to avoid it. This seems to be an issue with npm 10.x. Locally, when I run npm ci, some packages take minutes to install (are there timeouts that keep retrying on the command line?). There seems to be a big performance issue with the latest version of npm. NOTE: I tried my local npm ci with npm 10.2.5 and it takes 5+ minutes to install. We have a local GAR-based registry for some of our libraries. Could this be part of the problem?

Now that I'm trying to upgrade our systems to Node 20 I'm afraid we'll start encountering these issues again, soon. Any word from the developers on when this can be fixed?
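If a job depends on the npm 9 bundled with a pinned Node 18.18.x image, a small guard can make an unnoticed image change show up as an explicit version error. This is a sketch, not something from the thread; `require_npm_major` is a hypothetical helper name.

```shell
# require_npm_major (hypothetical helper): fail fast when the npm on PATH is not
# the expected major version, e.g. after an image tag starts shipping npm 10.
require_npm_major() {
  want=$1
  have=$(npm --version | cut -d. -f1)   # "9.8.1" -> "9"
  if [ "$have" != "$want" ]; then
    echo "expected npm $want.x, found $(npm --version)" >&2
    return 1
  fi
}

# Usage at the top of a job script, assuming the image is pinned to an 18.18.x tag:
#   require_npm_major 9
```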

ghostx31 commented 6 months ago

We had this problem when using the node:18 Docker image, and moving to node:20 on some CIs seems to have fixed it. We're also concerned about dependency issues in case we decide to move to node:20 completely.

rosen-dimitrov commented 6 months ago

Hi guys,

posting here as well for visibility. Looks like an issue with the new @npmcli/agent package and the way it handles network connections.

https://github.com/npm/cli/issues/7072#issuecomment-1864444750

michalpidanic commented 4 months ago

Hello guys, has anybody solved this issue? I'm having the same problem using node:20; my npm version is 10.2.4 and I'm getting the same error. I tried to run npm install -g npm@9 in before_script, but it fails with the same error 🙃 I really don't know how to solve this problem, and it has already been a few weeks. My runner and job use the latest node images, which seem to ship with npm 10, so it's a bit of a deadlock here 🙃

$ npm --version
10.2.4
$ npm uninstall -g npm@10
up to date in 320ms
$ npm install -g npm@9
npm ERR! code ETIMEDOUT
npm ERR! errno ETIMEDOUT
npm ERR! network request to https://registry.npmjs.org/npm failed, reason: 
npm ERR! network This is a problem related to network connectivity.
npm ERR! network In most cases you are behind a proxy or have bad network settings.
npm ERR! network 
npm ERR! network If you are behind a proxy, please make sure that the
npm ERR! network 'proxy' config is set properly.  See: 'npm help config'
npm ERR! A complete log of this run can be found in: /root/.npm/_logs/20...24...-debug-0.log

dahei commented 4 months ago

We face the same issue in our Gitlab pipelines and were forced to pin our node version to 18.18.1 (which includes npm 9.8.1). The difference also becomes obvious when running npm install locally. I tried 3 different versions on the same project.

rm -rf node_modules
nvm use ...
npm install --prefer-online

Result:
18.18.2 (npm 9.8.1) -> 38s
18.19.1 (npm 10.2.4) -> 48s
20.11.1 (npm 10.2.4) -> 2m13s (!)
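To repeat this comparison on another project, the three steps above can be wrapped in a loop. A minimal sketch, assuming nvm is installed and already sourced into the shell (nvm is a shell function, not a binary) and that the listed Node.js versions are installed; `bench_versions` is a hypothetical name and the timing uses whole seconds from `date +%s`.

```shell
# bench_versions (hypothetical helper): time a clean `npm install --prefer-online`
# under each Node.js version passed as an argument.
bench_versions() {
  for v in "$@"; do
    rm -rf node_modules                   # start from a clean tree each time
    nvm use "$v" >/dev/null || return 1   # switch Node.js (and its bundled npm)
    start=$(date +%s)
    npm install --prefer-online >/dev/null 2>&1 || echo "$v: npm install failed" >&2
    end=$(date +%s)
    echo "$v (npm $(npm --version)) -> $((end - start))s"
  done
}

# Usage:
#   bench_versions 18.18.2 18.19.1 20.11.1
```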

In both installs with npm 10.x I could see long-running requests for some packages, as stated in the other comments. It's not always the same package, so it feels like an npm 10.x issue.

Within the Gitlab CI pipeline it becomes a serious blocker as the jobs take much longer and often fail.

melroy89 commented 3 months ago

You should try to use: npm ci --cache .npm --prefer-offline. And cache the .npm directory if possible in gitlab. But I digress.

Despite all this, when using this npm ci command above with cache and prefer offline, my gitlab ci/cd is constantly failing at random on one or more jobs. It's very frustrating!
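Until the underlying bug is fixed, one way to cope with jobs that fail at random is to retry the install a bounded number of times. This only papers over the timeouts and does not fix them. A minimal sketch for a POSIX job script; `retry` is a hypothetical helper, not an npm or GitLab feature.

```shell
# retry (hypothetical helper): run a command up to MAX times, sleeping DELAY
# seconds between attempts. Returns 0 on the first success, 1 if all attempts fail.
retry() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "retry: '$*' failed after $attempt attempts" >&2
      return 1
    fi
    echo "retry: attempt $attempt of $max failed, retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Usage in a job script, combined with the cache flags suggested above
# (assumes the runner caches the .npm directory between jobs):
#   retry 3 30 npm ci --cache .npm --prefer-offline
```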

And I don't understand why npm doesn't give this issue more attention. As you can see, a lot of developers are affected by this.

heath-freenome commented 3 months ago

I'm still seeing this issue on my GitLab pipelines as well, inside Dockerfiles within the CI/CD pipelines. I am using npm 10.5.0:

#15 470.1 npm ERR! code ETIMEDOUT
#15 470.1 npm ERR! syscall connect
#15 470.1 npm ERR! errno ETIMEDOUT
#15 470.1 npm ERR! network request to https://registry.npmjs.org/xtend/-/xtend-2.1.2.tgz failed, reason: connect ETIMEDOUT 104.16.2.35:443
#15 470.1 npm ERR! network This is a problem related to network connectivity.
#15 470.1 npm ERR! network In most cases you are behind a proxy or have bad network settings.
#15 470.1 npm ERR! network 
#15 470.1 npm ERR! network If you are behind a proxy, please make sure that the
#15 470.1 npm ERR! network 'proxy' config is set properly.  See: 'npm help config'
#15 470.1 
#15 470.1 npm ERR! A complete log of this run can be found in: /root/.npm/_logs/2024-04-05T09_00_13_641Z-debug-0.log
#15 ERROR: process "/bin/sh -c npm run artifactregistry-login\nnpm ci\n" did not complete successfully: exit code: 1

melroy89 commented 3 months ago

Because the fix is not released yet. See: https://github.com/nodejs/node/pull/52351

IceBjerg commented 2 months ago

Hey everyone!

Just wanted to post here, maybe some of you might have an idea. I am using node v16.20.2, and npm v8.19.4.

I get this error when running npm install on a GitLab runner.

I don't have any idea what's going on... The error message says basically nothing, and I don't see status codes other than 200 in the verbose logs. Is this truly a network issue? I have increased the fetch retries to 5, but I don't see any retries actually happening (or at least none are logged).

What I see is that fetches slow down over time. Before it crashes, I see logs like this: npm timing reifyNode:node_modules/jsdom Completed in 23901ms

Why am I not seeing retries? I also changed the maxsockets config to 5, with no visible effect.

My problem is strange because my jobs don't always fail! Even when increasing maxsockets to 50, I get the same ~20% probability of the job failing due to a timeout.

Anyone? Anything?? Would be appreciated very much.
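For reference, the two settings mentioned above can be applied per job without editing a shared .npmrc (which may hold registry credentials), because npm reads any `npm_config_<key>` environment variable as config `<key>`, with underscores mapped to dashes. A minimal sketch for a POSIX job script; the values are the ones tried in this comment, not recommendations, and this does not explain why no retries appear in the logs.

```shell
# Per-job npm network settings via environment variables.
export npm_config_fetch_retries=5   # same as fetch-retries=5 in .npmrc
export npm_config_maxsockets=5      # same as maxsockets=5 in .npmrc

# Check what npm actually resolved before running the install:
#   npm config get fetch-retries
#   npm config get maxsockets
#   npm ci
```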

melroy89 commented 2 months ago

I have the feeling I'm repeating myself. But this npm fix is not yet part of nodejs. See open PR until further notice: https://github.com/nodejs/node/pull/52505

Then update to Node.js v22. v16 is end-of-life.

sdalonzo commented 2 months ago

@IceBjerg try --no-progress. We had an issue with long URLs to a private npm registry and a memory leak in the progress bar, and it had the notable symptom of hanging on reify.
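A minimal sketch of that suggestion for a CI job script; `npm_ci_no_progress` is a hypothetical wrapper name. The `--no-progress` flag and the `npm_config_progress=false` environment variable are two ways of setting npm's `progress` config to false. Whether the progress bar is involved in these particular timeouts is not established in the thread.

```shell
# npm_ci_no_progress (hypothetical wrapper): run `npm ci` with the progress bar
# disabled; extra arguments (e.g. --cache .npm --prefer-offline) are passed through.
npm_ci_no_progress() {
  npm ci --no-progress "$@"
}

# Alternatively, disable the progress bar for every npm command in the job:
#   export npm_config_progress=false
```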