npm / npm

This repository is moving to: https://github.com/npm/cli
http://npm.community

npm hangs under heavy network load on some connections #7862

Closed busches closed 6 years ago

busches commented 9 years ago

I'm using npm 2.7.4 with Node.js v0.12.2 on Windows 7 x64 and I'm running into an odd issue. With the package.json below, npm install hangs after it runs node-gyp for ws for karma. But if I run npm install -ddd, it doesn't hang and installs correctly.

{
  "name": "npmBug",
  "version": "0.0.0",
  "dependencies": {},
  "devDependencies": {
    "browser-sync": "~1.7.1",
    "karma": "^0.12.31"
  }
}

If I run with just npm install -dd then it hangs at this line:

npm info install karma@0.12.31
npm info postinstall karma@0.12.31
npm verb unlock done using C:\Users\buschs1\AppData\Roaming\npm-cache\_locks\karma-f090bd27423c129a.lock for C:\Users\buschs1\Desktop\npmBug\node_modules\karma

I'm able to reproduce this on another machine (same setup: node, Win7 x64, but without python etc. set up for node-gyp), and it happens when using io.js v1.6.3 as well. Since it does install when changing logging settings in npm, I've opened the issue against npm; let me know if it belongs elsewhere. Please let me know if you need any more information.

denouche commented 8 years ago

Does it solve all your problems?

2015-07-10 13:15 GMT+02:00 Varun Palekar notifications@github.com:

I am also facing this problem, but now I am using the http registry instead of the https registry. You can change it like this:

npm config set registry http://registry.npmjs.org/

varunpalekar commented 8 years ago

@denouche Not all problems, but yes, it solves many problems related to npm install hangs, etc. Make sure you have the following entry in ~/.npmrc:

registry=http://registry.npmjs.org

vshabelnyk commented 8 years ago

Ok, so maybe it's due to https, because we have to use an https caching proxy.

mmarod commented 8 years ago

I had been using http previously when I was having the issue. Haven't tried for a few weeks though.

BlueHotDog commented 8 years ago

any official npm response? seems like a lot of people having this issue..

HiDAl commented 8 years ago

I fixed my problem by installing the packages one by one:

RUN npm install -g npm && \
    npm install -g node-gyp && \
    npm install -g grunt && \
    npm install -g bower && \
    npm install -g grunt-cli && \
    npm update

Before, I was doing:


RUN npm install -g npm && \
    npm install -g -dd node-gyp \
        bower grunt grunt-cli && \
    npm update

Maybe the race condition happens when you install many packages that share dependencies.

EDIT: I repeated the process on the same computer (npm v2.11.3, node v0.12.7), and now it fails :confused: ... I changed my install line to:

RUN npm install -g npm && \
    npm install -g -d grunt && \
    npm install -g -d bower && \
    npm install -g -d grunt-cli && \
    npm update

and it works... I didn't change anything, so I don't understand what happened.

PS: Sorry for my English

call-a3 commented 8 years ago

Also having this issue on Windows 8.1 using node 0.12.7 and npm 2.13.0. I'm not using any proxy and have tried switching to the http registry instead of https.

varunpalekar commented 8 years ago

@HiDAl I also think this problem is a race condition.

HiDAl commented 8 years ago

@varunpalekar Finally, I ran npm config set registry http://registry.npmjs.org/ and everything was fine... I haven't tested on other computers, but I believe this can help.

kevinfargason commented 8 years ago

+1 for npm config set registry http://registry.npmjs.org/ solving my problem:

npm install hanging inside docker container debian:wheezy

RoryH commented 8 years ago

+1 seeing NPM issue here, and it's sporadic, with private npm repo over HTTP

thomassuckow commented 8 years ago

I've been working on adding prints to various locations in npm and found this interesting result today:

End Install function
install many top
install many top_
install many
asyncMap 0 with  39 items
npm WARN deprecated gulp-clean@0.3.1: use gulp-rimraf instead
asyncMap 0 Items left:  38
asyncMap 0 Items left:  37
[...]
asyncMap 0 Items left:  2
asyncMap 0 Items left:  1

The spinner just sits there continuously at this point. This is very early in the install process as only a single asyncMap has been created.

Unfortunately the adding of console output changes the probability of hitting the bug. I have to run it many times to get it to occur.

Well, on to digging deeper.

thomassuckow commented 8 years ago

I've managed to narrow it down to cache.add( ) not always calling the callback. Specifically addNamed is not always invoking the callback. I'm having to make more sophisticated debug output because addNamed has quite a few branches it can take.
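
One way to catch a callback that is never invoked, without adding prints to every branch, is a small watchdog wrapper. The following is only a minimal sketch and not npm code: watchCallback, its parameters, and the report function are hypothetical names introduced here for illustration.

```javascript
// Hypothetical debugging helper; not part of npm.
// Wraps a Node-style callback and reports when it is never called
// (within timeoutMs) or when it is called more than once.
function watchCallback(label, cb, timeoutMs, report) {
  let calls = 0;

  const timer = setTimeout(() => {
    if (calls === 0) {
      report(label + ': callback not invoked after ' + timeoutMs + ' ms');
    }
  }, timeoutMs);
  // The watchdog itself should never be the reason the process stays alive.
  if (typeof timer.unref === 'function') timer.unref();

  return function watched(...args) {
    calls += 1;
    if (calls > 1) {
      // Swallow duplicates, but make them visible.
      report(label + ': callback invoked ' + calls + ' times');
      return undefined;
    }
    clearTimeout(timer);
    return cb.apply(this, args);
  };
}
```

In a locally patched copy of npm, the callback handed to the function under suspicion could be replaced with something like watchCallback('addNamed', cb, 60000, console.error). Because it only writes output when something goes wrong, it should disturb timing less than unconditional prints, though that is an assumption and not something measured in this thread.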

wenerme commented 8 years ago

I cloned ghost and ran npm install; it hung a lot at

npm verb afterAdd

I have to interrupt it and run npm install again, and then it hangs at another npm verb afterAdd.

thomassuckow commented 8 years ago

I've tracked the hang to getOnceFromRegistry in cache/add-named.js

It isn't always the same package and it happens with differing call locations of getOnceFromRegistry. In my last several tests I have never had it hang outside getOnceFromRegistry which makes me believe that something unique to that process is causing it. My fear is that it is actually a bug in the server side of the npm registry, possibly holding a connection open. Especially since this all started for me without an npm update, though it could also be related to corporate transparent proxies.

mpderbec commented 8 years ago

Go thomassuckow go!! This bug suuuucks.

othiym23 commented 8 years ago

This is still something that the CLI team at npm is paying attention to, but there are few of us with a lot to do, and we're still suffering from a lack of reliable repro cases for this issue. To speak to @thomassuckow's analysis above, a few things could be happening:

  1. There's some kind of bug or race condition inside npm that's causing it to hang during more complex installs.
  2. There's some kind of network issue that causes a connection to hang open indefinitely that's causing npm to hang during more complex installs.
  3. There's some kind of network issue that causes a connection to hang or reset that triggers a bug or race condition inside npm that's causing it to hang during more complex installs.

My guess is that it's either 2 or 3. getOnceFromRegistry() is a simple wrapper around the registry client code which grabs the package tarball from the registry, and is mostly just doing network stuff. This is kind of unfortunate, because it leaves a large number of free variables to nail down before I can say what's happening with any confidence. My suspicion is that it's either one or more network connections hanging in a finished but unterminated state (which npm / request don't really deal with well), or some kind of lower-level TLS / TCP/IP issue.
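
To illustrate the "finished but unterminated connection" scenario: with Node's http module, a stalled socket stays silent unless the caller asks for an idle timeout and then destroys the request. The following is a minimal sketch under that assumption, not npm's or request's actual code. getWithIdleTimeout and its get parameter are hypothetical; only http.get, req.setTimeout and req.destroy are real Node APIs.

```javascript
const http = require('http');

// Hypothetical helper (not npm's code): GET a URL and turn a silent stall
// into an error once the socket has been idle for idleMs milliseconds.
// The `get` parameter exists only so the sketch can be exercised with a fake.
function getWithIdleTimeout(url, idleMs, cb, get = http.get) {
  let done = false;
  const finish = (err, body) => {
    if (done) return; // report exactly once, whatever fires first
    done = true;
    cb(err, body);
  };

  const req = get(url, (res) => {
    const chunks = [];
    res.on('data', (chunk) => chunks.push(chunk));
    res.on('end', () => finish(null, Buffer.concat(chunks)));
    res.on('error', finish);
  });

  // A timeout alone does not abort anything; the request must be destroyed.
  req.setTimeout(idleMs, () => {
    req.destroy(new Error('socket idle for ' + idleMs + ' ms: ' + url));
  });
  req.on('error', finish);
}
```

With a guard like this, a connection that never delivers data surfaces as an error the caller can retry, instead of a callback that never fires. Whether that matches the failure seen in this issue is not established here.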

Part of the reason that this issue hasn't gotten more attention from the team is that it seems to be affecting a relatively small number of users. This isn't to diminish the importance of the time any of you have lost to this issue (and, to be clear, I believe entirely that there's something to be fixed here, and that the responsibility probably lies on one or more of npm's teams to get it sorted out). It just means that nobody close to the team has run into this issue ourselves. It would be very helpful if we could start nailing down some of the areas of uncertainty around the problem:

• is anyone encountering this issue running Node 0.10 or io.js 2.x+?
• where are the systems encountering this issue (geographically) located?
• do you know what content distribution network (CDN) point of presence (POP) the affected systems are connecting to? (npm uses Fastly for its CDN, and Fastly has POPs scattered all over the world.)
• how are affected systems connected to the internet?
• do those of you encountering this problem also see npm exiting with cb called twice errors?

mpderbec commented 8 years ago

Here are my answers:

• is anyone encountering this issue running Node 0.10 or io.js 2.x+?

I'm running Node 0.12.7 and I don't appear to be using io.js anywhere that I can find.

• where are the systems encountering this issue (geographically) located?

Oakland, CA and White Salmon, WA

• do you know what content distribution network (CDN) point of presence (POP) the affected systems are connecting to? (npm uses Fastly for its CDN, and Fastly has POPs scattered all over the world.)

I don't know... are there ways to find out? A quick Google search didn't turn up anything obvious.

• how are affected systems connected to the internet?

In one case, the internet connection is relatively weak/flaky, and I'm seeing this problem about 90% of the time I attempt 'npm install'. I'll be in this location (White Salmon, WA) for three more days, if there's any more diagnosing I can do while I'm here please let me know.

• do those of you encountering this problem also see npm exiting with cb called twice errors?

I have seen that error, but not lately. Most of the time it is the "hung spinner" symptom.

thomassuckow commented 8 years ago

Node 0.12.1, npm 2.7.1. This is the version I am adding my own debugging to.

Richland WA, but our internet has multiple trunks in different geographical locations

I am also using an internal Artifactory mirror, it fails without it too.

Non-authoritative answer:
registry.npmjs.org canonical name = a.sni.fastly.net.
Name: a.sni.fastly.net
Address: 23.235.46.162

It's complicated (And I don't really know)

Nope

jsoverson commented 8 years ago

is anyone encountering this issue running Node 0.10 or io.js 2.x+?

Happens with Node 0.10, 0.12, and iojs 3.0.0

where are the systems encountering this issue (geographically) located?

Mountain View, Santa Clara, SF.

do you know what content distribution network (CDN) point of presence (POP) the affected systems are connecting to? (npm uses Fastly for its CDN, and Fastly has POPs scattered all over the world.)

NetRange    199.27.72.0 - 199.27.79.255
CIDR    199.27.72.0/21
NetName FASTLY
NetHandle   NET-199-27-72-0-1
Parent  NET199 (NET-199-0-0-0-0)
NetType Direct Assignment
OriginAS    AS54113
Organization    Fastly (SKYCA-3)
RegDate 2011-10-17
Updated 2012-03-02
Ref http://whois.arin.net/rest/net/NET-199-27-72-0-1
OrgName Fastly
OrgId   SKYCA-3
Address PO Box 78266
City    San Francisco
StateProv   CA
PostalCode  94107
Country US
RegDate 2011-09-16
Updated 2014-10-07
Ref http://whois.arin.net/rest/org/SKYCA-3
OrgAbuseHandle  ABUSE4771-ARIN
OrgAbuseName    Abuse Account
OrgAbusePhone   +1-415-496-9353
OrgAbuseEmail   abuse@fastly.com
OrgAbuseRef http://whois.arin.net/rest/poc/ABUSE4771-ARIN
OrgTechHandle   FRA19-ARIN
OrgTechName Fastly RIR Administrator
OrgTechPhone    +1-415-404-9374
OrgTechEmail    rir-admin@fastly.com
OrgTechRef  http://whois.arin.net/rest/poc/FRA19-ARIN

how are affected systems connected to the internet?

Home: Xfinity via netgear wireless router. Work: not sure, what are you looking for? Happens with artifactory mirror and without.

do those of you encountering this problem also see npm exiting with cb called twice errors?

Never seen that. Successful installs exit without error, unsuccessful installs hang forever.

othiym23 commented 8 years ago

@jsoverson

Happens with Node 0.10, 0.12, and iojs 3.0.0

Can you tell me with which versions of Node 0.10 you've seen the issue show up? And which version of npm it was using at the time?

NetRange  199.27.72.0 - 199.27.79.255
…

I meant, specifically, which IP addresses were you connecting to?

jsoverson commented 8 years ago

Can you tell me with which versions of Node 0.10 you've seen the issue show up? And which version of npm it was using at the time?

node v0.10.40 (npm v2.13.5)

I meant, specifically, which IP addresses were you connecting to?

199.27.79.162

thomassuckow commented 8 years ago

I do occasionally see: Could not download Node.js: Could not download http://nodejs.org/dist/v0.12.1/node-v0.12.1-linux-x64.tar.gz: Connect to nodejs.org:80 [nodejs.org/165.225.133.150] failed: Connection timed out

It has actually been happening a lot lately. May be unrelated.

Edit: I intend to continue going down the rabbit hole once I have/make time.

thomassuckow commented 8 years ago

I've traced the most common hang to cache/caching-client.js in _get(). In some cases _get succeeded but the app still hung (See note below).

Inside it calls this.request() but the callback is never invoked.

though this.request appears to actually be this._invalidatingRequest because of:

// swizzle in our custom cache invalidation logic
this._request = this.request
this.request  = this._invalidatingRequest
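
To make the effect of that swizzle concrete, here is a minimal stand-alone sketch. FakeClient and its log array are invented for illustration, and the body of the wrapper is an assumption; what npm's real _invalidatingRequest does is not shown in this thread.

```javascript
// Hypothetical stand-in for the registry client. Only the names request,
// _request and _invalidatingRequest come from the snippet above.
function FakeClient() {
  this.log = [];
  // Swizzle: keep the original under _request and route request()
  // through the wrapper from now on.
  this._request = this.request;
  this.request = this._invalidatingRequest;
}

FakeClient.prototype.request = function (uri, cb) {
  this.log.push('request ' + uri);
  cb(null, { uri: uri });
};

FakeClient.prototype._invalidatingRequest = function (uri, cb) {
  this.log.push('invalidating ' + uri);
  // If this wrapper, or anything it calls, drops cb, then every caller
  // of request() waits forever.
  this._request(uri, (err, data) => {
    this.log.push('wrapper done');
    cb(err, data);
  });
};
```

After construction, every call to client.request() passes through the wrapper first, so a callback lost anywhere along that path would look, from _get(), exactly like a request whose callback is never invoked.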

Note: Twice during my testing today it succeeded in npm.registry.get() but still hung. addNamed in cache/add-named.js still was waiting for "'gulp-filter': '2.0.2'". :suspect:

timotm commented 8 years ago

We see this a lot when running automated builds in a Debian Jessie machine with node 0.10.29 in Finland.

Curiously, when we run the same build in an identical Jessie machine running at Amazon EC2 (us-east-1), it doesn't reproduce.

mischkl commented 8 years ago

Still getting lots of freezing on "unlock done" on Mac OS X 10.9.5, node 0.12.7, npm 2.14.0, using Artifactory as repository provider... needless to say this is very frustrating and causes me an average of an hour downtime every day at the moment.

YanshuoH commented 8 years ago

+1 for this issue, with 2 different VM hosted by VirtualBox(4.3.12) in France

Linux local 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1 x86_64 GNU/Linux
node -v v0.12.7
npm -v 2.12.1
npm -v 2.14.0

(not working)

Linux vmdev 3.2.0-4-amd64 #1 SMP Debian 3.2.51-1 x86_64 GNU/Linux
node -v v0.12.7
npm -v 2.12.1
npm -v 2.14.0

(not working)

Another amazon server does work

Linux ip-10-33-57-75 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1 x86_64 GNU/Linux
node -v 0.11.12-pre
npm -v 1.4.3

So I tried to downgrade my two VM's npm version to 1.4.3, not working...

Finally a test in Windows 8.1, it does work

MINGW32_NT-6.2 ASAFINE-PC 1.0.12(0.46/3/2) 2012-07-05 14:56 i686 unknown
node -v v0.10.33
npm -v 1.4.28

To draw a conclusion from the above tests, maybe something is wrong with VirtualBox. I'll test different versions of VirtualBox and report later.

YanshuoH commented 8 years ago

Well, after upgrading VirtualBox to 4.3.30, the problem is resolved.

By the way, I got a lot of npm info retry will retry, error on last attempt: Error: socket hang up when "postinstall" occurs, as well as some "unlock" actions

othiym23 commented 8 years ago

@mischkl

Still getting lots of freezing on "unlock done" on Mac OS X 10.9.5, node 0.12.7, npm 2.14.0, using Artifactory as repository provider

The unlock done messages are a red herring, as they're simply messages from another portion of the install process indicating that they're finished. The fact that you're having difficulties with Artifactory is interesting, though. Does everything install from Artifactory, or is it a mix of Artifactory and registry.npmjs.org? What kind of host environment is the CI running under?

artemyarulin commented 8 years ago

Same here - latest node, npm and Mac OS. One thing that maybe would help to reproduce this issue - my internet connection is quite slow right now (around 5Mb, but a bit unstable).

mischkl commented 8 years ago

@othiym23 Thanks for taking the time to help analyze this. Everything installs via Artifactory, including packages from registry.npmjs.org (Artifactory acts as a mirroring proxy). We have a "virtual repository" setup that searches through three internal registries that represent various development "stages" before looking on registry.npmjs.org, while automatically caching the packages from npmjs.org. Unfortunately any theoretical performance boost from caching seems to be more than cancelled out by the searching through 4 registries.

Interestingly, the Jenkins CI, which is running on Debian Linux, as well as the colleagues using Ubuntu Linux, do not seem to be experiencing these issues. For both of these cases the npm install goes a lot faster (about 3 minutes as compared to upwards of 10) and never hangs. So it seems like this issue may be aggravated by the Mac OS network stack? (and/or file system stack? - I mention this because sometimes it seems to hang near the end of the install while trying to deal with lock files on the file system, then when I Ctrl-C it and run npm install again it finishes the job just fine.)

schnesim commented 8 years ago

I believe I know why the -ddd flag helps occasionally, at least for users behind a corporate proxy.

When calling npm install... npm generates an incredible number of requests in a very short time and the proxy just gets clogged up. At least that's what's happening with my corporate proxy. The -ddd flag creates just enough chatter by writing to the console that the time gaps between the requests become big enough for the proxy to handle.

So I think what could solve this issue once and for all is the ability to make npm pause between two requests for a certain time, say 1000 ms, via something like npm install -t1000 yo
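
The pause-between-requests idea can be sketched as a tiny serial queue. This is an illustration only, not an npm feature: createThrottle, enqueue and the schedule parameter are hypothetical names, and no such flag is confirmed to exist in npm.

```javascript
// Hypothetical throttle, not an npm feature: run callback-style tasks one
// at a time, waiting gapMs between the end of one and the start of the next.
// `schedule` defaults to setTimeout and is injectable for experiments.
function createThrottle(gapMs, schedule = setTimeout) {
  const queue = [];
  let busy = false;

  function next() {
    const task = queue.shift();
    if (!task) {
      busy = false;
      return;
    }
    busy = true;
    // Each task receives a `done` function; the gap starts when it is called.
    task(() => schedule(next, gapMs));
  }

  return function enqueue(task) {
    queue.push(task);
    if (!busy) next();
  };
}
```

Each registry request would be wrapped as a task that calls done() from its own callback. The trade-off is that fully serial requests with a fixed gap make large installs much slower, so a concurrency cap might be a gentler variant of the same idea.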

vshabelnyk commented 8 years ago

It looks like it has been working fine for the last couple of weeks for me and the whole team. I don't know what happened yet.

thomassuckow commented 8 years ago

The last hang I have in Jenkins is on Aug 20th 2015. I just ran my looping test script; it never hung during the 20 minutes I ran it (typical time to hang was less than 1 min before).

skydogch commented 8 years ago

My npm hangs for around 15-20 min

$ uname -a
Linux Merlin 3.19.0-15-generic #15-Ubuntu SMP Thu Apr 16 23:32:37 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

$ node -v
v0.12.7

$ npm -v
2.11.3

aaronjensen commented 8 years ago

Our npm takes 5 minutes usually when run via teamcity, but when run via ssh it can finish in seconds:

$ uname -a
Linux id16784 3.8.0-44-generic #66~precise1-Ubuntu SMP Tue Jul 15 04:01:04 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

$ node -v
v0.10.40

$ npm -v
2.11.3

This is consistent and across multiple projects, also hangs on node 0.12.

Using the http registry does not help.

fabiosussetto commented 8 years ago

Experiencing the same problem within a Docker container (node:0.12.7-wheezy). Outside of the container, npm install works fine. I think I solved the issue by doing

RUN npm install -g node-gyp

before npm install.

I got the idea from this post: http://gangmax.me/blog/2013/05/13/resolve-npm-update-node-gyp-hung-problem/

anilmujagic commented 8 years ago

@fabiosussetto Installing node-gyp before ionic didn't help :(

thovden commented 8 years ago

Ran into this issue today and wasted a few hours. Running docker on a mac, which means running under VirtualBox. The magic sauce was @varunpalekar's suggestion to go http:

npm config set registry http://registry.npmjs.org/

anilmujagic commented 8 years ago

@thovden I have that set, but didn't help.

aaronjensen commented 8 years ago

fwiw, we had problems w/ using old style git urls: git://... replacing them w/ github urls fixed the issue for us.

gregthebusker commented 8 years ago

This issue also started happening for me yesterday. I'm on a mac with node 4.1.1. I haven't been able to fix it with any of the recommended hacks. I've just had to go back down to v2 to get things to work. I even tried npm 3.3.5 to see if that would help.

mojojoseph commented 8 years ago

3.3.4 hangs here too:

npm verb addTmpTarball /tmp/npm-26594-d4101c7c/localhost_8080/moment/-/moment-2.10.6.tgz not in flight; adding
npm verb addTmpTarball already have metadata; skipping unpack for moment@2.10.6
npm verb afterAdd /root/.npm/blessed/0.1.61/package/package.json written
npm verb afterAdd /root/.npm/moment/2.10.6/package/package.json not in flight; writing
npm verb afterAdd /root/.npm/moment/2.10.6/package/package.json written
loadDep:ikt ▐ ╢██████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░╟

Intermittent but much more likely to happen without -ddd. This is within qemu-arm-static.

jezell commented 8 years ago

Also having this issue; tried alternate registries and have the same problem. Tried Node 4.1.1 and 0.10.40; it doesn't seem to matter what version of Node. Really annoying.

Note: building inside docker in case that is related.

cmrose commented 8 years ago

Me too! Spent literally days on this - I have to get ionic/cordova installed.

Windows 7
nodejs v4.1.2
npm 2.14.4
git 2.6.1.windows.1
Python 2.7.10
No other modules installed, as this is all a fresh nodejs install.
Internet connection via corporate VPN
http://amibehindaproxy.com/ says "No proxy detected"
317GB free disk space
10GB free RAM on 16GB machine

Sequence of commands at command prompt as administrator:

  1. npm cache clean
  2. Delete C:\Users\AppData\Roaming\npm*
  3. Delete C:\Users\node_modules
  4. npm config set loglevel silly
  5. npm config set registry https://registry.npmjs.org/
  6. npm update -g npm
  7. npm update
  8. Close command prompt and then reopen
  9. npm -g install cordova OR npm -g install ionic

Other things I have tried at various points:
npm install -g
Ubuntu 14 fresh install on another computer and another network connection not on VPN
http://registry.npmjs.org/
https://registry.npmjs.org.eu/
https://registry.npmjs.org.au/
npm install -g node-gyp
npm install -g sax : SUCCEEDED

Left it running for hours overnight

Every time it hangs up on a "npm verb unlock done using ...." log line

thomassuckow commented 8 years ago

@cmrose No amount of cleaning your environment will help. It appears to be some kind of bizarre network issue. Before I could no longer reproduce I tracked it down to https://github.com/npm/npm/issues/7862#issuecomment-133211875

You can run your own copy of npm that has been edited with node/node node/npm/bin/npm-cli.js install

I suspect it is an issue either with the servers hosting the repository or some misbehaving appliance in between. I don't trust amibehindaproxy, it says I am not behind a proxy but I am.

cmrose commented 8 years ago

Thanks Thomas. I'll try the connection I have that does not route through the corporate VPN again. I now recall that last time I tried that I may have had another VPN service operating.

cmrose commented 8 years ago

I reformatted and installed Ubuntu 14.04 on a spare laptop. Installed nodejs 4.1.2 and npm 2.14.4.

Then: sudo npm install -g cordova. This time it was not the hang problem but ECONNRESET. Bypassed router and switches direct to LAN port on router/modem and made sure there were no firewalls or port forwarding or other unusual stuff running on the router and the PC. This connection does not go through any VPNs (ie direct to ISP) and as far as I can tell has no proxy. Still got ECONNRESET for many tries. Then set log level to silly and the cordova install succeeded.

Then: sudo npm install -g ionic. More numerous retries resulted in ECONNRESET. Then separately installed the last module before the ECONNRESET . In this case cheerio. Then tried the ionic install and that succeeded.

Still have to duplicate this success on Windows though.

cmrose commented 8 years ago

Now I have successfully installed ionic and cordova on Windows 7 and Ubuntu 14.04 LTS. I can't be certain exactly what made this possible but the strongest candidates I have are:

  1. Removing the port forward in my router for yawcam (8888 for TCP traffic and port 8081 for TCP and UDP)
  2. Reinstalling the last module before the halt separately (in my case cheerio) then running the ionic install again. (This was done on Ubuntu only)
  3. Running installs with log level set to silly (perhaps introduces latency taking load off server?)
  4. Just wait awhile (or keep trying) until some upstream issue fixes itself.
  5. Some combination of the above

Hope this helps others and I would welcome any further insights that may lead to a more definitive answer.

iarna commented 8 years ago

Hey you all– when this happens with npm@3, does the spinner keep on spinning, or does it stop? (It has subtle, but important meaning!)