Closed dylang closed 9 years ago
Ran the same install on Jenkins and my mbp and it only happened on Jenkins.
Got the same error with 2.1.2
.
Note, for the silly
log above you'll need to git checkout
the gist
.
git checkout https://gist.github.com/78436df3febaf913ee75.git
Exxxcellent. I look forward to analyzing it with avid anticipation. Thank you!
Hot damn, the new error-checking I added has already earned its keep:
npm ERR! install trying to install 1.2.12 to /mnt/workspace/x-web-widgets/DYLAN-widget-neighbor-comparison/node_modules/angular-bundle
npm ERR! install but already installed versions [ '1.2.12' ]
Elsewhere:
npm sill install resolved peerDependencies:
npm sill install resolved { 'x-web-global-common-bundle': '^1.2.0',
npm sill install resolved 'angular-bundle': '1.2.12' },
This is totally a peerDependency problem. Now I just have to figure out how, and to fix it. This is the greatest night of the last few weeks.
This is the greatest night of the last few weeks.
:sweat_smile:
Knowing where a problem like this is makes it so much easier to solve, especially when I have a transcript I know how to read telling me how it got to that state.
However your ops team is configuring nginx in front of your npm-registry-couchapp
instance, it appears to be stripping the etag headers from the request, preventing CouchDB from sending 304s in response.
I spent about half the night learning how to use and configure chef
, vagrant
, and nginx
to see if I could get the cache settings fixed, but my 1 year old enjoys being up through the night even more than I do, another indeterminate bug I've been trying to solve. :smirk: Hope to get the cache settings fixed today and rolled out early next week. I'll let you know if it helps with the issues.
Registry cachebusting (which is what nginx is causing to happen right now) can exacerbate existing race conditions, but it's not the root cause of this bug. The reasons to do this are that you'll have less internal network traffic and builds will run faster.
If you get a chance, would it be a tremendous pain to check out the othiym23/inflight-uris-not-specs
branch on one of the jenkins instances that's been having trouble and get me another log? That branch changes how it tracks requests in flight and has additional / tweaked logging.
I can see the race condition happening in the log you captured yesterday, but I can't see the cause from the logs itself. It's possible this new version will fix your builds, but my primary goal is to get more insight into what's happening.
sure thing, trying it now! I'm able to get the error on my laptop so it's even easier to report a log to you. Do you want the npm-debug.log
and the entire silly log, or is one of the two enough?
Just the silly log, please – it's a strict superset of what's in npm-debug.log
.
Is the logging the only change? With that branch I just had my first successful npm install in a while with this particular repo.
Also, do you want the log of a successful install?
@dylang The change to how inflight
tracks which requests are pending will definitely make the caching process less chaotic. I'd be delighted if that fixed your build issues, but also a little suspicious, because it just minimizes the number of requests pending to the registry (i.e. reduces the chances of hitting race conditions) rather than making any deep changes to control flow (that would actually remove the race conditions).
Also, do you want the log of a successful install?
Only if it comes paired with a failed install so I can compare the two. ;)
@othiym23 - I have a delightful pairing for you.
This gist contains two npm install
logs using the same package.json
Also:
othiym23/inflight-uris-not-specs
branch.nginx
so that tarballs will be client cacheable. They had etags
before, but not a cache-control
header. Please let us know if you think we still aren't caching them.I like that the build on Jenkins is crashing at the same place on the same dependency. This means that it's a marginally deterministic race condition, which also means that it's probably findable / fixable. I may have another instrumented build for you next week. Enjoy your weekend!
@dylang Can you run this on the build server and tell me if it fixes the problem?
$ npm config set cache-lock-stale 3600000
$ npm config set cache-lock-wait 20000
$ npm config set cache-lock-retries 20
Here is a very small excerpt from the logs you sent me. There are no timestamps, so I can't be sure whether what's happening is a bug or a staleness timeout, but:
npm verb lock using /mnt/npm-cache/_locks/angular-bundle-ea67d6865b9164cd.lock for /mnt/workspace/x-web-widgets/DYLAN-test-bill-compare/node_modules/angular-bundle
npm verb lock using /mnt/npm-cache/_locks/angular-bundle-ea67d6865b9164cd.lock for /mnt/workspace/x-web-widgets/DYLAN-test-bill-compare/node_modules/angular-bundle
npm verb unlock done using /mnt/npm-cache/_locks/angular-bundle-ea67d6865b9164cd.lock for /mnt/workspace/x-web-widgets/DYLAN-test-bill-compare/node_modules/angular-bundle
The only way that you could see two acquisitions of the lock in a row without an unlock in between is either a bug in the lockfile
module or the lock timing out as stale. Since the default stale timeout is a minute, this is entirely conceivable in a big, complicated build like yours, so I'm just boosting the limits for everything by quite a bit to see what happens.
This still leaves the issue of determining why two installs of the same module are happening to the same place during the same run, but that is less of a pressing issue if the build isn't actually breaking.
Found it. Fixed it.
Not only was there a race condition in the installer (and a pretty obvious one), but the problem did in fact involve craziness with peerDependencies
. See this comment for details, but the short version is that when you have a peerDependency
that isn't being directly depended upon by any of your packages, but is declared as a peerDependency
by multiple packages at the same level of the tree, it will be queued up for installation multiple times.
Since we lock on install (mostly to keep multiple install processes from stomping on the same node_modules
directory), you'll be protected from this in most cases, but in the case of your build servers, they take long enough that they exceeds the stale lock timeout, which is a minute.
The fix for all of this is pretty simple, it just replaces the broken custom tracking logic for ensuring only one thing is installing to a given folder with the same inflight method the cache uses to avoid wasting time. I came up with a repro case of my own and have gone over it pretty closely, so I'm confident that this fixes this problem, which has just been sitting there waiting to nail people for quite some time now.
I'm confident enough that I've finally dealt with this that I'm going to close this issue; check out othiym23/peer-deps-must-die
to test the fix on your build servers. Assuming it does in fact fix the build problems you've been seeing, I'll merge the branch and make a new release early next week.
You can also address this issue by adding angular-bundle
to the package.json
of the build that's been failing, but what's the fun in that?
Wow, awesome! Amazing work!
$ npm config set cache-lock-stale 3600000
$ npm config set cache-lock-wait 20000
$ npm config set cache-lock-retries 20
This fixed a bunch of repos, but caused (or exposed) a new error with a few of them. This is using 2.1.3
, I haven't tried your latest build yet.
npm ERR! missing: minimatch@~0.2, required by karma@0.12.24
npm ERR! missing: http-proxy@~0.10, required by karma@0.12.24
npm ERR! missing: mime@~1.2.11, required by karma@0.12.24
npm ERR! missing: source-map@~0.1.31, required by karma@0.12.24
npm ERR! missing: readable-stream@~1.0.2, required by log4js@0.6.21
npm ERR! missing: cookie@0.1.0, required by connect@2.12.0
npm ERR! missing: negotiator@0.3.0, required by connect@2.12.0
npm ERR! missing: multiparty@2.2.0, required by connect@2.12.0
npm ERR! missing: mime@~1.2.9, required by send@0.1.4
npm ERR! missing: range-parser@0.0.4, required by send@0.1.4
These are all things that are in the tree multiple times.
For example
jenkins@ip-10-40-64-233:~/workspace/x-web-widgets/api-neighbor-comparison-MASTER$ npm ls minimatch
npm info it worked if it ends with ok
npm verb cli [ 'node', '/home/jenkins/.npm/bin/npm', 'ls', 'minimatch' ]
npm info using npm@2.1.3
npm info using node@v0.10.26
npm verb node symlink /usr/bin/node
api-neighbor-comparison@1.4.1-0 /mnt/workspace/x-web-widgets/api-neighbor-comparison-MASTER
├─┬ grunt@0.4.5
│ ├─┬ findup-sync@0.1.3
│ │ └─┬ glob@3.2.11
│ │ └── minimatch@0.3.0
│ └── minimatch@0.2.14
├─┬ grunt-cli@0.1.13
│ └─┬ findup-sync@0.1.3
│ └─┬ glob@3.2.11
│ └── minimatch@0.3.0
└─┬ x-web-taskmaster@27.1.1
├─┬ glob@3.2.11
│ └── minimatch@0.3.0
├─┬ grunt-build-bubble@1.7.7
│ └─┬ transform-package@1.1.0
│ ├─┬ fstream-npm@0.1.8
│ │ └─┬ fstream-ignore@1.0.1
│ │ └── minimatch@1.0.0
│ └─┬ x-web-npm@2.1.5
│ ├─┬ glob@4.0.6
│ │ └── minimatch@1.0.0
│ └─┬ npm@2.1.2
│ └── minimatch@1.0.0
├─┬ grunt-contrib-jshint@0.10.0
│ └─┬ jshint@2.5.6
│ └── minimatch@1.0.0
├─┬ grunt-contrib-watch@0.6.1
│ └─┬ gaze@0.5.1
│ └─┬ globule@0.1.0
│ └── minimatch@0.2.14
├─┬ grunt-jscs-checker@0.4.4
│ └─┬ jscs@1.4.5
│ └── minimatch@0.2.14
├─┬ grunt-mocha-test@0.10.2
│ └─┬ mocha@1.18.2
│ └─┬ glob@3.2.3
│ └── minimatch@0.2.14
├─┬ grunt-object-storage@2.2.1
│ ├─┬ glob@4.0.6
│ │ └── minimatch@1.0.0
│ ├─┬ object-storage-lookup@3.3.0
│ │ └─┬ glob@3.2.11
│ │ └── minimatch@0.3.0
│ └─┬ transform-package@1.1.0
│ ├─┬ fstream-npm@0.1.8
│ │ └─┬ fstream-ignore@1.0.1
│ │ └── minimatch@1.0.0
│ ├─┬ glob@3.2.11
│ │ └── minimatch@0.3.0
│ └─┬ x-web-npm@2.1.5
│ ├─┬ glob@4.0.6
│ │ └── minimatch@1.0.0
│ └─┬ npm@2.1.2
│ └── minimatch@1.0.0
├─┬ grunt-task-test-harness@1.0.6
│ └─┬ transform-package@1.1.0
│ ├─┬ fstream-npm@0.1.8
│ │ └─┬ fstream-ignore@1.0.1
│ │ └── minimatch@1.0.0
│ └─┬ x-web-npm@2.1.5
│ ├─┬ glob@4.0.6
│ │ └── minimatch@1.0.0
│ └─┬ npm@2.1.2
│ └── minimatch@1.0.0
├─┬ grunt-theme-firm-job@10.0.5
│ ├─┬ sassage-factory@3.2.1
│ │ ├─┬ glob@4.0.6
│ │ │ └── minimatch@1.0.0
│ │ ├─┬ node-sass@0.9.4
│ │ │ └─┬ mocha@1.18.2
│ │ │ └─┬ glob@3.2.3
│ │ │ └── minimatch@0.2.14
│ │ └─┬ stylestats@4.1.4
│ │ └─┬ stylus@0.48.1
│ │ └─┬ glob@3.2.11
│ │ └── minimatch@0.3.0
│ ├─┬ transform-package@1.1.0
│ │ └─┬ fstream-npm@0.1.8
│ │ └─┬ fstream-ignore@1.0.1
│ │ └── minimatch@1.0.0
│ └─┬ x-web-npm@2.1.5
│ ├─┬ glob@4.0.6
│ │ └── minimatch@1.0.0
│ └─┬ npm@2.1.2
│ └── minimatch@1.0.0
├─┬ grunt-theme-with-bubble@1.1.0
│ ├─┬ transform-package@1.1.0
│ │ └─┬ fstream-npm@0.1.8
│ │ └─┬ fstream-ignore@1.0.1
│ │ └── minimatch@1.0.0
│ └─┬ x-web-npm@2.1.5
│ ├─┬ glob@4.0.6
│ │ └── minimatch@1.0.0
│ └─┬ npm@2.1.2
│ └── minimatch@1.0.0
├─┬ matchdep@0.3.0
│ └─┬ globule@0.1.0
│ └── minimatch@0.2.14
└─┬ mocha@1.21.4
└─┬ glob@3.2.3
└── minimatch@0.2.14
npm verb exit [ 0, true ]
npm info ok
I'm trying the same package above with othiym23/peer-deps-must-die
and I'll post a silly
log if it has the same missing modules problem.
Good news! I re-ran a failed install (one that was missing dependencies) with othiym23/peer-deps-must-die
and all dependencies were present.
Doh... I'm seeing missing dependencies with othiym23/peer-deps-must-die
too.
I'm running it again now to see if it happens again.
I ran it two more times and it had the same missing packages.
hahahahahahahaha sobs
Well, the good news is that this is a separate issue. The bad news is that it's probably a consequence of the last fix related to just allowing multiple installs to piggyback on one actual write of a module. I'll see what I can sift out from the logs (thanks for uploading them).
See #6401 for a continuation of the missing packages discussion.
npm-debug.log
Since that's truncated I also have the full
silly
log as well. I'll update this ticket when I have it in a gist, it's big.Edit: Here's the silly log. npm@2.1.3 ENOENT 264,673-line log.