nextstrain / nextstrain.org

The Nextstrain website
https://nextstrain.org
GNU Affero General Public License v3.0
87 stars 49 forks source link

CI failures when npm installing deps from GitHub #895

Closed jameshadfield closed 3 weeks ago

jameshadfield commented 3 weeks ago

Recent / ongoing build failures are observed during CI for this PR (full logs failure 1, failure 2) when installing Auspice's dependencies:

npm error code 1
npm error git dep preparation failed

however CI is passing for master branch. These errors are not reproducible for me when building the PRs branch in node20, but I haven't tested building them locally using the heroku buildpack container. The Heroku review app succeeds and is working as (I think) we don't yet in-house the building & deployment of these, which differs from how we deploy canary/production. I believe the salient difference (and hinted at in the error) is that we install auspice from a GitHub SHA for these automated testing PRs. @tsibley do you know more about what's happening here and if it is a known limitation of using build-packs the way we are? Hopefully it's as simple as adding some GitHub credentials / config to the build container.

tsibley commented 3 weeks ago

Hmm. Do you figure the error with Puppeteer is unrelated/a red herring?

npm error code 1
npm error git dep preparation failed
npm error command /build/app/.heroku/node/bin/node /build/app/.heroku/node/lib/node_modules/npm/bin/npm-cli.js install --force --cache=/tmp/npmcache.OQOXF --prefer-offline=false --prefer-online=false --offline=false --no-progress --no-save --no-audit --include=dev --include=peer --include=optional --no-package-lock-only --no-dry-run
npm error npm error code 1
npm error npm error path /tmp/npmcache.OQOXF/_cacache/tmp/git-cloneb8Kpvu/node_modules/puppeteer
npm error npm error command failed
npm error npm error command sh -c node install.js
npm error npm error ERROR: Failed to set up Chrome r114.0.5735.90! Set "PUPPETEER_SKIP_DOWNLOAD" env variable to skip download.
npm error npm error [Error: EACCES: permission denied, mkdir '/.cache'] {
npm error npm error   errno: -13,
npm error npm error   code: 'EACCES',
npm error npm error   syscall: 'mkdir',
npm error npm error   path: '/.cache'
npm error npm error }
tsibley commented 3 weeks ago

It looks to me like npm tries to install Auspice (from git) and in that context fails to install Puppeteer. But presumably it's able to install Puppeteer when installing Auspice from NPM?

jameshadfield commented 3 weeks ago

I figured unrelated because the errors weren't reproducible for me when building the failing branch locally, but 🤷

jameshadfield commented 3 weeks ago

I was able to reproduce this error within the heroku-20 container + nodejs build-pack and upon reflection it probably is at the puppeteer stage. I was able to successfully use that approach to build the slug on master (and indeed our CI / deployment is working), and master now includes the updated auspice code which the failing branch pointed to. So it seems the error is not related to any specific auspice code, rather it appears when installing auspice from git rather than npm, and only within the context of our in-house heroku-build pipeline. (Notably the Heroku review app, which uses the same (?) container + build-pack approach as we do but they manage the build completed successfully.)

jameshadfield commented 3 weeks ago

Setting PUPPETEER_SKIP_DOWNLOAD='true' (as the error suggests) allows the build to proceed. Which is fine as we don't run Auspice's test suite within the context of nextstrain.org.

That the failure is observed within our in-house slug generation and not Heroku's (via its construction of the review app) indicates a difference in the two implementations. Whether it's worth diving into is another matter.

tsibley commented 3 weeks ago

That the failure is observed within our in-house slug generation and not Heroku's (via its construction of the review app) indicates a difference in the two implementations.

My quick guess would be HOME (or similar) is not set for the build process and proximate error source, mkdir '/.cache', is because something is actually trying to do mkdir $HOME/.cache (or similar). I may look more tomorrow.

tsibley commented 3 weeks ago

I repro'd a variant of this locally:

          Creating an optimized production build ...
          Downloading swc package @next/swc-linux-x64-gnu...
 âš  Found lockfile missing swc dependencies, run next locally to automatically patch
unhandledRejection [Error: EACCES: permission denied, mkdir '/.cache'] {
  errno: -13,
  code: 'EACCES',
  syscall: 'mkdir',
  path: '/.cache'
}

I made sure to rm -rf build/pack to get the latest version, which is 555fe89fb0c259e914681fa330a6aa9c19c34a00, and that's what's used in CI.

tsibley commented 3 weeks ago

It's next-swc trying to write to $HOME/.cache/next-swc/

tsibley commented 3 weeks ago

To store automatically downloaded binary extensions for Node.

jameshadfield commented 3 weeks ago

Thanks for fixing @tsibley

Is there a reason why "It's next-swc trying to write to $HOME/.cache/next-swc/" only occurs when installing via GitHub that you found out? Or did you just debug the failure mode and call it a day (entirely understandable)

tsibley commented 2 weeks ago

@jameshadfield The next-swc variant of this issue and your original Puppeteer variant of the issue are related (both are trying to access $HOME/.cache) but not triggered by the same thing.

In the next-swc case, it's because the pinned deps for Linux (but not macOS) were accidentally removed from the lockfile in #906 and so next build attempted to download them into $HOME/.cache/next-swc/.

In the Puppeteer case, it also attempts install-time downloads (of Chromium) but conditions those on some nuances of npm that I didn't dig into. Puppeteer is an Auspice dep and I suspect the difference between installing Auspice from NPM vs. a local source checkout (via git+ssh://… in package.json) causes Puppeteer's download conditional to evaluate differently.

tsibley commented 2 weeks ago

Setting HOME=/tmp fixes both triggers of the issue (even if ideally we'd never be losing next-swc deps from the lock file anyway).

jameshadfield commented 2 weeks ago

Thanks! I didn't click on the "variant" link to realise you were debugging a completely separate bug.

For what it's worth PUPPETEER_SKIP_DOWNLOAD='true' will speed up install a bit I imagine, and we don't use it in the nextstrain.org context, so something to consider?