keymanapp / keyman

Keyman cross platform input methods system running on Android, iOS, Linux, macOS, Windows and mobile and desktop web
https://keyman.com/

chore(common): add retry checks around `npm install` (and any other npm network activities) #10350

Closed mcdurdin closed 4 months ago

mcdurdin commented 9 months ago

@jahorton can you figure out why the Test: Language Modeling Layer (Common) build failed? https://build.palaso.org/viewLog.html?buildId=433516&buildTypeId=Keyman_Common_LMLayer_TestPullRequests

01:52:46 npm ERR! npm ERR! errno ECONNRESET
01:52:46 npm ERR! npm ERR! network Invalid response body while trying to fetch https://registry.npmjs.org/@parcel%2flogger: aborted
01:52:46 npm ERR! npm ERR! network This is a problem related to network connectivity.
01:52:46 npm ERR! npm ERR! network In most cases you are behind a proxy or have bad network settings.

This has been happening from time to time for the past few months. Not necessarily on that specific package - just the loss of connectivity when trying to retrieve a package.

jahorton commented 5 months ago

I did a little searching related to this and found something that may be of interest... but it does currently have limitations.

With recent-enough versions of npm, there's now a way to cache recent npm installs and prioritize use of the cached packages, rather than always going out and re-fetching them.

But... there's a bug in which version-bumping a package causes use of the related option to fail, despite saying prefer cache (rather than 'use cache exclusively'):

jahorton commented 5 months ago

Very relevant thread I found:

https://github.com/actions/runner-images/issues/3737

This comment in particular looks relevant: https://github.com/actions/runner-images/issues/3737#issuecomment-2037491602

It's worth noting that npm just closed a bug (pending release) where too many connections were being opened during install:
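
Independent of that npm fix, npm has documented settings for how many times it retries a failed registry fetch and how many connections it opens per host. A minimal sketch of raising the retry budget and lowering concurrency for a CI run through `npm_config_*` environment variables follows. The specific values are illustrative assumptions, not tested against these build agents, and nothing in this thread confirms that they prevent `ECONNRESET`.

```shell
# Illustrative values only (assumptions); tune for the build agents in question.
# npm reads any npm_config_<name> environment variable as the config key <name>.
export npm_config_fetch_retries=5                # retries per registry fetch
export npm_config_fetch_retry_mintimeout=20000   # minimum wait before a retry, in ms
export npm_config_fetch_retry_maxtimeout=120000  # maximum wait before a retry, in ms
export npm_config_maxsockets=5                   # connections per registry origin
```

Exporting these before `npm ci` applies them to that invocation, and they should be inherited by child processes that npm spawns, though that is worth verifying for the nested `npm install` seen in the logs.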

mcdurdin commented 5 months ago

npm 10.5.1. I think we should go ahead and update our build agents to this latest version pronto -- before we try to do 17.0-stable release. Failing due to ECONNRESET is frequent now.

mcdurdin commented 5 months ago

I am doing npm 10.5.1 upgrade on all build agents now. Not going @latest until after 17.0-stable releases. Just mitigating ECONNRESET bug.

mcdurdin commented 5 months ago

Note, after applying the update to 10.5.1, we still get ECONNRESET, e.g. on ba-bionic-64-ta (https://build.palaso.org/buildConfiguration/Keyman_Test_Common_Linux/460549):

11:08:45   [common/web/keyman-version] ## configure starting...
11:08:48   npm WARN skipping integrity check for git dependency ssh://git@github.com/keymanapp/dependency-node-xml2js.git
11:08:48   npm WARN skipping integrity check for git dependency ssh://git@github.com/keymanapp/dependency-restructure.git
11:08:48   npm WARN skipping integrity check for git dependency ssh://git@github.com/keymanapp/dependency-restructure.git
11:08:52   npm WARN deprecated @npmcli/move-file@2.0.1: This functionality has been moved to @npmcli/fs
11:09:31   npm ERR! code 1
11:09:31   npm ERR! git dep preparation failed
11:09:31   npm ERR! command /home/bob/.nvm/versions/node/v18.16.0/bin/node /home/bob/.nvm/versions/node/v18.16.0/lib/node_modules/npm/bin/npm-cli.js install --force --cache=/home/bob/.npm --prefer-offline=false --prefer-online=false --offline=false --no-progress --no-save --no-audit --include=dev --include=peer --include=optional --no-package-lock-only --no-dry-run
11:09:31   npm ERR! npm WARN using --force Recommended protections disabled.
11:09:31   npm ERR! npm ERR! code ECONNRESET
11:09:31   npm ERR! npm ERR! errno ECONNRESET
11:09:31   npm ERR! npm ERR! network Invalid response body while trying to fetch https://registry.npmjs.org/@parcel%2fpackager-html: aborted
11:09:31   npm ERR! npm ERR! network This is a problem related to network connectivity.
11:09:31   npm ERR! npm ERR! network In most cases you are behind a proxy or have bad network settings.
11:09:31   npm ERR! npm ERR! network
11:09:31   npm ERR! npm ERR! network If you are behind a proxy, please make sure that the
11:09:31   npm ERR! npm ERR! network 'proxy' config is set properly.  See: 'npm help config'
11:09:31   npm ERR!
11:09:31   npm ERR! npm ERR! A complete log of this run can be found in:
11:09:31   npm ERR! npm ERR!     /home/bob/.npm/_logs/2024-05-07T04_09_02_790Z-debug-0.log
11:09:31   
11:09:31   npm ERR! A complete log of this run can be found in:
11:09:31   npm ERR!     /home/bob/.npm/_logs/2024-05-07T04_08_46_540Z-debug-0.log
11:09:31   [common/web/keyman-version] ## configure failed

The build is picking up node 18.16.0:

11:09:31   npm ERR! command /home/bob/.nvm/versions/node/v18.16.0/bin/node  [snip]

But node 18.19.0 is current with nvm:

bob@ba-bionic-64-ta:~$ nvm current
v18.19.0
bob@ba-bionic-64-ta:~$ npm --version
10.5.1
bob@ba-bionic-64-ta:~$ which npm
/home/bob/.nvm/versions/node/v18.19.0/bin/npm
bob@ba-bionic-64-ta:~$ /home/bob/.nvm/versions/node/v18.16.0/bin/node /home/bob/.nvm/versions/node/v18.16.0/lib/node_modules/npm/bin/npm-cli.js --version
9.5.1
bob@ba-bionic-64-ta:~$

Because ... $PATH is defined in buildAgent.properties:

env.PATH=/home/bob/.nvm/versions/node/v18.16.0/bin:/usr/local/bin:/usr/bin:/bin:/usr/lib/android-sdk/cmdline-tools/tools/bin
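
The mismatch above can be checked directly by resolving `node` and `npm` under the agent's `env.PATH` instead of the interactive shell's `PATH`. A minimal sketch; the helper name `resolve_under_path` is hypothetical and not part of the Keyman build scripts.

```shell
# Hypothetical helper: print the executable that a command name resolves to
# under a given PATH. Runs in a subshell so the caller's PATH and command
# hash table are left untouched.
resolve_under_path() {
  local search_path="$1" name="$2"
  (
    PATH="$search_path"
    command -v "$name"
  )
}

# Example, using the start of the env.PATH value from buildAgent.properties:
#   agent_path='/home/bob/.nvm/versions/node/v18.16.0/bin:/usr/local/bin:/usr/bin:/bin'
#   resolve_under_path "$agent_path" node
#   resolve_under_path "$agent_path" npm
```

Running the resolved `npm` with `--version`, as in the session above, then shows which npm the build agent actually uses.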

mcdurdin commented 5 months ago

So, just experienced ECONNRESET on ba-jammy-64-ta, which is on npm 10.5.1:

07:26:58   npm WARN skipping integrity check for git dependency ssh://git@github.com/keymanapp/dependency-node-xml2js.git
07:26:59   npm WARN skipping integrity check for git dependency ssh://git@github.com/keymanapp/dependency-restructure.git
07:26:59   npm WARN skipping integrity check for git dependency ssh://git@github.com/keymanapp/dependency-restructure.git
07:27:01   npm WARN deprecated @npmcli/move-file@2.0.1: This functionality has been moved to @npmcli/fs
07:27:58   npm ERR! code 1
07:27:58   npm ERR! git dep preparation failed
07:27:58   npm ERR! command /usr/bin/node /usr/lib/node_modules/npm/bin/npm-cli.js install --force --cache=/home/bob/.npm --prefer-offline=false --prefer-online=false --offline=false --no-progress --no-save --no-audit --include=dev --include=peer --include=optional --no-package-lock-only --no-dry-run
07:27:58   npm ERR! npm WARN using --force Recommended protections disabled.
07:27:58   npm ERR! npm ERR! code ECONNRESET
07:27:58   npm ERR! npm ERR! errno ECONNRESET
07:27:58   npm ERR! npm ERR! network Invalid response body while trying to fetch https://registry.npmjs.org/@parcel%2freporter-tracer: aborted
07:27:58   npm ERR! npm ERR! network This is a problem related to network connectivity.
07:27:58   npm ERR! npm ERR! network In most cases you are behind a proxy or have bad network settings.
07:27:58   npm ERR! npm ERR! network
07:27:58   npm ERR! npm ERR! network If you are behind a proxy, please make sure that the
07:27:58   npm ERR! npm ERR! network 'proxy' config is set properly.  See: 'npm help config'
07:27:58   npm ERR!
07:27:58   npm ERR! npm ERR! A complete log of this run can be found in: /home/bob/.npm/_logs/2024-05-13T00_27_03_752Z-debug-0.log

And verifying the version, making sure we are using exactly the same call to npm as in the failed call above:

~$ /usr/bin/node /usr/lib/node_modules/npm/bin/npm-cli.js --version
10.5.1

jahorton commented 5 months ago

Dang, even still? Thought for sure there'd be some relation there.

jahorton commented 5 months ago

There is a disturbing number of Stack Overflow answers (such as https://stackoverflow.com/questions/71449279/how-to-resolve-npm-err-code-econnreset-while-installing-angular-cli) saying "just rewrite npm's registry to use the http:// version of the registry site instead of the https:// version." That's obviously not something we want in our CI setup, though.

I don't see anything (yet) about an ECONNRESET-specific exit code for npm, but even then, if this is the only reason we fall over during CI with npm ci... we can probably just set a temporary trap for npm errors. The issue is... we don't want to drop the already-existing error trap. https://stackoverflow.com/a/7287873, with interpretation, seems to provide a way forward to "trap juggle" by "storing" the original trap. We could capture, then unset the old trap... and then put it back in place once done.
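
The capture / unset / restore sequence described above might look like the following in bash. This is a minimal sketch: `on_error` stands in for the build scripts' real handler (a hypothetical name), `false` stands in for a failing `npm ci`, and the sketch assumes `set -e` is off, since errexit would still abort the script on the failure even with the `ERR` trap removed.

```shell
#!/usr/bin/env bash
set +e  # assumption for this sketch: errexit is not active

# Stand-in for the build script's existing error handler (hypothetical).
err_trap_fired=0
on_error() { err_trap_fired=1; }
trap on_error ERR

# 1. Capture the current ERR trap. `trap -p ERR` prints it as a re-runnable
#    `trap -- '...' ERR` command, or prints nothing if no trap is set.
saved_err_trap="$(trap -p ERR)"

# 2. Drop the trap so a failing command does not reach the handler.
trap - ERR

# 3. Run the command that is allowed to fail; `false` stands in for `npm ci`.
false
npm_status=$?

# 4. Put the original trap back (nothing to restore if there was none).
if [ -n "$saved_err_trap" ]; then
  eval "$saved_err_trap"
fi
```

After step 4, `npm_status` holds the exit code of the failed command and can drive a retry decision, while later failures reach the original handler again.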

mcdurdin commented 5 months ago

> I don't see anything (yet) about an ECONNRESET-specific exit code for npm, but even then, if this is the only reason we fall over during CI with npm ci... we can probably just set a temporary trap for npm errors. The issue is... we don't want to drop the already-existing error trap. https://stackoverflow.com/a/7287873, with interpretation, seems to provide a way forward to "trap juggle" by "storing" the original trap. We could capture, then unset the old trap... and then put it back in place once done.

Just use || on the npm ci line:

npm ci || (...do the retry bits)
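
The `|| (...do the retry bits)` suggestion could be fleshed out as a small retry helper. This is a minimal sketch, not the change that was eventually made for this issue: the name `retry_command`, the attempt count, and the delay are all hypothetical.

```shell
# Hypothetical helper: run a command, retrying up to max_attempts times.
# Usage: retry_command <max_attempts> <delay_seconds> <command> [args...]
retry_command() {
  local max_attempts="$1" delay="$2" attempt=1 code=0
  shift 2
  while true; do
    # Running the command on the left of && keeps errexit and any ERR trap
    # from firing when it fails, so the loop can inspect the exit code itself.
    "$@" && return 0
    code=$?
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "retry_command: '$*' failed after $attempt attempt(s)" >&2
      return "$code"
    fi
    echo "retry_command: attempt $attempt failed (exit $code), retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Example: retry_command 3 10 npm ci
```

With this in place, the `npm ci` line would become `retry_command 3 10 npm ci`, and no trap juggling is needed. Retrying on any non-zero exit is a simplification: since there does not appear to be an ECONNRESET-specific exit code, genuine install errors would also be retried before the build fails.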

mcdurdin commented 5 months ago

npm on ba-win10-64-pp-602 was upgraded to 10.5.1 on 3 May 2024. On 9 May 2024, we got another failed build: https://build.palaso.org/buildConfiguration/Keyman_Developer_Test/461792?buildTab=log&linesState=374&logView=flowAware&focusLine=5532

14:27:50   npm ERR! command C:\Program Files\nodejs\node.exe C:\Users\bob\AppData\Roaming\nvm\v18.17.0\node_modules\npm\bin\npm-cli.js install --force --cache=C:\Users\bob\AppData\Local\npm-cache --prefer-offline=false --prefer-online=false --offline=false --no-progress --no-save --no-audit --include=dev --include=peer --include=optional --no-package-lock-only --no-dry-run
14:27:50   npm ERR! npm WARN using --force Recommended protections disabled.
14:27:50   npm ERR! npm ERR! code ECONNRESET
14:27:50   npm ERR! npm ERR! errno ECONNRESET
14:27:50   npm ERR! npm ERR! network Invalid response body while trying to fetch https://registry.npmjs.org/@parcel%2fconfig-default: aborted
14:27:50   npm ERR! npm ERR! network This is a problem related to network connectivity.
14:27:50   npm ERR! npm ERR! network In most cases you are behind a proxy or have bad network settings.
14:27:50   npm ERR! npm ERR! network
14:27:50   npm ERR! npm ERR! network If you are behind a proxy, please make sure that the
14:27:50   npm ERR! npm ERR! network 'proxy' config is set properly.  See: 'npm help config'
14:27:50   npm ERR!
14:27:50   npm ERR! npm ERR! A complete log of this run can be found in: C:\Users\bob\AppData\Local\npm-cache\_logs\2024-05-10T07_25_11_553Z-debug-0.log