We should analyse these timestamps above to work out whether this timeout should be reduced. I'm not sure if that 60s is a timeout for the full file transfer, for failing to connect, for receiving no data, etc.
Then we should check if we can retry such failures at least once.
We have a similar issue on our helm charts repo:
WARN: Host error

```
{
  "hostType": "github-releases",
  "packageName": "norwoodj/helm-docs",
  "err": {
    "name": "TimeoutError",
    "code": "ETIMEDOUT",
    "timings": {
      "start": 1661933000729,
      "socket": 1661933000945,
      "lookup": 1661933108895,
      "error": 1661933108935,
      "phases": {
        "wait": 216,
        "dns": 107950,
        "total": 108206
      }
    },
    "event": "request",
    "message": "Timeout awaiting 'request' for 60000ms",
    "stack": "RequestError: Timeout awaiting 'request' for 60000ms\n at ClientRequest.<anonymous> (/home/ubuntu/renovateapp/node_modules/got/dist/source/core/index.js:970:65)\n at Object.onceWrapper (events.js:520:26)\n at ClientRequest.emit (events.js:412:35)\n at ClientRequest.emit (domain.js:475:12)\n at ClientRequest.origin.emit (/home/ubuntu/renovateapp/node_modules/@szmarczak/http-timer/dist/source/index.js:43:20)\n at TLSSocket.socketErrorListener (_http_client.js:475:9)\n at TLSSocket.emit (events.js:400:28)\n at TLSSocket.emit (domain.js:475:12)\n at emitErrorNT (internal/streams/destroy.js:106:8)\n at emitErrorCloseNT (internal/streams/destroy.js:74:3)\n at processTicksAndRejections (internal/process/task_queues.js:82:21)\n at Timeout.timeoutHandler [as _onTimeout] (/home/ubuntu/renovateapp/node_modules/got/dist/source/core/utils/timed-out.js:36:25)\n at listOnTimeout (internal/timers.js:559:11)\n at processTimers (internal/timers.js:500:7)",
    "options": {
      "headers": {
        "user-agent": "Renovate Bot (GitHub App 2740)",
        "accept": "application/vnd.github.machine-man-preview+json",
        "authorization": "***********",
        "content-type": "application/json",
        "content-length": "621",
        "accept-encoding": "gzip, deflate, br"
      },
      "url": "https://api.github.com/graphql",
      "hostType": "github-releases",
      "username": "",
      "password": "",
      "method": "POST",
      "http2": false
    }
  }
}
```
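For what it's worth, the timings above already show where the time went: the `dns` phase is `lookup - socket` = 1661933108895 - 1661933000945 = 107,950 ms. DNS resolution alone took roughly 108 s, well past the 60,000 ms request timer, before a connection was even attempted.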
Created a fork at https://github.com/renovate-reproductions/17483-helm-charts-timeout.
@viceice We've tried a timeout of 120s and a concurrency limit of 2 individually, with no change. Going to try both together.
Both settings together don't seem to help; we're still getting the external-host-error more often than not. Anecdotally, it does seem to fail on "later" packages alphabetically:
```
{
  "hostType": "npm",
  "matchHost": "https://registry.npmjs.org/",
  "timeout": 120000,
  "concurrentRequestLimit": 2
}
```
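For anyone following along, a rule like this lives in the `hostRules` array of the repository's Renovate config. A minimal sketch of the full file (surrounding fields omitted):

```
{
  "hostRules": [
    {
      "hostType": "npm",
      "matchHost": "https://registry.npmjs.org/",
      "timeout": 120000,
      "concurrentRequestLimit": 2
    }
  ]
}
```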
I'm going to use this issue as a general performance-tuning issue. It seems some of the issues are environmental or app-related, because we can't replicate them when running locally, yet they occur consistently in the hosted app.
Tried so far:
- `RES_OPTIONS: 'ndots:3 retrans:1000 retry:3 rotate'`
It's been running for a few days now with:

```
{
  "hostType": "npm",
  "matchHost": "https://registry.npmjs.org/",
  "timeout": 120000,
  "concurrentRequestLimit": 2
}
```
Since then, I've noticed it has started hitting a timeout more often than the external host error. We're still getting the occasional success, but most often it fails.
Looking at the last "done" run, here are the stats:

```
"hostStats": {
  "api.github.com": {
    "requestCount": 36,
    "requestAvgMs": 460,
    "queueAvgMs": 0
  },
  "registry.npmjs.org": {
    "requestCount": 103,
    "requestAvgMs": 19024,
    "queueAvgMs": 0
  },
  "xxx.jfrog.io": {
    "requestCount": 4,
    "requestAvgMs": 23116,
    "queueAvgMs": 0
  }
},
```
Even though `concurrentRequestLimit` was configured, we see no `queueAvgMs` for registry.npmjs.org, plus very high `requestAvgMs` for both npm and jfrog.
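As a rough sanity check on those numbers: 103 requests at an average of 19,024 ms is about 1,959,000 ms, i.e. ~33 minutes of cumulative npm request time. Even if the intended concurrency limit of 2 had been honoured, that would still be roughly 16 minutes of wall-clock time spent on registry lookups alone.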
I found the queue problem, see #17713. Please remove `hostType` from your host rule.
Updated my host rule to:

```
{
  "matchHost": "registry.npmjs.org/",
  "concurrentRequestLimit": 2,
  "timeout": 120000
}
```

(and then the same for our jfrog registry in a separate host rule; a sketch of the combined rules follows).
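For reference, a sketch of the combined rules, with `hostType` removed per the advice above (the `xxx.jfrog.io` host is a stand-in taken from the hostStats earlier; the real host and limits may differ):

```
{
  "hostRules": [
    {
      "matchHost": "registry.npmjs.org/",
      "concurrentRequestLimit": 2,
      "timeout": 120000
    },
    {
      "matchHost": "xxx.jfrog.io",
      "concurrentRequestLimit": 2,
      "timeout": 120000
    }
  ]
}
```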
We're still getting mostly external host errors.
Try removing all limits again
No change
What are the details of the errors you get now? E.g. ETIMEDOUT, ECONNRESET, etc. Pasting details here would be helpful.
External Host Error

```
{
  "hostType": "npm",
  "packageName": "@datadog/browser-logs",
  "err": {
    "name": "TimeoutError",
    "code": "ETIMEDOUT",
    "timings": {
      "start": 1663081515075,
      "socket": 1663081515076,
      "lookup": 1663081519436,
      "connect": 1663081519436,
      "secureConnect": 1663081522929,
      "upload": 1663081522931,
      "response": 1663081523051,
      "error": 1663081595475,
      "abort": 1663081635691,
      "phases": {
        "wait": 1,
        "dns": 4360,
        "tcp": 0,
        "tls": 3493,
        "request": 2,
        "firstByte": 120,
        "total": 120616
      }
    },
    "event": "request",
    "message": "Timeout awaiting 'request' for 60000ms",
    "stack": "RequestError: Timeout awaiting 'request' for 60000ms\n at ClientRequest.<anonymous> (/home/ubuntu/renovateapp/node_modules/got/dist/source/core/index.js:970:65)\n at Object.onceWrapper (node:events:628:26)\n at ClientRequest.emit (node:events:525:35)\n at ClientRequest.emit (node:domain:489:12)\n at ClientRequest.origin.emit (/home/ubuntu/renovateapp/node_modules/@szmarczak/http-timer/dist/source/index.js:43:20)\n at TLSSocket.socketErrorListener (node:_http_client:481:9)\n at TLSSocket.emit (node:events:513:28)\n at TLSSocket.emit (node:domain:489:12)\n at emitErrorNT (node:internal/streams/destroy:157:8)\n at emitErrorCloseNT (node:internal/streams/destroy:122:3)\n at processTicksAndRejections (node:internal/process/task_queues:83:21)\n at runNextTicks (node:internal/process/task_queues:65:3)\n at listOnTimeout (node:internal/timers:528:9)\n at Timeout.timeoutHandler [as _onTimeout] (/home/ubuntu/renovateapp/node_modules/got/dist/source/core/utils/timed-out.js:36:25)\n at listOnTimeout (node:internal/timers:561:11)\n at processTimers (node:internal/timers:502:7)",
    "options": {
      "headers": {
        "user-agent": "Renovate Bot (GitHub App 2740)",
        "accept": "application/json",
        "accept-encoding": "gzip, deflate, br"
      },
      "url": "https://registry.npmjs.org/@datadog%2Fbrowser-logs",
      "hostType": "npm",
      "username": "",
      "password": "",
      "method": "GET",
      "http2": false
    },
    "response": {
      "statusCode": 200,
      "statusMessage": "OK",
      "headers": {
        "date": "Tue, 13 Sep 2022 15:05:22 GMT",
        "content-type": "application/json",
        "transfer-encoding": "chunked",
        "connection": "close",
        "cf-ray": "74a1c7de2db7680a-SEA",
        "access-control-allow-origin": "*",
        "age": "77",
        "cache-control": "public, max-age=300",
        "etag": "W/\"3786e68d0270801401b11f9b4ac6d624\"",
        "last-modified": "Tue, 13 Sep 2022 11:35:42 GMT",
        "vary": "accept-encoding, accept",
        "cf-cache-status": "HIT",
        "x-amz-replication-status": "COMPLETED",
        "server": "cloudflare",
        "content-encoding": "gzip"
      },
      "httpVersion": "1.1"
    }
  }
}
```
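Reading those timings: DNS, TCP and TLS all completed, and a 200 OK with `transfer-encoding: chunked` arrived after a `firstByte` of just 120 ms. The failure comes `error - response` = 1663081595475 - 1663081523051 = 72,424 ms later, so the chunked response body stalled mid-transfer. Note also that the message still reports the 60,000 ms default even though 120000 was configured in the host rule.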
And these are all the INFO-level logs for the timeout I posted above:
INFO: Repository started

```
{
  "renovateVersion": "32.194.5"
}
```

INFO: Dependency extraction complete

```
{
  "baseBranch": "master",
  "stats": {
    "managers": {
      "dockerfile": {
        "fileCount": 1,
        "depCount": 1
      },
      "npm": {
        "fileCount": 140,
        "depCount": 4456
      },
      "nvm": {
        "fileCount": 1,
        "depCount": 1
      }
    },
    "total": {
      "fileCount": 142,
      "depCount": 4458
    }
  }
}
```
WARN: Job timeout
Have you configured a 60s timeout?
We do not have a timeout configured anywhere in our config
We've had a 4-minute timeout configured for a number of days. Strange that your log shows 60s; when was it captured?
Note: the file in question shouldn't take so long either way
These are the logs for the last day, with no change to our config:
The errors were respectively:
All runs used Renovate version 32.194.5. I also double-checked that we are not specifying a timeout anywhere in our shared configs.
Strange, it seems to be inheriting the default 60s `hostRules.timeout` instead of the 240s rule we've set in the app. Any change if you then try to apply a longer timeout in your own config? E.g.:

```
"hostRules": [{
  "timeout": 240000
}]
```
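That is 240,000 ms = 4 minutes, and since the rule has no `matchHost` it applies to every host. As a minimal complete file, the sketch would be:

```
{
  "hostRules": [
    {
      "timeout": 240000
    }
  ]
}
```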
We've had it running all weekend with a 4m timeout and this is what it looks like
The failures were (in order):
We've been running with the 4m change for a week now. We're still seeing intermittent failures, but it's completing often enough that we're getting PRs through, so most of my concern has been alleviated.
Is there a problem with increasing the timeout further? Like 5 or 6 minutes? Is there a max on the timeout? I don't want to further increase the timeout if it's going to negatively impact the running of the app.
No problem with you trying a higher number. Something is clearly going wrong somewhere in the chain and causing rate limiting, but it's good if it eventually completes.
@dbasilio I'm seeing only occasional failures for your repo now, can you confirm the same from your side?
Only ~1 failure in the last day. For the record, it was still ECONNRESET, and I'm surprised to see that Renovate retried twice:
{"level":20,"msg":"GET https://registry.npmjs.org/@yarnpkg%2Fcore = (code=ECONNRESET, statusCode=-1 retryCount=2, duration=17249)","time":"2022-10-22T14:06:04.321Z"}
{"level":20,"msg":"GET https://registry.npmjs.org/@yarnpkg%2Ffslib = (code=ECONNRESET, statusCode=-1 retryCount=2, duration=17446)","time":"2022-10-22T14:06:04.515Z"}
We're still seeing a decent number of failures, but it's completing frequently enough that we're at least getting PRs put up.
We're seeing success often enough now that I think we can close this issue out. Thanks for all the help!
How are you running Renovate?
Mend Renovate hosted app on github.com
If you're self-hosting Renovate, tell us what version of Renovate you run.
No response
Please select which platform you are using if self-hosting.
github.com
If you're self-hosting Renovate, tell us what version of the platform you run.
No response
Was this something which used to work for you, and then stopped?
It used to work, and then stopped
Describe the bug
We maintain a monorepo with 138 packages in it. We used to get Renovate PRs for the repo, but it stopped working a while ago (not exactly sure when). I finally got around to checking the logs, and it's failing with the error below. The package itself is not that important, as it fails on a different package pretty frequently (usually it fails on the first one listed in devDependencies in the global package.json).
The error is not consistent, as we've had three successful runs in the last 3 days, but even on successful runs we haven't been getting new PRs. I'm going to guess we're hitting some sort of limit on requests to npm and being throttled, but if that's the case, are there config options we can set to increase the timeout, paginate more, or something?
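(For anyone skimming straight to the bug report: per the thread above, the knobs that ended up mattering were `hostRules.timeout` and `hostRules.concurrentRequestLimit`, e.g.:)

```
{
  "hostRules": [
    {
      "matchHost": "registry.npmjs.org/",
      "timeout": 240000,
      "concurrentRequestLimit": 2
    }
  ]
}
```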
Relevant debug logs
WARN: Host error

```
{
  "hostType": "npm",
  "packageName": "react-phone-number-input",
  "err": {
    "name": "TimeoutError",
    "code": "ETIMEDOUT",
    "timings": {
      "start": 1661790262126,
      "socket": 1661790262127,
      "lookup": 1661790262130,
      "connect": 1661790267339,
      "secureConnect": 1661790267369,
      "upload": 1661790267369,
      "response": 1661790267420,
      "error": 1661790323264,
      "abort": 1661790341747,
      "phases": {
        "wait": 1,
        "dns": 3,
        "tcp": 5209,
        "tls": 30,
        "request": 0,
        "firstByte": 51,
        "total": 79621
      }
    },
    "event": "request",
    "message": "Timeout awaiting 'request' for 60000ms",
    "stack": "RequestError: Timeout awaiting 'request' for 60000ms\n at ClientRequest.…
```

Have you created a minimal reproduction repository?
No reproduction repository