Closed blink1073 closed 1 month ago
Ooh nice, yeah, let's use that.
Can't we use curl's built in retry flag? It already uses exponential backoff by default
--retry <num>
If a transient error is returned when curl tries to perform a transfer, it retries this number of times before giving up. Setting the number to 0 makes curl do no retries (which is the default). Transient error means either: a timeout, an FTP 4xx response code or an HTTP 408, 429, 500, 502, 503 or 504 response code.
When curl is about to retry a transfer, it first waits one second and then for all forthcoming retries it doubles the waiting time until it reaches 10 minutes which then remains delay between the rest of the retries.
We were already using the retry flag on the node download, it was still erroring out for other reasons.
What reasons? perhaps we want to use --retry
plus --retry-all-errors
?
Oh --retry-all-errors
requires curl 7.71.0 released Jun 2020 which might be too new and not portable on our hosts.
That, and we're using this for other things that need to be retried.
This change is causing mongodb downloading to fail:
[2024/09/06 10:50:27.813] Installing server binaries...
[2024/09/06 10:50:27.813] retry_with_backoff: running 'curl http://downloads.10gen.com/linux/mongodb-linux-x86_64-enterprise-rhel70-5.0.26.tgz --output mongodb-binaries.tgz' - attempt n. 1 ...
[2024/09/06 10:50:27.815] % Total % Received % Xferd Average Speed Time Time Time Current
[2024/09/06 10:50:27.815] Dload Upload Total Spent Left Speed
[2024/09/06 10:50:28.164]
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
2 151M 2 3646k 0 0 10.2M 0 0:00:14 --:--:-- 0:00:14 10.2M
[2024/09/06 10:50:28.165] curl: (18) transfer closed with 155591652 bytes remaining to read
[2024/09/06 10:50:28.165] Command 'shell.exec' in function 'bootstrap mongo-orchestration' (step 1.1 of 2) failed: shell script encountered problem: exit code 18.
[2024/09/06 10:50:28.165] Finished command 'shell.exec' in function 'bootstrap mongo-orchestration' (step 1.1 of 2) in 24.96417095s.
[2024/09/06 10:50:28.165] Running task commands failed: running command: command failed: shell script encountered problem: exit code 18
[2024/09/06 10:50:28.165] Finished running task commands in 24.964357327s.
[2024/09/06 10:50:28.189] Task completed - FAILURE.
The problem is that the new retry_with_backoff function breaks when set -e
is enabled.
Let's revert for now
Hey @blink1073, this caught my eye, of course, the Node team suffers from the same intermittent failure. The folks on mongosh have a more powerful retry approach:
https://github.com/mongodb-js/mongosh/blob/57a904a1acb7821eee4e4cde98ff779b87c612b1/.evergreen/retry-with-backoff.sh#L13
Any interest in porting that here? Then the Node team can migrate to this copy of the script. 🙂