mongodb-labs / drivers-evergreen-tools

Scripts for MongoDB drivers to bootstrap their Evergreen configuration file - This Repository is NOT a supported MongoDB product

PYTHON-4734 Make node download more robust #488

Closed blink1073 closed 1 month ago

nbbeeken commented 1 month ago

Hey @blink1073, this caught my eye because the Node team suffers from the same intermittent failure. The folks on mongosh have a more powerful retry approach:

https://github.com/mongodb-js/mongosh/blob/57a904a1acb7821eee4e4cde98ff779b87c612b1/.evergreen/retry-with-backoff.sh#L13

Any interest in porting that here? Then the Node team can migrate to this copy of the script. 🙂

blink1073 commented 1 month ago

Ooh nice, yeah, let's use that.

ShaneHarvey commented 1 month ago

Can't we use curl's built-in retry flag? It already uses exponential backoff by default.

--retry <num>

If a transient error is returned when curl tries to perform a transfer, it retries this number of times before giving up. Setting the number to 0 makes curl do no retries (which is the default). Transient error means either: a timeout, an FTP 4xx response code or an HTTP 408, 429, 500, 502, 503 or 504 response code.

When curl is about to retry a transfer, it first waits one second and then for all forthcoming retries it doubles the waiting time until it reaches 10 minutes which then remains delay between the rest of the retries.

https://curl.se/docs/manpage.html
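
As a rough illustration of the schedule the man page text above describes (one second, doubling on each retry, capped at 10 minutes), the delays can be computed with a few lines of shell. The function name `curl_retry_delays` is hypothetical and exists only for this sketch; it does not call curl and is not part of this repository.

```sh
# Hypothetical helper (not part of curl or this repo): print the number of
# seconds curl would wait before each of N retries, following the man page
# text quoted above: start at 1 second, double each time, cap at 10 minutes.
curl_retry_delays() {
    retries=$1
    delay=1
    max_delay=600   # 10 minutes, in seconds
    i=0
    while [ "$i" -lt "$retries" ]; do
        echo "$delay"
        delay=$((delay * 2))
        if [ "$delay" -gt "$max_delay" ]; then
            delay=$max_delay
        fi
        i=$((i + 1))
    done
}
```

For example, `curl_retry_delays 5` prints 1, 2, 4, 8 and 16 on separate lines, so `--retry 5` would spend at most about 31 seconds waiting between attempts.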

blink1073 commented 1 month ago

We were already using the retry flag on the node download, but it was still erroring out for other reasons.

ShaneHarvey commented 1 month ago

What reasons? Perhaps we want to use --retry plus --retry-all-errors?

ShaneHarvey commented 1 month ago

Oh, --retry-all-errors requires curl 7.71.0 (released June 2020), which might be too new to be portable across our hosts.
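
One way to handle the portability concern would be to check the installed curl version before adding the flag. The sketch below is only an assumption about how that could look; `version_ge` and `curl_supports_retry_all_errors` are hypothetical names, and the code relies on the first line of `curl --version` starting with `curl <version>`.

```sh
# Hypothetical helper: succeed if dotted version $1 is >= dotted version $2.
# Fields are compared numerically, left to right; missing fields count as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        for (i = 1; i <= (n > m ? n : m); i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}

# Hypothetical helper: succeed if the installed curl is at least 7.71.0,
# the release that introduced --retry-all-errors.
curl_supports_retry_all_errors() {
    # The first line of `curl --version` looks like "curl 7.71.0 (...) ...".
    version=$(curl --version | awk 'NR == 1 { print $2 }')
    version_ge "$version" 7.71.0
}
```

A download script could then append `--retry-all-errors` only when `curl_supports_retry_all_errors` succeeds and fall back to plain `--retry` otherwise.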

blink1073 commented 1 month ago

That, and we're using this for other things that need to be retried.

ShaneHarvey commented 1 month ago

This change is causing MongoDB downloads to fail:

[2024/09/06 10:50:27.813] Installing server binaries...
[2024/09/06 10:50:27.813] retry_with_backoff: running 'curl http://downloads.10gen.com/linux/mongodb-linux-x86_64-enterprise-rhel70-5.0.26.tgz --output mongodb-binaries.tgz' - attempt n. 1 ...
[2024/09/06 10:50:27.815]   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
[2024/09/06 10:50:27.815]                                  Dload  Upload   Total   Spent    Left  Speed
[2024/09/06 10:50:28.164] 
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  2  151M    2 3646k    0     0  10.2M      0  0:00:14 --:--:--  0:00:14 10.2M
[2024/09/06 10:50:28.165] curl: (18) transfer closed with 155591652 bytes remaining to read
[2024/09/06 10:50:28.165] Command 'shell.exec' in function 'bootstrap mongo-orchestration' (step 1.1 of 2) failed: shell script encountered problem: exit code 18.
[2024/09/06 10:50:28.165] Finished command 'shell.exec' in function 'bootstrap mongo-orchestration' (step 1.1 of 2) in 24.96417095s.
[2024/09/06 10:50:28.165] Running task commands failed: running command: command failed: shell script encountered problem: exit code 18
[2024/09/06 10:50:28.165] Finished running task commands in 24.964357327s.
[2024/09/06 10:50:28.189] Task completed - FAILURE.

https://spruce.mongodb.com/task/mongo_python_driver_tests_python_version_supports_openssl_102_test_ssl__platform~rhel7_auth_ssl~auth_ssl_python_version~3.8_test_5.0_replica_set_6bdaf19c78b6062dbf2e51513f24b941352f76c3_24_09_06_17_46_10/logs?execution=0

ShaneHarvey commented 1 month ago

The problem is that the new retry_with_backoff function breaks when set -e is enabled.
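
A minimal sketch of a retry wrapper that tolerates set -e follows. It assumes the failure comes from the wrapped command running as a plain statement, so the shell exits on the first non-zero status before the retry logic runs (which would match the log above, where exit code 18 ends the task after attempt 1). This is not the function from this PR or from mongosh; the `ATTEMPTS` and `INITIAL_DELAY` variables and their defaults are illustrative.

```sh
set -e

# Hypothetical rewrite for illustration; names and defaults are assumptions.
retry_with_backoff() {
    max_attempts=${ATTEMPTS:-5}
    delay=${INITIAL_DELAY:-1}
    attempt=1
    while true; do
        echo "retry_with_backoff: running '$*' - attempt n. $attempt ..." >&2
        # Running the command as an `if` condition exempts it from `set -e`,
        # so a failure reaches the retry logic instead of killing the shell.
        if "$@"; then
            return 0
        else
            exit_code=$?
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "retry_with_backoff: giving up after $attempt attempts" >&2
            return "$exit_code"
        fi
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}
```

Called as `retry_with_backoff curl ...`, a transient failure such as curl's exit code 18 would trigger another attempt; only after the last attempt fails does the function return the command's exit code, at which point set -e stops the script as usual.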

blink1073 commented 1 month ago

Let's revert for now.