scylladb / scylla-machine-image

Apache License 2.0

lib/scylla_cloud.py: change retry logic of curl() to exponential backoff #445

Closed syuu1228 closed 1 year ago

syuu1228 commented 1 year ago

Since IaaS providers recommend using exponential backoff when retrying calls to metadata services, we should do the same in our scripts.

The implementation is based on the example in https://developers.google.com/analytics/devguides/reporting/core/v3/errors?hl=en

See also https://docs.aws.amazon.com/general/latest/gr/api-retries.html

Related to scylladb/scylladb#13442
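For readers unfamiliar with the pattern, here is a minimal sketch of a retry loop with exponential backoff and random jitter, in the spirit of the linked guidance. It is not the code from this PR: the function name `curl_with_backoff`, its parameters, and the default values are all assumptions chosen for illustration, and the actual `curl()` in `lib/scylla_cloud.py` may differ.

```python
import random
import time
import urllib.request


def curl_with_backoff(url, max_retries=5, base_delay=1.0, timeout=5.0,
                      opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch url as text, retrying failed attempts with exponential backoff.

    Hypothetical helper for illustration only; names and defaults are
    assumptions, not taken from lib/scylla_cloud.py.
    """
    for attempt in range(max_retries + 1):
        try:
            with opener(url, timeout=timeout) as resp:
                return resp.read().decode("utf-8")
        except OSError:
            # URLError, HTTPError and socket timeouts all derive from OSError.
            if attempt == max_retries:
                raise
            # Wait base_delay * 2^attempt seconds plus up to 1 s of jitter,
            # so that many instances do not retry in lockstep.
            sleep(base_delay * (2 ** attempt) + random.random())
```

With the defaults above, the waits between attempts are roughly 1, 2, 4, 8, and 16 seconds (each plus jitter) before the last error is re-raised. The `opener` and `sleep` parameters exist only to make the sketch easy to exercise without a network.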

yaronkaikov commented 1 year ago

@syuu1228 Please post the verification job for this change

Also please rebase

syuu1228 commented 1 year ago

@yaronkaikov I got an unrelated GCE build error while running the test job:

 ERROR: (gcloud.compute.images.add-labels) HTTPError 400: Invalid value for field 'labels': ''. Label value 'debug-scylla-5.3.0-dev-x86_64-2023-05-08T21-39-11' violates format constraints. The value can only contain lowercase letters, numeric characters, underscores and dashes. The value can be at most 63 characters long. International characters are allowed.

https://jenkins.scylladb.com/view/master/job/scylla-master/job/releng-testing/job/next-machine-image/101/consoleFull

Maybe it's a bug in the Packer script for the GCE build.
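To illustrate the constraint quoted in the error, here is a small sketch that normalizes a string into the format the message describes (lowercase letters, digits, underscores, and dashes, at most 63 characters). Given those rules, the dots and the uppercase `T` in the rejected value are the likely offenders. The helper name `to_gce_label_value` is hypothetical, and this is not necessarily how the build scripts handle labels.

```python
import re


def to_gce_label_value(raw, max_len=63):
    """Normalize raw into a conservative GCE label value.

    Hypothetical helper for illustration only. It is stricter than the error
    message requires: international characters are replaced as well.
    """
    value = raw.lower()
    # Replace anything other than a-z, 0-9, underscore, or dash with a dash.
    value = re.sub(r"[^a-z0-9_-]", "-", value)
    # Enforce the 63-character limit stated in the error message.
    return value[:max_len]


print(to_gce_label_value("debug-scylla-5.3.0-dev-x86_64-2023-05-08T21-39-11"))
# prints: debug-scylla-5-3-0-dev-x86_64-2023-05-08t21-39-11
```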

yaronkaikov commented 1 year ago

@syuu1228 weird, when I ran https://jenkins.scylladb.com/view/master/job/scylla-master/job/releng-testing/job/next-machine-image/103/ it passed successfully

yaronkaikov commented 1 year ago

Also running https://jenkins.scylladb.com/view/master/job/scylla-master/job/releng-testing/job/next-machine-image/104/

yaronkaikov commented 1 year ago

@syuu1228 I think I found a bug, you were right. I will try to figure it out.

https://github.com/scylladb/scylla-pkg/pull/3406

yaronkaikov commented 1 year ago

@syuu1228 please rebase; my patch was merged

yaronkaikov commented 1 year ago

Verified with https://jenkins.scylladb.com/job/scylla-master/job/releng-testing/job/next-machine-image/188/

fgelcer commented 1 year ago

@yaronkaikov @syuu1228, was this tested with the GCE artifact image test? It seems to be failing, and @benipeled is stuck with these failures on master.

yaronkaikov commented 1 year ago

@fgelcer yes https://jenkins.scylladb.com/job/scylla-master/job/releng-testing/job/next-machine-image/188/

fgelcer commented 1 year ago

> @fgelcer yes https://jenkins.scylladb.com/job/scylla-master/job/releng-testing/job/next-machine-image/188/

So perhaps it was a fluke... of the last 7 builds, only 1 passed: https://jenkins.scylladb.com/job/scylla-master/job/artifacts/job/artifacts-gce-image-test/

benipeled commented 1 year ago

Dequeued due to GCE/Azure artifact failures. @syuu1228 please run the full tests and investigate before re-merging.