Open kachawla opened 3 weeks ago
:wave: @kachawla Thanks for filing this issue.
A project maintainer will review this issue and get back to you soon.
We also welcome community contributions! If you would like to pick this item up sooner and submit a pull request, please visit our contribution guidelines and assign this to yourself by commenting "/assign" on this issue.
For more information on our triage process please visit our triage overview
Another test run where this error happened on bucket deletion validation path: https://github.com/radius-project/radius/actions/runs/11269904213/job/31339494783
:+1: We've reviewed this issue and have agreed to add it to our backlog. Please subscribe to this issue for notifications, we'll provide updates when we pick it up.
Seen again in a scheduled functional test run: https://github.com/radius-project/radius/actions/runs/11622220142/job/32367444436
Area for Improvement
AWS functional tests
Observed behavior
We have a functional test that creates an AWS S3 bucket, validates its existence by performing a get request, returns an error if the get call results in an error (including 404), and finally performs a cleanup step to delete the resources, even if the previous step returned an error. In a recent run, this test failed during the validation of the S3 bucket creation because the bucket could not be found. It then errored out again during the cleanup stage, unable to validate deletion of the bucket.
Link to functional test run: https://github.com/radius-project/radius/actions/runs/11243305588/job/31259088130
Desired behavior
The test should pass reliably, even when bucket creation or deletion takes longer than usual.
Proposed Fix
From the logs of the run linked above, it's clear that the bucket was eventually created, so adding retries with backoff on the validation path should help mitigate intermittent failures.
The second failure happened because the bucket wasn't deleted within the time allocated for the test validation. We should consider increasing the max retry limit for the deletion validation.
The backoff logic could be improved as well: currently there is a fixed 10-second forced wait between retries. We could start with a smaller wait time and increase it exponentially for cases where operations are delayed due to external issues. Since the buckets created for this test don't contain any objects, creation and deletion should be fairly quick in most cases.
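To make the proposal concrete, here is a rough sketch of what retrying with exponential backoff could look like. It is not taken from the Radius test code: the names retry, backoffDelay, and errNotFound, and the delay values, are all hypothetical, and the simulated get call stands in for whatever the validation path actually calls.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotFound is a hypothetical sentinel standing in for a 404 from the get call.
var errNotFound = errors.New("resource not found")

// backoffDelay returns the wait before the next try after failed attempt
// number `attempt` (0-based): initial * 2^attempt, capped at maxDelay.
func backoffDelay(attempt int, initial, maxDelay time.Duration) time.Duration {
	d := initial
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	if d > maxDelay {
		return maxDelay
	}
	return d
}

// retry calls op up to maxAttempts times, sleeping with exponential backoff
// between failed attempts. It returns nil on the first success, or the last
// error wrapped with the attempt count.
func retry(maxAttempts int, initial, maxDelay time.Duration, op func() error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		// No point sleeping after the final attempt.
		if attempt < maxAttempts-1 {
			time.Sleep(backoffDelay(attempt, initial, maxDelay))
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", maxAttempts, err)
}

func main() {
	// Simulate a bucket that only becomes visible on the third get call.
	calls := 0
	getBucket := func() error {
		calls++
		if calls < 3 {
			return errNotFound
		}
		return nil
	}

	// Assumed values for illustration only; real tests would use seconds.
	err := retry(5, 10*time.Millisecond, 80*time.Millisecond, getBucket)
	fmt.Println("calls:", calls, "err:", err)
}
```

The same helper could cover both paths: on creation, op would return an error until the get succeeds; on deletion, op would return an error until the get reports not found. Starting with a short initial delay keeps the common case fast, while the cap and attempt limit bound the total wait.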
rad Version
N/A
Operating system
N/A
Additional context
No response
AB#13454