googleapis / google-auth-library-python

Google Auth Python Library
https://googleapis.dev/python/google-auth/latest/
Apache License 2.0
752 stars 301 forks source link

fix(metadata): enhance retry logic for metadata server access in _metadata.py #1545

Open mrvarmazyar opened 6 days ago

mrvarmazyar commented 6 days ago

This PR enhances the retry logic in the _metadata.py file to handle transient network issues more effectively when accessing the GCE metadata server. The existing retry logic in the ping and get functions have been replaced with the ExponentialBackoff class from _exponential_backoff.py, which introduces exponential backoff and jitter.

clundin25 commented 6 days ago

Hi @mrvarmazyar, thank you kindly for your contribution!

There exists an exponential backoff implementation in this repo. I think the loop should instead be refactored to use it https://github.com/googleapis/google-auth-library-python/blob/main/google/auth/_exponential_backoff.py.

This reduces new code + adds jitter to the retries.

mrvarmazyar commented 5 days ago

Hi @mrvarmazyar, thank you kindly for your contribution!

There exists an exponential backoff implementation in this repo. I think the loop should instead be refactored to use it https://github.com/googleapis/google-auth-library-python/blob/main/google/auth/_exponential_backoff.py.

This reduces new code + adds jitter to the retries.

@clundin25 Thank you for your feedback and for pointing me towards the ExponentialBackoff implementation. I have updated the _metadata.py file to use the ExponentialBackoff class for retry logic, reducing the new code and adding jitter to the retries as suggested. I also tested it, and it succeeded.

PS: there was also a miss typo in the contribution document

clundin25 commented 5 days ago

I think there are two actions remaining.

  1. Update the unit tests to mock out sleep so they don't take longer than necessar.
  2. There is a subtle bug in this change. ExponentialBackoff will sleep on the first iteration so this change would introduce ~1s of latency before the first request is sent.

I think ExponentialBackoff should be modified to not sleep on the first iteration, to avoid accidentally adding latency to requests.

Let me chat with my team on ExponentialBackoff. After that I will likely submit a patch to modify it's behavior.

mrvarmazyar commented 5 days ago

I think there are two actions remaining.

  1. Update the unit tests to mock out sleep so they don't take longer than necessary.

Thanks for the suggestion, I'll take a look to learn a bit of its unit tests and then will push the new changes.

  1. There is a subtle bug in this change. ExponentialBackoff will sleep on the first iteration so this change would introduce ~1s of latency before the first request is sent.

I think ExponentialBackoff should be modified to not sleep on the first iteration, to avoid accidentally adding latency to requests.

Let me chat with my team on ExponentialBackoff. After that I will likely submit a patch to modify it's behavior.

I've reviewed the code and understand your point. While you chat with your team, I'll work on proposing the change in the first iteration.

clundin25 commented 5 days ago

I pushed two patches here. Going to spend some more time thinking about this tomorrow.

mrvarmazyar commented 4 days ago

I pushed two patches here. Going to spend some more time thinking about this tomorrow.

@clundin25 I have reviewed your code and since you also made adjustments to the tests, should I make any other changes to my code?

clundin25 commented 4 days ago

@mrvarmazyar This LGTM. I will update the tests to mock out the calls to sleep.

Thank you for your contribution!