kubeguard / guard

🔑 Kubernetes Authentication & Authorization WebHook Server
https://kubeguard.dev
Apache License 2.0
590 stars 81 forks source link

Add retries and timeouts to Azure auth provider "get metadata" step #353

Closed AzureMarker closed 1 year ago

AzureMarker commented 1 year ago

We've seen issues where this "get metadata" step times out after 10s of minutes, by which time the HTTP server has already passed the write timeout and the API server returned Unauthorized since it didn't get a response from Guard.

This PR adds a timeout of 30 seconds and 3 retries to avoid failing due to transient network issues.

Also made a logging fix in the pop token code.

weinong commented 1 year ago

what is the server write timeout and timeout from api-server? Is 30s a total timeout from 3 retries? or is it from each call?

AzureMarker commented 1 year ago

what is the server write timeout and timeout from api-server?

The server write timeout is coming from Guard, not API server. It's configurable, but set to 10 seconds by default: https://github.com/kubeguard/guard/blob/09ab161fb1fb0862dba26c47a9b5c80c82a72b68/server/server.go#L54

Is 30s a total timeout from 3 retries? or is it from each call?

It's the timeout per-retry. I could use autorest.DoRetryForDuration to have a total duration time limit.

AzureMarker commented 1 year ago

Looks like you're right, I'm too used to working with async operations. Updated to include a timeout in the default HTTP client, but I can move that to only affect this code if requested.

I am ok moving this to only retry for this specific use case.

The retries are only for this getMetadata operation, but the timeout change currently affects everything using the HTTP client.

weinong commented 1 year ago

if it's 10s in server's write timeout, shouldn't the timeout be 3s for 3 times? besides, can we implement this short timeout just for that request?

weinong commented 1 year ago

if it's 10s in server's write timeout, shouldn't the timeout be 3s for 3 times? besides, can we implement this short timeout just for that request?

such as http.NewRequestWithContext

AzureMarker commented 1 year ago

Synced offline with Anubhuti - Changed timeout to only affect the getMetadata call.

if it's 10s in server's write timeout, shouldn't the timeout be 3s for 3 times? besides, can we implement this short timeout just for that request?

such as http.NewRequestWithContext

Reduced timeout to 3 seconds per attempt. The issue with using http.NewRequestWithContext is that the timeout is implemented via deadline, so the request will instantly time out on the second attempt if the first attempt was a timeout (deadline didn't change since autorest reuses the request). Setting the timeout on the HTTP client avoids this issue.

Anumita commented 1 year ago

Hey @AzureMarker/@weinong, we set the server write timeout to 25 seconds and server read timeout as 15 secs in aks

AzureMarker commented 1 year ago

@tamalsaha can you publish a new release so we can start using this change? Thanks!

AzureMarker commented 1 year ago

@tamalsaha may you publish a new release?