jupyterhub / kubespawner

Kubernetes spawner for JupyterHub
https://jupyterhub-kubespawner.readthedocs.io
BSD 3-Clause "New" or "Revised" License

k8s api-server response status codes like 429 should trigger request retries #436

Open consideRatio opened 3 years ago

consideRatio commented 3 years ago

Bug description

We have try/except logic that catches various errors from the api-server when making requests. We should be a bit smarter about them, though: these errors often mean we should retry the request rather than raise an exception that ends up as a 500 error from JupyterHub, for example when a user is starting a pod.

https://github.com/jupyterhub/kubespawner/blob/76ad4e9cef7db3f08046f560bba39285f627d4e7/kubespawner/spawner.py#L1830-L1834

Response status codes to consider

Here are the k8s api-server response status codes that, according to the Kubernetes community documentation, should be followed by a retry of the request.

[Screenshot: table from the Kubernetes API conventions listing the status codes whose suggested client recovery behavior is to retry the request]
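
For concreteness, here is a minimal sketch of what a set of retryable status codes could look like. The name RETRY_ERROR_CODES is hypothetical, and the specific values are an assumption based on the Kubernetes API conventions; the authoritative list is the table above.

RETRY_ERROR_CODES = frozenset({
    429,  # Too Many Requests: back off, ideally honoring a Retry-After header
    500,  # Internal Server Error: often transient on the api-server side
    503,  # Service Unavailable: retry, ideally honoring a Retry-After header
    504,  # Gateway Timeout: retry, possibly with a longer request timeout
})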

Related

This issue is the outcome of reviewing another PR, see: https://github.com/jupyterhub/kubespawner/pull/433#pullrequestreview-489436152

minrk commented 3 years ago

Makes sense, I think a general

async def retrying_api_request():
    try:
        return (await real_api_request()) or True # ensure truthy, or is the response always truthy?
    except ApiException as e:
        if e.status in RETRY_ERROR_CODES:
            # any special handling of e.g. 429, which may have a specified Retry-After header?
            log.warning(f"Error {e.status} in API request: {e}, retrying...")
            return False
        # let any other error raise
        raise

result = await exponential_backoff(retrying_api_request, ...)

wrapper to use for all API requests seems appropriate. This can probably be folded into (or replace) our current asynchronize utility.
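
For illustration, a slightly more fleshed-out version of that sketch could look as follows. This is not kubespawner's actual implementation: the helper name retry_api_request, the make_request argument, the RETRY_ERROR_CODES set, and the commented call site are all assumptions, while exponential_backoff (from jupyterhub.utils) and ApiException (from the kubernetes client) are real.

from jupyterhub.utils import exponential_backoff
from kubernetes.client.rest import ApiException

RETRY_ERROR_CODES = frozenset({429, 500, 503, 504})  # placeholder; see the table above

async def retry_api_request(make_request, fail_message="k8s api request failed", timeout=10):
    # Run make_request() until it succeeds, retrying with exponential backoff
    # on api-server errors considered retryable.
    result = None

    async def attempt():
        nonlocal result
        try:
            result = await make_request()
            return True  # truthy return stops exponential_backoff
        except ApiException as e:
            if e.status in RETRY_ERROR_CODES:
                # 429/503 responses may carry a Retry-After header that could be honored here
                return False  # falsy return makes exponential_backoff retry
            # any other API error propagates to the caller immediately
            raise

    await exponential_backoff(attempt, fail_message, timeout=timeout)
    return result

# Hypothetical call site, assuming an async client such as kubernetes_asyncio:
# pod = await retry_api_request(
#     lambda: self.api.create_namespaced_pod(self.namespace, pod_manifest)
# )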

yuvipanda commented 3 years ago

Replacing asynchronize with this sounds like a great idea!