jupyterhub / kubespawner

Kubernetes spawner for JupyterHub
https://jupyterhub-kubespawner.readthedocs.io
BSD 3-Clause "New" or "Revised" License

k8s api-server response status codes like 429 should trigger request retries #436

Open consideRatio opened 3 years ago

consideRatio commented 3 years ago

Bug description

We have try/except logic that catches various errors from the api-server when making requests. We should be a bit smarter about them, though: these errors often mean we should retry the request rather than raise an exception that ends up as a 500 error from JupyterHub, for example when a user is starting a pod.

https://github.com/jupyterhub/kubespawner/blob/76ad4e9cef7db3f08046f560bba39285f627d4e7/kubespawner/spawner.py#L1830-L1834

Response status codes to consider

Here are the k8s api-server response status codes that, according to the Kubernetes community documentation, should be followed by a retry of the request.

[Screenshot: table from the Kubernetes API conventions listing the status codes whose suggested client recovery behavior is to retry the request]
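
For concreteness, here is a minimal sketch of what a set of retryable status codes could look like. The name RETRY_ERROR_CODES is hypothetical, and the specific values are an assumption based on the Kubernetes API conventions; the authoritative list is the table above.

RETRY_ERROR_CODES = frozenset({
    429,  # Too Many Requests: back off, ideally honoring a Retry-After header
    500,  # Internal Server Error: often transient on the api-server side
    503,  # Service Unavailable: retry, ideally honoring a Retry-After header
    504,  # Gateway Timeout: retry, possibly with a longer request timeout
})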

Related

This issue is the outcome of reviewing another PR, see: https://github.com/jupyterhub/kubespawner/pull/433#pullrequestreview-489436152

minrk commented 3 years ago

Makes sense, I think a general

async def retrying_api_request():
    try:
        return (await real_api_request()) or True # ensure truthy, or is the response always truthy?
    except ApiException as e:
        if e.status in RETRY_ERROR_CODES:
            # any special handling of e.g. 429, which may have a specified Retry-After header?
            log.warning(f"Error {e.status} in API request: {e}, retrying...")
            return False
        # let any other error raise
        raise

result = await exponential_backoff(retrying_api_request, ...)

wrapper to use for all API requests seems appropriate. This can probably be folded into (or replace) our current asynchronize utility.
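
For illustration, a slightly more fleshed-out version of that sketch could look as follows. This is not kubespawner's actual implementation: the helper name retry_api_request, the make_request argument, the RETRY_ERROR_CODES set, and the commented call site are all assumptions, while exponential_backoff (from jupyterhub.utils) and ApiException (from the kubernetes client) are real.

from jupyterhub.utils import exponential_backoff
from kubernetes.client.rest import ApiException

RETRY_ERROR_CODES = frozenset({429, 500, 503, 504})  # placeholder; see the table above

async def retry_api_request(make_request, fail_message="k8s api request failed", timeout=10):
    # Run make_request() until it succeeds, retrying with exponential backoff
    # on api-server errors considered retryable.
    result = None

    async def attempt():
        nonlocal result
        try:
            result = await make_request()
            return True  # truthy return stops exponential_backoff
        except ApiException as e:
            if e.status in RETRY_ERROR_CODES:
                # 429/503 responses may carry a Retry-After header that could be honored here
                return False  # falsy return makes exponential_backoff retry
            # any other API error propagates to the caller immediately
            raise

    await exponential_backoff(attempt, fail_message, timeout=timeout)
    return result

# Hypothetical call site, assuming an async client such as kubernetes_asyncio:
# pod = await retry_api_request(
#     lambda: self.api.create_namespaced_pod(self.namespace, pod_manifest)
# )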

yuvipanda commented 3 years ago

Replacing asynchronize with this sounds like a great idea!