googleapis / python-api-core

https://googleapis.dev/python/google-api-core/latest
Apache License 2.0
112 stars 82 forks source link

Retry RESOURCE_EXHAUSTED errors #337

Open yan-hic opened 2 years ago

yan-hic commented 2 years ago

We are using a cloud function that, among others, calls DocumentAI. Assuming the latter has implemented the Retry dependency correctly, we have the following code that should retry when hitting a (retriable) quota limit:

result = doc_ai.process_document(
    request={"name": PROCESSORS[processor_key],
             "raw_document": {"content": self.content,
                              "mime_type": "application/pdf"}},
    retry=retry.Retry(
        predicate=retry.if_exception_type(exceptions.ResourceExhausted),
        deadline=300)
)
document = result.document

However, the call is not retried as cloud logging shows: image

Any clue ?

busunkim96 commented 2 years ago

Hi @yiga2,

RESOURCE_EXHAUSTED is not an error code the client library will retry automatically. The rationale is that immediately retrying a call when quota is exhausted isn't guaranteed to succeed and retrying might still cause billing charges. https://google.aip.dev/194 describes automatic retry configurations for our APIs.

The only error considered safe to retry across the board is UNAVAILABLE indicating a transient connection error. Document AI also includes DEADLINE_EXCEEDED in its retryable error codes for a few methods (see configuration).

Is it possible the call is retried too quickly and fails as the quota hasn't yet been reset? You could try tweaking the initial and multiplier arguments on the Retry object to ensure enough time has passed for the requests per minute quota to reset. The default retry object has an initial delay of 1 second and a multiplier of 2.

https://github.com/googleapis/python-api-core/blob/34ebdcc251d4f3d7d496e8e0b78847645a06650b/google/api_core/retry.py#L72-L75

yan-hic commented 2 years ago

I understand the error is not retried automatically but the code snippet above makes it retry, and I checked and DocAI does not bill for incomplete calls.

@busunkim96 You raised a good point on the quota reset as I misread the limit - the quota is per min, not per sec (which is more common). As the calls originate from cloud functions, I don't control the rate but can definitely tweak the settings.

Looking at the initial call in the screenshot, the deadline must be culprit as +3 min passed - I will extend but it would be good if the retry library could log when deadline is reached (and only that event, not every retry).

tswast commented 2 years ago

We recently added retry support for RESOURCE_EXHAUSTED in google-cloud-bigquery-storage. https://github.com/googleapis/python-bigquery-storage/pull/366

It's more subtle than other retries we've done so far, as the response message can contain a suggested sleep time until the next request (at least in the case of BigQuery, not sure if that's a general Google API thing)