googleapis / python-api-core

https://googleapis.dev/python/google-api-core/latest
Apache License 2.0
112 stars 81 forks source link

Major refactoring of Timeout logic introduced incompatibility about RPC Timeout #654

Open yoshizow opened 1 month ago

yoshizow commented 1 month ago

Hello,

I'm using the python-pubsub library and noticed that the timeout behavior is strange. After some investigation, I believe it is due to an incompatibility introduced by https://github.com/googleapis/python-api-core/pull/462.

The default retry and timeout settings in the python-pubsub library are as follows: https://github.com/googleapis/python-pubsub/blob/v2.21.1/google/pubsub_v1/services/publisher/transports/base.py#L166-L181

default_retry=retries.Retry(
    initial=0.1,
    maximum=60.0,
    multiplier=4,
    predicate=retries.if_exception_type(
        core_exceptions.Aborted,
        core_exceptions.Cancelled,
        core_exceptions.DeadlineExceeded,
        core_exceptions.InternalServerError,
        core_exceptions.ResourceExhausted,
        core_exceptions.ServiceUnavailable,
        core_exceptions.Unknown,
    ),
    deadline=600.0,
),
default_timeout=60.0,

Before #462 was introduced, default_timeout=60.0 was interpreted as default_timeout=ConstantTimeout(60). This should be intended to set the RPC Timeout, i.e., the timeout for a single RPC call not including retries, to 60 seconds. However, after the introduction of #462, this line is now interpreted as default_timeout=TimeToDeadlineTimeout(60). Since TimeToDeadlineTimeout determines the total timeout time across retries rather than determining the RPC Timeout, I don't believe that it can be used as a direct replacement for ConstantTimeout.

The strange behavior I mentioned at the beginning came from this: when the server did not respond in the first 60 seconds, the remaining timeout for RPC call was 0 seconds, and the gRPC client returned a 504 Deadline Exceeded on every retry.

TimeToDeadlineTimeout by itself is useful, but given the original intent, I think it is necessary to specify the RPC Timeout as well as the total timeout.

Finally, in my project, I implemented the following ClampMaxTimeout so that the RPC Timeout can be set while respecting the behavior of TimeToDeadlineTimeout.

RPC_TIMEOUT_SECS   = 60.0
TOTAL_TIMEOUT_SECS = 600.0
...
publisher_options=pubsub_v1.types.PublisherOptions(
    timeout=compose(
        api_core.timeout.TimeToDeadlineTimeout(TOTAL_TIMEOUT_SECS),
        ClampMaxTimeout(RPC_TIMEOUT_SECS),
    ),
    ...
...
class ClampMaxTimeout(object):
    def __init__(self, max_timeout: float):
        self._max_timeout = max_timeout

    def __call__(self, func: Callable) -> Callable:
        @functools.wraps(func)
        def func_with_max_timeout(*args, **kwargs):
            if kwargs.get("timeout") is not None:
                kwargs["timeout"] = min(kwargs["timeout"], self._max_timeout)
            else:
                kwargs["timeout"] = self._max_timeout
            return func(*args, **kwargs)

        return func_with_max_timeout

    def __str__(self) -> str:
        return "<ClampMaxTimeout max_timeout={:.1f}>".format(self._max_timeout)

def compose(f: Callable, g: Callable) -> Callable:
    return lambda x: f(g(x))

To keep backward compatibility, I believe it is desirable to introduce a similar process in python-api-core.

vchudnov-g commented 1 month ago

Thanks for reporting this issue! I had a quick look and this appears to be an instance where we should at least be clearer, and possibly implement the change we suggest. We'll look into this.


Notes as we investigate: