Klaviyo API hangs once every 10-20 requests

controldev commented 1 year ago

Hi there,

I noticed that, starting about 1-2 weeks ago, roughly one in every 10-20 requests (5-10% of calls) made using the Klaviyo API for Python hangs and causes my containing Django app to time out.

I'm using klaviyo-api==2.0.0 inside a Django 3.2.11 app, which runs on the python:3.8-slim-buster image with no further modifications.

The issue is exclusive to calls made to Klaviyo, and I've ruled out connectivity issues on my side.

Any ideas, fixes or workarounds would be highly appreciated. Let me know if there is any other information I can provide.

Below is a stack trace for calls made to Profiles.create_profile(); the program is stuck on return self._sslobj.read(len, buffer) until the gunicorn timeout is hit:

    profile_id = klaviyo_client.Profiles.create_profile(body)
  File "klaviyo_api/wrapper.py", line 330, in _wrapped_func
    return func(*args,**kwargs)
  File "__init__.py", line 324, in wrapped_f
    return self(f, *args, **kw)
  File "__init__.py", line 404, in __call__
    do = self.iter(retry_state=retry_state)
  File "__init__.py", line 349, in iter
    return fut.result()
  File "concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
  File "__init__.py", line 407, in __call__
    result = fn(*args, **kwargs)
  File "openapi_client/api/profiles_api.py", line 1013, in create_profile
    return self.create_profile_endpoint.call_with_http_info(**kwargs)
  File "openapi_client/api_client.py", line 893, in call_with_http_info
    return self.api_client.call_api(
  File "openapi_client/api_client.py", line 422, in call_api
    return self.__call_api(resource_path, method,
  File "openapi_client/api_client.py", line 199, in __call_api
    response_data = self.request(
  File "openapi_client/api_client.py", line 468, in request
    return self.rest_client.POST(url,
  File "openapi_client/rest.py", line 271, in POST
    return self.request("POST", url,
  File "openapi_client/rest.py", line 157, in request
    r = self.pool_manager.request(
  File "urllib3/request.py", line 79, in request
    return self.request_encode_body(
  File "urllib3/request.py", line 171, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
  File "urllib3/poolmanager.py", line 336, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "urllib3/connectionpool.py", line 670, in urlopen
    httplib_response = self._make_request(
  File "urllib3/connectionpool.py", line 426, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
    # Permission is hereby granted, free of charge, to any person obtaining a copy
  File "urllib3/connectionpool.py", line 421, in _make_request
    httplib_response = conn.getresponse()
  File "http/client.py", line 1347, in getresponse
    response.begin()
  File "http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "http/client.py", line 268, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
  File "gunicorn/workers/base.py", line 201, in handle_abort
    sys.exit(1)

jon-batscha commented 1 year ago

Hey,

Thanks for reaching out, and apologies for the late response.

I'm wondering if this is due to our built-in retries.

Can you try altering the retry settings? That may lower the wait time that causes the gunicorn timeout.

For reference, the retry settings (with default values) at the client level are:

klaviyo = KlaviyoAPI("YOUR_API_KEY_HERE", max_delay=60, max_retries=3)

where max_delay is the total delay across all retry attempts.

If you'd like to turn off retries altogether and implement your own error handling, you can set:

klaviyo = KlaviyoAPI("YOUR_API_KEY_HERE", max_delay=0, max_retries=0)
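
Independently of the retry settings, one possible workaround for the hang itself is to bound each call with a hard deadline, so that it fails well before the gunicorn timeout. The following is a minimal sketch using only the standard library. `call_with_deadline` and the 10-second figure are hypothetical and not part of klaviyo-api, and the worker thread is abandoned rather than killed, so a stuck socket read may still linger in the background.

```python
import concurrent.futures


def call_with_deadline(func, timeout_seconds, *args, **kwargs):
    """Run func(*args, **kwargs) in a worker thread; stop waiting after timeout_seconds.

    Hypothetical helper, not part of klaviyo-api. Raises
    concurrent.futures.TimeoutError when the deadline passes.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = executor.submit(func, *args, **kwargs)
        # Blocks for at most timeout_seconds, then raises TimeoutError.
        return future.result(timeout=timeout_seconds)
    finally:
        # Do not block on a stuck worker; it keeps running in the background.
        executor.shutdown(wait=False)


# Assumed usage, with klaviyo_client and body as in the report above:
# profile_id = call_with_deadline(klaviyo_client.Profiles.create_profile, 10, body)
```

The caller can then catch `concurrent.futures.TimeoutError` and decide whether to retry, queue the request, or return an error to the user.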

odedva commented 1 year ago

Hey @jon-batscha, quick question regarding the retry mechanism: on which status codes does the library retry? We occasionally get 502 responses from the Klaviyo API (about once a day, usually several at the same time), and we're not sure whether the client library retries on those. It seems to us that it does not, and we wanted to double-check with you. Also, we would like to use the _request_timeout variable, and I'm not sure how it combines with max_delay and max_retries when setting up the API class.

The last time we saw this was around 23:00 UTC on Aug 28th. The error response we see in our logs also seems to involve Cloudflare.

jon-batscha commented 11 months ago

Hi Oded,

Thanks for reaching out, great question!

I actually just updated the retry logic in today’s release (thanks to your feedback), so what I’ll describe below applies to versions 5.3.0 and later. We retry on the following error codes:

    _STATUS_CODE_CONNECTION_RESET_BY_PEER = 104
    _STATUS_CODE_TOO_MANY_REQUESTS = 429
    _STATUS_CODE_SERVICE_UNAVAILABLE = 503
    _STATUS_CODE_GATEWAY_TIMEOUT = 504
    _STATUS_CODE_A_TIMEOUT_OCCURED = 524

NOTE: we do not retry on 502 errors, as those are not guaranteed to be transient, and so retrying on those could hold up jobs. That said, our API team is aware of the occasional 502s and is working to fix this on our end (at the API-level).
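
For anyone implementing their own error handling, the decision above can be sketched as a simple membership check. This is only an illustration built from the codes listed in this comment; `RETRYABLE_CODES` and `should_retry` are hypothetical names, not the SDK's internals.

```python
# Codes copied from the list above. Note that 104 is a socket errno
# (ECONNRESET on Linux), not an HTTP status code.
RETRYABLE_CODES = frozenset({104, 429, 503, 504, 524})


def should_retry(status_code):
    """Return True if status_code is one of the codes listed above as retried."""
    return status_code in RETRYABLE_CODES


print(should_retry(429))  # True
print(should_retry(502))  # False: 502 is deliberately not retried
```

If you do want to retry on 502 in your own code, adding it to the set is a one-line change, at the cost of possibly holding up jobs on non-transient failures.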

In terms of the retry logic: we currently take the following 2 params:

max_delay
max_retries

We retry up to max_retries times, using the wait_random_exponential algorithm.

The algorithm starts with a 1-second wait and doubles the wait time at each retry, up to max_delay.

Given this updated logic, we now recommend the following default values (now reflected in our defaults/readme):

max_delay = 60
max_retries = 7

With this setting, the SDK will behave as follows: 

First request: no delay
Second request: after a 1-second wait
Third request: after a 2-second wait
Fourth request: after a 4-second wait
Fifth request: after an 8-second wait
Sixth request: after a 16-second wait
Seventh request: after a 32-second wait

At this point, if no success, we’ll stop retrying.
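
The schedule described above can be reproduced with a few lines of Python. This is a sketch of the doubling-with-cap rule as stated in this comment, not the SDK's code: `backoff_waits` is a hypothetical helper, and it assumes max_retries counts total attempts including the first request, as the listing suggests. If wait_random_exponential here is the tenacity helper of that name, actual waits are drawn at random up to these values, so treat them as upper bounds.

```python
def backoff_waits(max_retries=7, max_delay=60, first_wait=1):
    """Return the wait in seconds before each retry, doubling up to max_delay.

    Assumption: max_retries counts total attempts (first request included),
    so there are max_retries - 1 waits.
    """
    waits = []
    wait = first_wait
    for _ in range(max_retries - 1):
        waits.append(min(wait, max_delay))  # never wait longer than max_delay
        wait *= 2
    return waits


schedule = backoff_waits()
print(schedule)       # [1, 2, 4, 8, 16, 32]
print(sum(schedule))  # 63
```

Under these assumptions the worst case adds about 63 seconds of waiting on top of the request time itself, which is worth comparing against any worker or gateway timeout in front of the caller.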

You can find the code that implements retry logic and sets retry codes here.

Hopefully this answers your question. Please do not hesitate to reach out if you run into any further issues.

(I also received your email yesterday, will follow up on the questions you sent over there in a bit)

sanfordj commented 5 months ago

Closing this issue as it has been inactive for ~6 months.

klaviyo / klaviyo-api-python

Klaviyo API hangs once every 10-20 requests #28