This PR amends the kinesis target to handle retries of throttles directly, separately from the retries in the main flow of the app.
Throttle errors are retried with a backoff that starts at 100ms and increases by 100ms on each retry, up to a maximum of 1s. Each wait also includes jitter: a random number of microseconds up to 30 milliseconds.
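To make the schedule concrete, here is a minimal sketch of how the delay could be computed. It is not the code in this PR: the function and constant names are hypothetical, and it assumes the backoff grows linearly per retry and that the jitter bound is inclusive.

```python
import random

BASE_MS = 100            # first backoff, per the description above
STEP_MS = 100            # assumed increase per retry
MAX_MS = 1000            # backoff cap of 1s
MAX_JITTER_US = 30_000   # jitter of up to 30 milliseconds, in microseconds


def throttle_backoff_us(retry: int) -> int:
    """Return the wait in microseconds before the retry-th retry (1-indexed).

    Hypothetical helper: linear backoff capped at MAX_MS, plus random jitter.
    """
    base_ms = min(BASE_MS + STEP_MS * (retry - 1), MAX_MS)
    jitter_us = random.randint(0, MAX_JITTER_US)  # inclusive bounds (assumption)
    return base_ms * 1000 + jitter_us
```

Under these assumptions the first retry waits between 100ms and 130ms, and every retry from the tenth onwards waits between 1s and 1.03s.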
Successes are not retried, but are acked straight away and handled as success in the main flow.
Errors other than throttles are handled in the same flow as before.
The function will not return until a request results in no throttle errors.
For metrics purposes, timestamps are recorded straight after a request attempt - so in all cases only the last request timestamp is kept. In the case of throttles, this is the eventual successful attempt. For other failures and successes, there is only one attempt and that is what is recorded, as before.
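The flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in this PR: `put_until_unthrottled`, `send`, `ack`, `sleep` and `now` are hypothetical names, and `send` is assumed to return one error code per record (`None` for success), in the style of the per-record `ErrorCode` in a Kinesis PutRecords response.

```python
import random
import time

# Per-record error code Kinesis uses for throttled records.
THROTTLE = "ProvisionedThroughputExceededException"


def put_until_unthrottled(records, send, ack, sleep=time.sleep, now=time.time):
    """Send records, retrying only the throttled ones until none remain.

    Returns (failed, finished_at): the records that hit a non-throttle error,
    and the timestamp taken straight after the last request attempt.
    """
    pending = list(records)
    failed = []
    retry = 0
    while True:
        codes = send(pending)
        finished_at = now()  # overwritten each attempt, so only the last is kept
        throttled = []
        for record, code in zip(pending, codes):
            if code is None:
                ack(record)               # successes are acked straight away
            elif code == THROTTLE:
                throttled.append(record)  # only throttles are retried here
            else:
                failed.append(record)     # left for the main flow to handle
        if not throttled:
            return failed, finished_at
        retry += 1
        # 100ms steps capped at 1s, plus up to 30ms of jitter.
        sleep(min(0.1 * retry, 1.0) + random.randint(0, 30_000) / 1_000_000)
        pending = throttled
```

Passing `sleep` and `now` as parameters is only a convenience for exercising the sketch without real waits; the loop has no retry limit, matching the statement that the function does not return until a request produces no throttle errors.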
A note on testing:
I manually configured localstack to return throttle errors 50% of the time. This setting is global for kinesis, but it means the existing tests serve to cover both throttled and non-throttled cases. I left the config option in the docker compose file but set it to 0.0, because a non-zero value breaks the kinesis source tests.
All attempts to make localstack or a live kinesis stream produce a non-throttle 'internal error' were unsuccessful: each one made the whole request fail.
Thus, particular attention in review to the correctness of the non-throttle error handling would be appreciated.
Manual tests produced the desired result: when we encounter throttling, the app no longer crashes while handling it, and throughput is greatly improved.