snowplow / snowbridge

For replicating streams across clouds, accounts and regions

Kinesis throttle retry #363

Closed colmsnowplow closed 1 week ago

colmsnowplow commented 1 week ago

This PR amends the Kinesis target so that it retries throttles directly, separately from the retries in the app's main flow.

Throttle errors are retried with a 100ms backoff, plus jitter of a random number of microseconds up to 30ms. The backoff increases by 100ms on each retry, up to a maximum of 1s.

Successes are not retried. They are acked straight away and handled as successes in the main flow.

Errors other than throttles are handled in the same flow as before.
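One way to separate these three outcomes, sketched in Go. This is a hypothetical illustration, not the PR's code: `resultEntry` is an invented simplification of a Kinesis `PutRecords` result entry, in which an empty `ErrorCode` means the record was written and `ProvisionedThroughputExceededException` marks a throttled record.

```go
package main

import "fmt"

// resultEntry is a hypothetical, simplified stand-in for one entry of a
// Kinesis PutRecords response; ErrorCode is empty when the record succeeded.
type resultEntry struct {
	RecordID  string
	ErrorCode string
}

// throttleCode is the per-record error code Kinesis reports when a shard's
// provisioned throughput is exceeded.
const throttleCode = "ProvisionedThroughputExceededException"

// partition splits one attempt's results into successes (acked straight
// away), throttles (retried inside the target) and other failures (handed
// back to the main flow, as before).
func partition(results []resultEntry) (succeeded, throttled, failed []resultEntry) {
	for _, r := range results {
		switch r.ErrorCode {
		case "":
			succeeded = append(succeeded, r)
		case throttleCode:
			throttled = append(throttled, r)
		default:
			failed = append(failed, r)
		}
	}
	return succeeded, throttled, failed
}

func main() {
	results := []resultEntry{
		{RecordID: "a"},
		{RecordID: "b", ErrorCode: throttleCode},
		{RecordID: "c", ErrorCode: "InternalFailure"},
	}
	s, t, f := partition(results)
	fmt.Println(len(s), len(t), len(f)) // prints "1 1 1"
}
```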

The function will not return until a request results in no throttle errors.

For metrics purposes, timestamps are recorded straight after each request attempt, so in all cases only the timestamp of the last request is kept. For throttled records, this is the eventual successful attempt. For other failures and for successes, there is only one attempt, and that is what is recorded, as before.

A note on testing:

Thus, in review, particular attention to the correctness of the handling of non-throttle errors would be appreciated.

Manual tests gave the desired result: when we encounter throttling, the app no longer crashes while handling it, and throughput is greatly improved.