snowplow-incubator / common-streams

Kinesis source must report its latency when KCL is not healthy #77

Closed istreeter closed 3 months ago

istreeter commented 3 months ago

Part of PDP-1196

The KCL is sadly not very good at crashing and exiting. If the underlying Kinesis client hits errors (e.g. permissions errors), KCL tends to stay alive without propagating the exceptions to our application code. We want the app to crash under these circumstances because a crash triggers an alert.

common-streams already has a health check feature, in which a health probe becomes unhealthy if a single event gets stuck without making progress.

This PR leans on the existing health check feature, so the health probe also becomes unhealthy if the Kinesis client is not regularly receiving healthy responses.
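To illustrate the idea, here is a minimal, language-neutral sketch in Python of a probe that turns unhealthy when no healthy poll response has been seen for too long. It is not the common-streams implementation: `PollHealthProbe`, `record_successful_poll`, `latency_seconds` and the 30-second threshold are all hypothetical names and values chosen for this example.

```python
import time
from typing import Callable


class PollHealthProbe:
    """Hypothetical probe: healthy only if a successful poll was seen recently."""

    def __init__(self, max_silence_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_silence = max_silence_seconds
        self._clock = clock
        # Assumption: startup counts as a healthy baseline.
        self._last_success = clock()

    def record_successful_poll(self) -> None:
        # Called whenever the client receives a healthy response.
        self._last_success = self._clock()

    def latency_seconds(self) -> float:
        # Time elapsed since the last healthy response.
        return self._clock() - self._last_success

    def is_healthy(self) -> bool:
        return self.latency_seconds() <= self._max_silence


# Usage with a fake clock so the behaviour is deterministic.
now = [0.0]
probe = PollHealthProbe(max_silence_seconds=30.0, clock=lambda: now[0])

now[0] = 10.0
probe.record_successful_poll()

now[0] = 35.0
healthy_after_25s = probe.is_healthy()  # 25s of silence, within the threshold

now[0] = 45.0
healthy_after_35s = probe.is_healthy()  # 35s of silence, over the threshold
```

Injecting the clock keeps the sketch testable; a real application would more likely expose `is_healthy` through whatever endpoint its orchestrator polls.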

I configured KCL to invoke our record processor every time it polls for records, even if the batch is empty. This means the health check still works even if there are no events in the stream.
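The effect of calling the processor on empty batches can be sketched as follows. This is a toy simulation in Python, not KCL or common-streams code: `HeartbeatRecordProcessor`, `run_polls` and the `call_for_empty` flag are hypothetical stand-ins. (KCL 2.x has a `callProcessRecordsEvenForEmptyRecordList` setting on `ProcessorConfig`; that this PR uses exactly that setting is an assumption.)

```python
from typing import Callable, List


class HeartbeatRecordProcessor:
    """Hypothetical processor that signals a heartbeat on every invocation."""

    def __init__(self, on_poll: Callable[[], None],
                 handle_record: Callable[[str], None]) -> None:
        self._on_poll = on_poll
        self._handle_record = handle_record

    def process_records(self, records: List[str]) -> None:
        # The heartbeat fires even when the batch is empty.
        self._on_poll()
        for record in records:
            self._handle_record(record)


def run_polls(batches: List[List[str]], processor: HeartbeatRecordProcessor,
              call_for_empty: bool) -> None:
    # Stand-in for the polling loop: one iteration per successful poll.
    for batch in batches:
        if batch or call_for_empty:
            processor.process_records(batch)


def count_heartbeats(batches: List[List[str]], call_for_empty: bool) -> int:
    heartbeats: List[int] = []
    processor = HeartbeatRecordProcessor(
        on_poll=lambda: heartbeats.append(1),
        handle_record=lambda record: None,
    )
    run_polls(batches, processor, call_for_empty)
    return len(heartbeats)


# Four successful polls, two of which return no records.
batches = [["a"], [], [], ["b", "c"]]
with_flag = count_heartbeats(batches, call_for_empty=True)
without_flag = count_heartbeats(batches, call_for_empty=False)
```

With the flag on, every successful poll produces a heartbeat (`with_flag` is 4). Without it, an idle stream is indistinguishable from a broken client (`without_flag` is 2), which is why the empty-batch invocation matters for the health check.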