The KCL is sadly not very good at crashing and exiting. If the underlying Kinesis client hits errors (e.g. permissions errors), KCL tends to stay alive and does not propagate the exceptions to our application code. We want the app to crash under these circumstances because that triggers an alert.
common-streams already has a health check feature, in which a health probe becomes unhealthy if a single event gets stuck without making progress.
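For reviewers less familiar with that feature, here is a minimal sketch of the general idea, assuming the probe tracks when each in-flight event started and fails once any single one exceeds a timeout. The class and method names (`StuckEventHealthCheck`, `started`, `finished`) are hypothetical and are not the actual common-streams code. An injectable clock keeps it testable.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical sketch of a "stuck event" health check; not the common-streams implementation.
public final class StuckEventHealthCheck {
    // eventId -> time (nanos) at which processing of that event started
    private final Map<Long, Long> inFlightStartNanos = new ConcurrentHashMap<>();
    private final long timeoutNanos;
    private final LongSupplier nanoClock;

    public StuckEventHealthCheck(Duration timeout, LongSupplier nanoClock) {
        this.timeoutNanos = timeout.toNanos();
        this.nanoClock = nanoClock;
    }

    /** Call when an event starts processing. */
    public void started(long eventId) {
        inFlightStartNanos.put(eventId, nanoClock.getAsLong());
    }

    /** Call when an event finishes, successfully or not. */
    public void finished(long eventId) {
        inFlightStartNanos.remove(eventId);
    }

    /** Unhealthy if any single in-flight event has made no progress for longer than the timeout. */
    public boolean isHealthy() {
        long now = nanoClock.getAsLong();
        for (long start : inFlightStartNanos.values()) {
            if (now - start > timeoutNanos) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] fakeNow = {0L}; // fake clock so the example is deterministic
        StuckEventHealthCheck check =
            new StuckEventHealthCheck(Duration.ofSeconds(10), () -> fakeNow[0]);
        check.started(1L);
        fakeNow[0] = Duration.ofSeconds(5).toNanos();
        System.out.println(check.isHealthy()); // true
        fakeNow[0] = Duration.ofSeconds(11).toNanos();
        System.out.println(check.isHealthy()); // false
        check.finished(1L);
        System.out.println(check.isHealthy()); // true
    }
}
```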
This PR builds on the existing health check feature, so the health probe also becomes unhealthy if the Kinesis client is not regularly receiving healthy responses.
I configured KCL to invoke our record processor every time it polls for records, even if the batch is empty. This means the health check keeps working when there are no events in the stream.
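A minimal sketch of why the empty-batch invocations matter, assuming the processor records a "last successful poll" timestamp on every call and the probe fails once that timestamp goes stale. If the Kinesis client is failing (e.g. on permissions), the processor is never invoked, so the heartbeat stops advancing. `PollHeartbeat` and `onRecordsDelivered` are hypothetical names. In KCL 2.x the switch for empty batches is, as far as we know, `ProcessorConfig.callProcessRecordsEvenForEmptyRecordList(true)`; check the KCL docs for the version in use.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical sketch; names are illustrative, not KCL or common-streams APIs.
public final class PollHeartbeat {
    private final AtomicLong lastPollNanos;
    private final long timeoutNanos;
    private final LongSupplier nanoClock;

    public PollHeartbeat(Duration timeout, LongSupplier nanoClock) {
        this.timeoutNanos = timeout.toNanos();
        this.nanoClock = nanoClock;
        this.lastPollNanos = new AtomicLong(nanoClock.getAsLong());
    }

    /**
     * Called from the record processor on every successful poll.
     * The batch may be empty: reaching this method at all proves Kinesis answered.
     */
    public void onRecordsDelivered(List<?> batch) {
        lastPollNanos.set(nanoClock.getAsLong());
    }

    /** Unhealthy if no poll (empty or not) has reached the processor within the timeout. */
    public boolean isHealthy() {
        return nanoClock.getAsLong() - lastPollNanos.get() <= timeoutNanos;
    }

    public static void main(String[] args) {
        long[] fakeNow = {0L}; // fake clock so the example is deterministic
        PollHeartbeat heartbeat = new PollHeartbeat(Duration.ofSeconds(60), () -> fakeNow[0]);
        fakeNow[0] = Duration.ofSeconds(30).toNanos();
        heartbeat.onRecordsDelivered(List.of()); // empty batch still counts
        System.out.println(heartbeat.isHealthy()); // true
        fakeNow[0] = Duration.ofSeconds(100).toNanos(); // 70s without any poll
        System.out.println(heartbeat.isHealthy()); // false
    }
}
```

Without the empty-batch setting, a quiet stream and a broken client would look identical to this check, which is why the processor must be invoked on every poll.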
Part of PDP-1196