harlow / kinesis-consumer

Golang library for consuming Kinesis stream data
MIT License

Add the ability to handle more errors without closing the consumer #131

Closed: jonandrewj closed this issue 3 years ago

jonandrewj commented 3 years ago

I've been looking at some of our error logs from services using this library and noticed a good number of errors that look like either flakiness in the AWS Kinesis library or unhandled networking errors. For example: err.Error() == "shard shardId-000000000822 error: get records error: send request failed"

As a short-term solution, we wrapped the consumer initialization in a for loop and restart the consumer whenever it hits one of these occasional errors (~50x per week for one of our services). It would be nicer to handle these errors at the shard level so that the whole consumer doesn't need to be cancelled and restarted.
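For illustration, the restart-loop workaround could look roughly like the sketch below. It is not our production code and does not use the library's API: `scan` is a hypothetical stand-in for "create the consumer and block until it returns", and `runWithRestart`, `flakyScan`, `maxRestarts`, and `backoff` are names made up for this example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// runWithRestart calls scan until it returns nil or maxRestarts restarts
// have been used. scan is a hypothetical stand-in for "create the consumer
// and block on it"; it is not part of kinesis-consumer.
func runWithRestart(scan func() error, maxRestarts int, backoff time.Duration) (int, error) {
	restarts := 0
	for {
		err := scan()
		if err == nil {
			return restarts, nil
		}
		if restarts >= maxRestarts {
			return restarts, fmt.Errorf("giving up after %d restarts: %w", restarts, err)
		}
		restarts++
		time.Sleep(backoff) // brief pause before restarting the whole consumer
	}
}

// flakyScan returns a fake scan function that fails n times, then succeeds.
// It only simulates the transient error seen in our logs.
func flakyScan(n int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= n {
			return errors.New("get records error: send request failed")
		}
		return nil
	}
}

func main() {
	restarts, err := runWithRestart(flakyScan(3), 5, 10*time.Millisecond)
	fmt.Println(restarts, err) // prints "3 <nil>"
}
```

The drawback is visible in the shape of the loop: one failing shard causes `scan` to return, so every shard is torn down and restarted together.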

Relatedly, a couple of months ago AWS had the Kinesis outage in the US-EAST-1 region, and we saw roughly a 50% error rate on all requests to Kinesis. If we had been able to handle these errors at the shard level without impacting other shards, our service would have had a chance to cope with the 50% error rate and still process the Kinesis stream. Instead, a 50% error rate across several shards caused the consumer to stop and restart too frequently, and we were stuck waiting until the Kinesis error rate improved.

So I guess I have two questions:

  1. Does this "send request failed" error look like something we can add as an always-retryable error (I've seen that there is a small collection of those)?
  2. Or, how receptive would you be to an optional error handler (an isRetryable() method) that the caller can provide to handle shard reader errors in a custom way? (I'd be happy to send a PR.)
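
To make the two options concrete, here is a rough sketch of what shard-level retrying with an optional hook might look like. None of this is the library's actual API: `isRetryableFunc`, `defaultIsRetryable`, `fetchWithRetry`, and `failNTimes` are hypothetical names, `fetch` stands in for a single get-records call on one shard, and matching on the error string is only an assumption about how option 1 could be done.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// isRetryableFunc is a hypothetical caller-supplied hook (option 2) that
// decides whether a shard reader error should be retried.
type isRetryableFunc func(err error) bool

// defaultIsRetryable treats "send request failed" as transient (option 1).
// String matching is an assumption made for this sketch.
func defaultIsRetryable(err error) bool {
	return err != nil && strings.Contains(err.Error(), "send request failed")
}

// fetchWithRetry retries a single shard's fetch while the error is
// retryable, up to maxRetries retries. It returns the number of attempts
// made. A real loop would also back off between attempts.
func fetchWithRetry(fetch func() error, retryable isRetryableFunc, maxRetries int) (int, error) {
	if retryable == nil {
		retryable = defaultIsRetryable
	}
	attempts := 0
	for {
		attempts++
		err := fetch()
		if err == nil {
			return attempts, nil
		}
		if !retryable(err) || attempts > maxRetries {
			return attempts, err
		}
	}
}

// failNTimes returns a fake fetch that fails n times with msg, then succeeds.
func failNTimes(n int, msg string) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= n {
			return errors.New(msg)
		}
		return nil
	}
}

func main() {
	attempts, err := fetchWithRetry(failNTimes(2, "get records error: send request failed"), nil, 5)
	fmt.Println(attempts, err) // prints "3 <nil>"

	attempts, err = fetchWithRetry(failNTimes(2, "access denied"), nil, 5)
	fmt.Println(attempts, err) // prints "1 access denied"
}
```

Because the retry happens inside one shard's read loop, the other shards keep processing while a single shard rides out a transient failure.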

jonandrewj commented 3 years ago

Added https://github.com/harlow/kinesis-consumer/pull/132 to address option 1.