Pods does not recover from failure: abandoned subscription: was taking too long

practo / tipoca-stream

Near real time cloud native data pipeline in AWS (CDC+Sink). Hosts code for RedshiftSink. RDS to RedshiftSink Pipeline with masking and reloading support.

https://towardsdatascience.com/open-sourcing-tipoca-stream-f261cdcc3a13

Apache License 2.0

Pods does not recover from failure: abandoned subscription: was taking too long #252

Open alok87 opened 2 years ago

alok87 commented 2 years ago

The batcher needs to be restarted manually when the following error occurs. When Kafka has downtime, some batcher pods get stuck with this error and need a restart to recover.

[sarama] 2021/09/20 06:41:38 consumer/broker/2 abandoned subscription to ts.db.table/0 because consuming was taking too long

They should either recover without a restart, or exit fatally so that they restart on their own.

alok87 commented 2 years ago

The first short-term fix should be to make the error fatal, so that the pod restarts instead of staying stuck in this state. This is especially needed in the Main Sink Group, where multiple topics are loaded together in one pod to save connections to Redshift.

This happens for the loader as well.