Open · relistan opened this issue 4 years ago
Can you please export the DynamoDB table and send it to me?
If you restart benthos does it fix the problem or do the shards remain locked but not processing?
@patrobinson Sorry, I can't export the table since it's a production table and we had to get it back up and running. Yes, restarting it works. This has happened a few times, so it's not a one-off. The only remaining columns were the sequence IDs, if I recall correctly, and I also screenshotted them at the time:
Hi @relistan
Have you observed any errors in the logs, such as `Error renewing lease`?
I've got some time now, so I can try to replicate this myself.
Reading through the code, I think this could happen if `GetRecords` returns an unrecoverable error:
https://github.com/patrobinson/gokini/blob/master/consumer.go#L364-L376
At which point I think we should panic rather than leave ourselves in a bad state.
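To illustrate the failure mode, here's a minimal sketch of a simplified consumer loop (illustrative only, not gokini's actual implementation; `getRecords` and the shard ID are hypothetical stand-ins):

```go
// Minimal sketch of the failure mode: if the records loop bails out on an
// unrecoverable GetRecords error without crashing or releasing the lease,
// the DynamoDB row still shows the shard as owned while nothing consumes it.
package main

import (
	"errors"
	"log"
	"time"
)

// getRecords is a hypothetical stand-in for the Kinesis GetRecords call.
func getRecords(shardID string) ([]string, error) {
	// Pretend Kinesis returned an unrecoverable error.
	return nil, errors.New("ResourceNotFoundException")
}

func consumeShard(shardID string) {
	for {
		records, err := getRecords(shardID)
		if err != nil {
			// If we just log and return here, the lease row still shows this
			// worker as the owner (Closed: false) while nothing is consuming
			// the shard -- exactly the symptom reported above. Panicking
			// instead lets a supervisor restart the process so the lease can
			// expire and be picked up again.
			log.Panicf("unrecoverable GetRecords error on %s: %v", shardID, err)
		}
		for _, rec := range records {
			_ = rec // process the record, then checkpoint
		}
		time.Sleep(time.Second)
	}
}

func main() {
	consumeShard("shardId-000000000000")
}
```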
I've released a beta version with this fix (https://github.com/patrobinson/gokini/releases/tag/v0.2.0-beta) and I'll give it a whirl later this week.
@patrobinson we stopped running it shortly after I opened this. We switched to the other Benthos Kinesis consumer and hard-pinned workers to shards. That's less than ideal, but it worked. So, unfortunately, I have no better info for you.
We're running Gokini with the Benthos integration you wrote (`kinesis_balanced`). Most of the time everything works fine. After a while, however, we keep ending up with shards that are not being consumed. It appears to be something in the `GetLease()` code. We're still trying to track down the issue, but I wanted to get it on the project's radar.

At first we thought it was a problem of more than one consumer process stepping on each other's toes, but it happens even with only a single consumer.

The symptom is shards in DynamoDB marked as `Closed: false` but where the `LeaseTimeout` is quite a while in the past. See attached screenshot.
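In case it's useful, this is roughly how we spot the stuck shards: scan the checkpoint table for open shards whose lease expired a while ago. A rough sketch using the AWS SDK for Go v1 follows; the table name, the `ShardID` attribute name, and the `LeaseTimeout` timestamp format are assumptions, so adjust them to whatever your table actually contains (`Closed` and `LeaseTimeout` are the columns described above).

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/dynamodb"
)

func main() {
	// Assumption: replace with your actual checkpoint table name.
	const table = "gokini-checkpoints"

	sess := session.Must(session.NewSession())
	svc := dynamodb.New(sess)

	// Single Scan call; pagination is ignored for brevity.
	out, err := svc.Scan(&dynamodb.ScanInput{TableName: aws.String(table)})
	if err != nil {
		log.Fatalf("scan failed: %v", err)
	}

	for _, item := range out.Items {
		// Skip shards that are already closed.
		if c := item["Closed"]; c != nil && c.BOOL != nil && *c.BOOL {
			continue
		}
		lt := item["LeaseTimeout"]
		if lt == nil || lt.S == nil {
			continue
		}
		// Assumption: the timestamp is stored as an RFC 3339 string.
		expires, err := time.Parse(time.RFC3339, *lt.S)
		if err != nil {
			continue
		}
		// An open shard whose lease expired well in the past is the symptom.
		if time.Since(expires) > 5*time.Minute {
			fmt.Printf("possible stuck shard %v: lease expired %v ago\n",
				item["ShardID"], time.Since(expires).Round(time.Second))
		}
	}
}
```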