Closed. erezblm closed this issue 11 months ago.
The operator log does not seem to contain any rolling of the Kafka Connect pods. If you look for the KafkaConnectRoller logs, they always seem to say that the pods do not need to be rolled.
The Connect logs seem to suggest the pods were stopped. But since they do not seem to overlap in time with the operator log, it is hard to say whether it was the operator or not. It could also have been something else. If it really is the operator doing this, you would need to provide logs that overlap and cover the situation from both ends. You can also check the Kubernetes events, which might suggest that something else stopped the Connect pods.
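To check whether two log files actually cover the same time window before attaching them, something along these lines might help. It is only a minimal sketch: it assumes every relevant log line starts with a timestamp such as "2023-11-20 10:40:12", which may not match the real operator or Connect log layout, and the names TS_FORMAT, TS_LENGTH, log_time_range and overlap are made up for this illustration.

```python
from datetime import datetime

# Assumed timestamp layout at the start of each log line, e.g. "2023-11-20 10:40:12".
# Adjust TS_FORMAT and TS_LENGTH if your logs use a different layout.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"
TS_LENGTH = 19


def log_time_range(lines):
    """Return the (earliest, latest) timestamps found at the start of the log lines."""
    stamps = []
    for line in lines:
        try:
            stamps.append(datetime.strptime(line[:TS_LENGTH], TS_FORMAT))
        except ValueError:
            continue  # line without a leading timestamp, e.g. part of a stack trace
    if not stamps:
        raise ValueError("no timestamps found")
    return min(stamps), max(stamps)


def overlap(range_a, range_b):
    """Return the (start, end) window shared by two time ranges, or None if they are disjoint."""
    start = max(range_a[0], range_b[0])
    end = min(range_a[1], range_b[1])
    return (start, end) if start <= end else None
```

Reading each file with open(path).readlines(), passing the result to log_time_range, and then calling overlap on the two ranges returns None when the operator and Connect logs do not cover a common period.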
Thanks, I’ll try to add the overlapping logs tomorrow. I don’t think it’s a rollout, because the pods are terminated separately and not immediately one after the other. I thought it might be related to ‘enableRestart’, but I couldn’t find any errors, and I would expect it to restart just the tasks and not the whole pod.
What do you mean by enableRestart?
Discussed on the community call on 30.11.2023: Can you please clarify what exactly you meant by the enableRestart reference? Otherwise, there does not seem to be much more we can do about this based on the information we have, and we will close it.
Discussed in the community call on 14.12.: No more information received since last time. We are going to close it. Feel free to reopen it or start a discussion if you can provide more details.
Bug Description
Hi,
My 2 Connect cluster pods keep getting restarted every 10-20 minutes.
After I somehow managed to store the logs locally, I was able to see the logs of the previously terminated pods, but they don't seem to contain any errors, so I assume that the operator just keeps restarting them for some reason.
I'm running a Connect cluster with 2 replicas and multiple connectors of 2 kinds (MQTT source and ClickHouse sink), each with multiple tasks.
I would appreciate some help figuring out how to even debug this, because I couldn't tell from the operator logs exactly where the restart occurred.
I have attached the operator debug logs (the Connect logs didn't seem to have anything interesting, but I can upload them as well).
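One possible way to narrow this down is to look at the last termination record that Kubernetes keeps for each container. The sketch below assumes the Pod has been dumped as JSON (for example with kubectl get pod <name> -o json) and parsed into a dict; the function name last_terminations is made up for this illustration. It reads only the standard status.containerStatuses[].lastState.terminated fields.

```python
def last_terminations(pod):
    """Collect the last termination record of each container in a parsed Pod object.

    `pod` is the dict obtained by parsing the JSON of a single Pod.
    Containers that have never been restarted inside this Pod are skipped.
    """
    records = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        terminated = cs.get("lastState", {}).get("terminated")
        if terminated:
            records.append({
                "container": cs.get("name"),
                "reason": terminated.get("reason"),      # e.g. "OOMKilled" or "Error"
                "exitCode": terminated.get("exitCode"),
                "finishedAt": terminated.get("finishedAt"),
                "restartCount": cs.get("restartCount"),
            })
    return records
```

If this returns records with a growing restartCount, the container was restarted in place inside the same Pod, and the reason and exit code may hint at the cause. If it returns nothing and the Pod's creation time keeps changing, the Pod itself was deleted and recreated, which would be the pattern to compare against the operator log and the Kubernetes events.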
Steps to reproduce
No response
Expected behavior
Connect pods keep running without being restarted
Strimzi version
0.37
Kubernetes version
1.26.6
Installation method
Terraform Provider
Infrastructure
No response
Configuration files and logs
Connect spec + status (taken from edit):
Example of one of the connectors:
Operator debug logs (restart occurred around 10:40): operatorlogs.txt
Connect pods' last logs before termination: connect-1.log, connect-0.log
Additional context
No response