Closed ruslan-maiboroda closed 1 year ago
I wonder what your architecture looks like with 40k topics. If you want, you can try to tune the health checks or give it more resources. But I think the Topic Operator was not designed for this kind of scale. It is also being replaced to make it compatible with ZooKeeper-less Kafka (see https://github.com/strimzi/proposals/blob/main/051-unidirectional-topic-operator.md for more details). So I do not think any improvements to the scalability of the old version are planned.
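For anyone who wants to try the probe and resource tuning suggested above, here is a minimal sketch that builds a merge patch for the `Kafka` custom resource. The numbers (120 s delay, 30 s timeout, 1Gi / 1 CPU) and the cluster name `my-cluster` in the usage note are illustrative assumptions, not recommended values, and nothing here is confirmed to fix the readiness failure.

```python
import json


def topic_operator_tuning_patch(initial_delay_s, timeout_s, memory, cpu):
    """Build a merge patch that relaxes the Topic Operator probes and sets its resources.

    All values passed in are illustrative assumptions; pick them for your own cluster.
    """
    probe = {"initialDelaySeconds": initial_delay_s, "timeoutSeconds": timeout_s}
    return {
        "spec": {
            "entityOperator": {
                "topicOperator": {
                    # Same settings for both probes; copy so the dicts stay independent.
                    "livenessProbe": dict(probe),
                    "readinessProbe": dict(probe),
                    "resources": {
                        "requests": {"memory": memory, "cpu": cpu},
                        "limits": {"memory": memory, "cpu": cpu},
                    },
                }
            }
        }
    }


# Hypothetical values, chosen only to show the shape of the patch.
patch = topic_operator_tuning_patch(120, 30, "1Gi", "1")
print(json.dumps(patch, indent=2))
```

The printed JSON could then be applied with something like `kubectl patch kafka my-cluster --type merge -p '<json>'`, assuming the `Kafka` resource is named `my-cluster`.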
I'm not convinced that tuning the health checks will be effective, since exceptions are constantly being recorded in the logs. Additionally, resource utilization is extremely low.
That is fine - I'm not convinced they would really solve it either. But it is probably the only thing you can try easily. As I said - I think 40k topics are beyond the scale the Topic Operator was designed for. TBH, I would probably not want to have 40k KafkaTopic resources in the Kubernetes cluster itself, as that might cause a lot of issues even in Kubernetes alone.
Triaged on community call 18/5/2023: The bidirectional topic operator was not designed to scale to 40,000 topics, and with the proposal for the unidirectional topic operator now accepted it doesn't really make sense to try to improve the old one. The new topic operator has been written with the need to scale to a larger number of topics in mind, but 40,000 seems ambitious even in that case, not least because the effect of having that many resources in Kube (irrespective of the operator accessing them) needs to be understood. Marking as won't fix, at least for the bidirectional topic operator case.
@tombentley Do we have an ETA for the release in which the unidirectional topic operator will be available?
The Unidirectional Topic Operator is available from 0.36.0. It is behind a feature gate which is disabled by default, so you would need to enable it (also, you should be aware that things might change or not work when behind an alpha feature gate). The current plan is for it to be enabled by default in Strimzi 0.39.
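As a rough illustration of enabling the gate: Strimzi feature gates are set through the `STRIMZI_FEATURE_GATES` environment variable of the Cluster Operator, as a comma-separated list of `+Name` / `-Name` entries. The sketch below only composes that value; the helper name `enable_feature_gate` is hypothetical, and how the variable gets into your Cluster Operator deployment depends on your installation method.

```python
def enable_feature_gate(current: str, gate: str) -> str:
    """Return a STRIMZI_FEATURE_GATES value with `gate` enabled.

    `current` is the existing comma-separated value (may be empty).
    """
    entries = [e.strip() for e in current.split(",") if e.strip()]
    # Drop any earlier +gate / -gate entry so the gate appears only once.
    kept = [e for e in entries if e.lstrip("+-") != gate]
    kept.append("+" + gate)
    return ",".join(kept)


# Starting from an empty value, i.e. no gates configured yet.
value = enable_feature_gate("", "UnidirectionalTopicOperator")
env_entry = {"name": "STRIMZI_FEATURE_GATES", "value": value}
print(env_entry)
```

The resulting entry is what would go into the `env` list of the Cluster Operator container.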
Thanks @scholzj for the update, this helps us run Strimzi Kafka more confidently in production. I just want to check: do you expect any impact on Strimzi Kafka if we run close to 40K topics on our Kafka cluster? Our use case needs a new topic per request.
I have no idea.
But I think using topic-per-request might be a bad pattern. It sounds like you need something like ActiveMQ for example rather than Kafka.
40K topics in kubernetes without using topic Operator
Not sure I follow that. The KafkaTopic resources have no meaning without the Topic Operator. Plus I guess 40k resources might be a bit of a challenge for a regular Kube cluster as well.
@scholzj with the unidirectional topic operator released,
Bug Description
I have ~40,000 Kafka topics.
After rebooting the entity-operator, it failed the readiness probe, and the issue seems to be with the topic-operator pod.
I have discovered that increasing the maxbuffer might resolve the issue. However, the ZooKeeper documentation does not recommend exceeding the default value for this property, so there might be a better way to fix the problem.
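To give a feel for why the buffer size could matter at this scale, here is a back-of-the-envelope sketch. It assumes the failure is related to listing all topic znodes in a single ZooKeeper response (not confirmed by anything above), uses the documented `jute.maxbuffer` default of `0xfffff` bytes, and approximates the payload as a 4-byte count plus a 4-byte length prefix and the UTF-8 bytes for each name. The topic names are made up.

```python
JUTE_MAXBUFFER_DEFAULT = 0xFFFFF  # documented default, just under 1 MiB


def children_payload_bytes(names):
    """Approximate size of a serialized list of child znode names.

    Assumes a 4-byte element count plus, per name, a 4-byte length prefix
    and the UTF-8 encoded bytes. Response headers are ignored.
    """
    return 4 + sum(4 + len(name.encode("utf-8")) for name in names)


# Hypothetical topic names, 29 bytes each, just to get a realistic order of magnitude.
names = [f"request-topic-{i:05d}-responses" for i in range(40_000)]
size = children_payload_bytes(names)

print(f"estimated payload: {size} bytes")
print(f"default limit:     {JUTE_MAXBUFFER_DEFAULT} bytes")
print(f"exceeds default:   {size > JUTE_MAXBUFFER_DEFAULT}")
```

Under these assumptions, 40,000 names averaging more than about 22 bytes each would already push a single listing past the default limit, which is consistent with a larger buffer making the error go away.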
Steps to reproduce
No response
Expected behavior
Readiness probe should pass without error
Strimzi version
0.33.0
Kubernetes version
1.24
Installation method
Helm chart
Infrastructure
Amazon EKS
Configuration files and logs
Here are the logs for the entity-operator pod:
tls-sidecar.txt user-operator.txt topic-operator.txt
Additional context
No response