[Enhancement] Stretch Kafka cluster over multiple Kubernetes clusters

jrivers96 commented 4 years ago

Hi,

We'd like to run strimzi multi-region in a stretch configuration because of the simplification to clients and failover benefits.

Is this in the roadmap or feasible?

scholzj commented 4 years ago

I assume with stretch cluster you mean stretched across different Kubernetes clusters running in the different regions? Correct?

jrivers96 commented 4 years ago

Correct. Our data is in sequence and it is stateful to many end users. It would be nice to use all of the community adapters without having to make a custom one for multiregion HA.

Any idea how far away the repo might be? We could help out potentially.

scholzj commented 4 years ago

It is something we were thinking about, but there is no work going on or any specific plans at this point. It would be a lot of work since you would need to redesign everything from scratch, and the value is a bit questionable since it would still work only on closely collocated clusters because of latency etc.

scholzj commented 4 years ago

As this is vaguely in our long term plans, I changed this to enhancement. But for the record, you should not expect this to happen any time soon.

itaydj commented 3 years ago

Hi, just wanted to mention that we would also like to run Strimzi in a stretched cluster setup across multiple k8s clusters.

scholzj commented 2 years ago

Triaged on 31.3.2022: Planned to be done in the future with StrimziPodSets & Co. Should be kept open.

erszcz commented 2 years ago

We would appreciate this feature a lot, too!

jrivers96 commented 2 years ago

@scholzj Is there anything that we can look at related to the triage?

scholzj commented 2 years ago

There was not much discussion about it during the triage apart from saying that this is something we would want to have in the future. Just to be clear, we are triaging all the issues to clean them up. So it does not mean it will be available next month.

I talked a bit more about the importance of StrimziPodSets for this in this YouTube video: https://youtu.be/iSwrn1Gumx4 ... right now, we need to finish the PodSets first. Anyone interested in this feature can help by testing the StrimziPodSets in Strimzi 0.29, as that will be the basic building block.

It would also be great if everyone thinking about this feature gave some thought to how they would expect such a stretch cluster to be linked together. There are many options and we definitely will not be able to support all of them (at least not initially):

The standard Kubernetes primitives used already for listeners today (Load balancers, NodePorts, Ingress, OpenShift Routes)
Submariner
Skupper
Network Service Mesh
Regular service mesh (e.g. using Istio federation)
Something else?

It would be great to understand what everyone would prefer, but also why. It would be equally interesting to hear why people might see some of these options as bad. That should allow us to better consider the pros and cons and decide about it in the future.

jrivers96 commented 2 years ago

The standard primitives might be best to get started with easily, but I wonder about the observability required to run this at scale in production. Istio might be more interesting then, but at the cost of complexity.

I'm also thinking about the failure modes related to a stretch cluster. Is it possible to keep some topics local to a cluster and some that are global? Do you want your offset partitions globally replicated?

I haven't seen any blogs about stretch clusters....

MR-GOYAL commented 2 years ago

Hi, we also want to run Strimzi Kafka in a stretch cluster. Is it in progress, or is it still a future plan?

karolcienkosz commented 2 years ago

Hi, from my perspective this feature would be really useful. I administer several Kafka clusters (VMs in GCP) that are stretched over multiple regions, and they work well and are stable if latency is relatively small (e.g. europe-west1 and europe-west4). The only blocker against migrating my clusters to Strimzi is the lack of support for multiple Kubernetes clusters. Is there any update on this?

mustafaabasaran commented 1 year ago

Hi, we are currently using Strimzi in production, but a multi-DC stretch configuration has become mandatory for our business. Is there any news about it?

fl0wx commented 1 year ago

+1

pnorth1 commented 1 year ago

Could this be accomplished via MirrorMaker2?

scholzj commented 1 year ago

@pnorth1 My view ... Mirror Maker mirrors data between two separate clusters. It has some advantages and disadvantages compared to stretched clusters:

It is asynchronous so it cannot guarantee that all messages acknowledged to the producer are mirrored
You need to have multiple Kafka clusters so it is more expensive to run
Migrating the clients between the clusters is not completely straightforward and you need to plan for it in your architecture
Unlike a stretched cluster, Mirror Maker has only minimal limitations with regard to latency. The stretch cluster is great on paper, but it will need low latency between the Kubernetes clusters. So IMHO the use cases are fairly limited, because in many cases where you have such low latency you can stretch the Kubernetes cluster itself in the first place and make things much easier.
It is supported by Strimzi already today ;-)

So there is some overlap but there are also some differences.
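
To make the first point concrete, here is a minimal toy sketch of the difference in acknowledgement semantics. It does not use Kafka or Mirror Maker at all: `Site`, `MirroredPair`, `produce_stretched` and `acked_but_missing` are hypothetical names invented for this illustration, and the "logs" are plain in-memory lists.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Toy stand-in for the brokers in one location (hypothetical model)."""
    name: str
    log: list = field(default_factory=list)


def produce_stretched(sites, record):
    """Synchronous replication: acknowledge only after every site has the record."""
    for site in sites:
        site.log.append(record)
    return True  # the ack implies the record is present on all sites


class MirroredPair:
    """Two separate clusters linked by an asynchronous mirror that may lag."""

    def __init__(self, source, target):
        self.source = source
        self.target = target
        self.mirrored_upto = 0  # number of source records copied so far

    def produce(self, record):
        self.source.log.append(record)
        return True  # the ack is sent before the mirror has copied anything

    def mirror_step(self, max_records):
        """Copy up to max_records pending records from source to target."""
        end = min(len(self.source.log), self.mirrored_upto + max_records)
        self.target.log.extend(self.source.log[self.mirrored_upto:end])
        self.mirrored_upto = end


def acked_but_missing(acked, surviving_site):
    """Records the producer saw acknowledged that the surviving site lacks."""
    return [r for r in acked if r not in surviving_site.log]


stretched = [Site("x"), Site("y")]
pair = MirroredPair(Site("primary"), Site("backup"))
acked = []
for i in range(5):
    record = f"msg-{i}"
    produce_stretched(stretched, record)
    pair.produce(record)
    acked.append(record)

pair.mirror_step(max_records=3)  # the mirror is two records behind

# Suppose the first site / primary cluster is lost at this point.
print(acked_but_missing(acked, stretched[1]))  # []
print(acked_but_missing(acked, pair.target))   # ['msg-3', 'msg-4']
```

In this toy model the stretched setup loses nothing that was acknowledged, while the mirrored pair loses whatever the mirror had not copied yet. The price of the synchronous variant is that every acknowledgement waits for the remote site, which is where the latency limitation comes from.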

hiroarabay commented 1 year ago

+1

hiroarabay commented 1 year ago

@scholzj

> It is supported by Strimzi already today ;-)

How can we create a multi-region stretched cluster in Strimzi?

scholzj commented 1 year ago

Assuming with multi-region you mean multiple Kubernetes clusters, then you cannot. That is the point of this issue. If your Kubernetes cluster is stretched across multiple regions, you can do it in the same way as when running on any other Kubernetes cluster.

Also, please keep in mind that Kafka is latency sensitive, so while there are use cases for stretching across multiple Kubernetes clusters, it will not necessarily work over long distances because of Kafka limitations.
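
As a rough illustration of why distance matters, the sketch below estimates the extra produce latency under `acks=all`, where the leader acknowledges only after all in-sync replicas have the record. It rests on a deliberately simplified assumption that each remote in-sync follower costs about one network round trip and that the slowest one dominates. The function name, the cluster names and the round-trip figures are all hypothetical, not measurements.

```python
def added_ack_latency_ms(leader, followers, rtt_ms):
    """Rough lower bound on extra acks=all latency (simplified model).

    Assumes the leader waits for the slowest in-sync follower and that
    reaching a follower in another Kubernetes cluster costs one round trip.
    """
    delays = [rtt_ms[frozenset((leader, f))] for f in followers if f != leader]
    return max(delays, default=0.0)


# Hypothetical round-trip times between Kubernetes clusters, in milliseconds.
rtt_ms = {
    frozenset(("kube-x", "kube-y")): 2.0,   # nearby clusters
    frozenset(("kube-x", "kube-z")): 80.0,  # distant cluster
}

nearby = added_ack_latency_ms("kube-x", ["kube-y"], rtt_ms)
distant = added_ack_latency_ms("kube-x", ["kube-y", "kube-z"], rtt_ms)
same = added_ack_latency_ms("kube-x", ["kube-x"], rtt_ms)
print(nearby, distant, same)  # 2.0 80.0 0.0
```

Under this model a single distant in-sync replica sets the floor for every acknowledged write, regardless of how close the other replicas are.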

hiroarabay commented 1 year ago

@scholzj Has anyone actually created that Strimzi cluster? Do you know actual test results and documentation?

scholzj commented 1 year ago

@hiroarabay This is an open issue because it is not supported. So no, there is no documentation and you cannot create a Strimzi cluster stretched over multiple Kubernetes clusters.

jcarcenegui commented 1 year ago

Any updates on this? It would be fantastic to have it running across multiple Kubernetes clusters.

audomsak commented 5 months ago

Hi, I saw many replies mention latency. My questions are:

How low does the latency need to be for the stretch cluster?
What is the maximum latency the stretch cluster can tolerate?
Is this latency among Kafka brokers, Zookeeper nodes, or between Kafka brokers and Zookeeper nodes?

sionsmith commented 3 months ago

We have implemented Helm charts which leverage plain docker to work around this. It would be great and dramatically lower complexity if Strimzi did this out of the box!! 👍

neeraj-laad commented 3 months ago

I would like to help with starting a proposal for this issue but wanted to check if someone has already made a start on it? I would also be keen to hear if someone in the community has ideas/thoughts on how they might want to do this with some of the newer features like KafkaNodePools etc.

scholzj commented 3 months ago

@neeraj-laad The idea with node pools is that the node pool will be the unit where you configure the Kubernetes cluster. I.e. you will have for example node pool A configured to run on Kube cluster X, node pool B configured to run on Kube cluster Y etc.

neeraj-laad commented 3 months ago

@scholzj Are you thinking that the KafkaNodePool resources will be created on different Kubernetes clusters but tied to a single Kafka resource, or were you implying that all the resources will live in one place but the deployment target for them will be different Kubernetes clusters? Were you thinking of a single cluster operator doing all of this, or are we talking about multiple operators on each cluster?

I think it will be very useful to have an initial proposal to discuss and agree this kind of high-level topology for operators and various custom resources as a starting point, and follow up proposals/updates to same proposal for defining detailed implementation aspects. What do you think?

scholzj commented 3 months ago

No, the resources should be all in the same cluster as the Kafka resource. You need to orchestrate things from a single place. But the other clusters might have their own StrimziPodSet resources with their own operator instances managing them.
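
A minimal sketch of that idea, under stated assumptions: this is not Strimzi code or its API. The `NodePool` class, its `target_cluster` field and the `plan_pod_sets` function are hypothetical names used only to show how node pools defined in one central place could be grouped into per-cluster sets of pods, each of which a local StrimziPodSet controller would then manage.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePool:
    name: str
    replicas: int
    target_cluster: str  # hypothetical field: the Kube cluster this pool runs on


def plan_pod_sets(kafka_name, pools):
    """Group the pod names of all node pools by the cluster that should run them.

    Node IDs are handed out sequentially across pools so that they stay
    unique within the whole Kafka cluster.
    """
    plan = defaultdict(list)
    next_id = 0
    for pool in pools:
        for _ in range(pool.replicas):
            plan[pool.target_cluster].append(f"{kafka_name}-{pool.name}-{next_id}")
            next_id += 1
    return dict(plan)


# All pools are declared centrally, next to the Kafka resource.
pools = [
    NodePool("pool-a", replicas=2, target_cluster="kube-x"),
    NodePool("pool-b", replicas=1, target_cluster="kube-y"),
]
plan = plan_pod_sets("my-cluster", pools)
for cluster, pods in plan.items():
    print(cluster, pods)
# kube-x ['my-cluster-pool-a-0', 'my-cluster-pool-a-1']
# kube-y ['my-cluster-pool-b-2']
```

The point of the sketch is the direction of control: the plan is computed in one place and only the resulting pod sets are handed to the other clusters.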

neeraj-laad commented 3 months ago

Thanks for the clarification. So the resources defining the cluster will all be in one Kubernetes cluster and can be managed from there, but the deployments will happen in different clusters with help from other operators/reconcilers running in each cluster.

If the main Kubernetes cluster goes down, the brokers/controllers on the other Kubernetes clusters will continue to run and be managed, but the user will lose the ability to make changes to Kafka/NodePool resources until the main cluster comes back up again.

Also do we have instances already within Strimzi (or otherwise) where the operator is creating/reading objects on a remote Kubernetes cluster?

scholzj commented 3 months ago

> Thanks for the clarification. So the resources defining the cluster will all be in one Kubernetes cluster and can be managed from there, but the deployments will happen in different clusters with help from other operators/reconcilers running in each cluster.
>
> If the main Kubernetes cluster goes down, the brokers/controllers on the other Kubernetes clusters will continue to run and be managed, but the user will lose the ability to make changes to Kafka/NodePool resources until the main cluster comes back up again.

Yes, that was roughly the idea. The only part running in the different Kubernetes clusters will basically be the StrimziPodSet controller. But just to be clear, this is something I was planning while designing the StrimziPodSet and KafkaNodePool features. While the proposals for those features mention it to some extent, this design has to be clarified and done as part of the stretch cluster proposal.

> Also do we have instances already within Strimzi (or otherwise) where the operator is creating/reading objects on a remote Kubernetes cluster?

No, this is something you would need to deal with in the proposal and its implementation.

strimzi / strimzi-kafka-operator

[Enhancement] Stretch Kafka cluster over multiple Kubernetes clusters #3697