Closed by ajborley 4 years ago
Some thoughts on the proposed CR/CRD:
I think we should have the version and replicas fields directly under the .spec to keep things more in sync with other CRDs. In general I would maybe not use the connect: path at all and keep these things on the higher level.
@scholzj - thanks for the comments :)
I think we should have the version and replicas fields directly under the .spec to keep things more in sync with other CRDs. In general I would maybe not use the connect: path at all and keep these things on the higher level.
OK, fair enough. The idea was to keep all the connect configuration separate from the mirrors and clusters sections, happy to rearrange this though.
The configuration seems to be quite detailed (e.g. the task numbers etc.). I assume these are kind of expert options which will have some suitable defaults? If yes, could you also share an example of a minimal resource ... i.e. one without any detailed configuration which is not mandatory / has defaults?
Sure. The following creates a Connect cluster connected to the "prague-cluster" Kafka and a MirrorSourceConnector to mirror a single topic from "london-cluster" to "prague-cluster":
apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaMirrorMaker2
metadata:
  name: my-mm2
spec:
  clusters:
  - alias: "prague-cluster"
    bootstrapServers: "12.34.56.78:9092"
  - alias: "london-cluster"
    bootstrapServers: "98.76.54.32:9092"
  connectCluster: "prague-cluster"
  mirrors:
  - sourceCluster: "london-cluster"
    targetCluster: "prague-cluster"
    sourceConnector: {}
    topics: "mytopic"
Note here that the single item in the mirrors list is optional - it would also be valid to start up a MirrorMaker2 connect cluster with no connectors running. Also note that topics is optional and defaults to an empty string, meaning that no topics would be mirrored by the MirrorSourceConnector.
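For illustration, the connector-free variant could look roughly like this. It is only a sketch that reuses the proposed field names from the minimal example, not a confirmed final schema:

```yaml
# Sketch only: same proposed schema as the minimal example, with the
# optional mirrors list omitted so that no MirrorMaker 2.0 connectors start.
apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaMirrorMaker2
metadata:
  name: my-mm2
spec:
  clusters:
  - alias: "prague-cluster"
    bootstrapServers: "12.34.56.78:9092"
  connectCluster: "prague-cluster"
```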
Will users easily understand things such as "Do I need to enable the heartbeatConnector or checkpointConnector?"
The 3 connectors (MirrorSourceConnector, MirrorHeartbeatConnector and MirrorCheckpointConnector) in MirrorMaker 2.0 have specific behaviours that are explained in the Kafka 2.4 docs and are useful in different mirroring scenarios. MirrorSourceConnector mirrors the topic records and topic configuration. MirrorCheckpointConnector creates a map of source/target consumer group offsets so that consumers can fail over to the target cluster. MirrorHeartbeatConnector regularly sends a heartbeat record to a target cluster topic and can be useful for monitoring and failover when there are chains of clusters that are mirroring to each other.
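As a rough sketch, a single mirrors entry that enables all three connectors might look like the following. The section names come from the proposal; treating an empty map as "use the defaults" for the checkpoint and heartbeat connectors is an assumption carried over from how sourceConnector is written in the minimal example:

```yaml
# Sketch only: one entry of the proposed mirrors list with all three
# connectors enabled. Empty maps ({}) are assumed to mean "use defaults".
mirrors:
- sourceCluster: "london-cluster"
  targetCluster: "prague-cluster"
  sourceConnector: {}       # mirrors topic records and topic configuration
  checkpointConnector: {}   # maps source/target consumer group offsets
  heartbeatConnector: {}    # sends heartbeat records to a target cluster topic
  topics: "mytopic"
```

A user who only needs records copied would presumably keep just sourceConnector and leave the other two sections out.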
@ajborley Thanks for the explanation
@tombentley @ppatierno Any comments from your side?
I'm +1 on the overall idea.
I would like to see what the status of the proposed CR would look like.
Using a Secret reference which points to a pkcs12 truststore, rather than individual certificates would make the YAML less verbose in the case where multiple certs needed to be trusted.
I'm also wondering exactly how spec.clusters.tls.trustedCertificates would work wrt certificate renewal. I don't want to complexify things, but it would be nice if the operator noticed when that secret changed and reconfigured the connect cluster to trust those added certs (and not trust removed certs too, of course). I'm not 100% certain whether the machinery we already have would do that.
What if a Kafka cluster uses SCRAM SHA authentication?
I know the KafkaConnector CRD doesn't support it yet, but how would we pause a connector? Presumably something like
- sourceCluster: "paris-cluster"
  targetCluster: "prague-cluster"
  sourceConnector:
    paused: true
    tasksMax: 2
    config:
      replication.factor: 1
      offset-syncs.topic.replication.factor: 1
      sync.topic.acls.enabled: "false"
?
I think the TLS and Authentication sections should mirror what we have already in other resources. The renewals are a good point, but again we should do them everywhere and keep it separate from this effort.
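To make that concrete, here is a sketch of a clusters entry using SCRAM-SHA-512 over TLS. It assumes each entry would accept the same tls and authentication blocks that KafkaConnect already has; the Secret names, keys and username are hypothetical:

```yaml
# Sketch only: assumes a clusters entry takes the same tls and
# authentication blocks as KafkaConnect. Secret names, keys and the
# username below are hypothetical.
clusters:
- alias: "london-cluster"
  bootstrapServers: "98.76.54.32:9092"
  tls:
    trustedCertificates:
    - secretName: london-cluster-ca-cert
      certificate: ca.crt
  authentication:
    type: scram-sha-512
    username: my-mm2-user
    passwordSecret:
      secretName: my-mm2-user
      password: password
```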
This has been done and released in 0.17.0.
Is your feature request related to a problem? Please describe.
The Kafka 2.4 release will introduce a new version of MirrorMaker (under KIP 382 https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0) which includes multiple improvements on MirrorMaker 1.0, but is also very different. It would be great to have a strimzi CRD and operator that can deploy and manage a MirrorMaker 2.0 solution.
Describe the solution you'd like
A new strimzi CRD and operator for MirrorMaker 2.0 that will deploy a dedicated Kafka Connect cluster and run a defined set of MirrorMaker 2.0 connectors. This could be achievable by combining code from the current strimzi Connect and Connector operators into a single operator that can only deploy the MirrorMaker 2.0 connectors.
An example CR:
The clusters section defines a list of clusters with aliases and their connection details.
The connect section defines the connect cluster, using an extension of the io.strimzi.api.kafka.model.KafkaConnectSpec spec which allows the connection details to be provided by referring to one of the cluster aliases defined in the clusters section.
The mirrors list contains one or more sourceConnector, checkpointConnector and heartbeatConnector sections to define the connectors that get added to connect. They use the io.strimzi.api.kafka.model.KafkaConnectorSpec spec to configure each connector, with hardcoded connector class names (org.apache.kafka.connect.mirror.MirrorSourceConnector, etc) and the sourceCluster and targetCluster aliases resolved from the clusters list.
Describe alternatives you've considered
As MirrorMaker 2.0 is a set of connectors, strimzi could simply document how to use the existing connect and connector operators to set up MirrorMaker 2.0. However, this would not provide the ease-of-use of a MirrorMaker 2.0 operator.
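As a sketch of what that documentation-only route might involve, the user would write a KafkaConnector resource per MirrorMaker 2.0 connector by hand. The Connect cluster name my-connect and the resource name are hypothetical, and the cluster addresses and topic are placeholder values; the config keys are the standard MirrorMaker 2.0 connector properties:

```yaml
# Sketch only: running MirrorSourceConnector through the existing
# KafkaConnector resource. "my-connect" is a hypothetical KafkaConnect
# cluster; addresses and topic are placeholders.
apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaConnector
metadata:
  name: london-to-prague-source
  labels:
    strimzi.io/cluster: my-connect
spec:
  class: org.apache.kafka.connect.mirror.MirrorSourceConnector
  tasksMax: 1
  config:
    source.cluster.alias: "london-cluster"
    target.cluster.alias: "prague-cluster"
    source.cluster.bootstrap.servers: "98.76.54.32:9092"
    target.cluster.bootstrap.servers: "12.34.56.78:9092"
    topics: "mytopic"
```

The user has to repeat the aliases and bootstrap addresses in every connector, which is the bookkeeping a dedicated MirrorMaker 2.0 operator would take over.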
An alternative strimzi MirrorMaker 2.0 CRD/operator could pre-req a Connect cluster and be used to deploy the MirrorMaker 2.0 connectors to that Connect cluster. This would be a simpler CR, as only the connector config is required, and would allow a Connect cluster to be used for both mirroring and any other connectors. However, it is thought unlikely that users would want to run a Connect cluster for multiple purposes, particularly as MirrorMaker 2.0 requires a high level of privileges for admin operations and topic writes.
MirrorMaker 2.0 can also be run as a dedicated cluster without the requirement for a Connect cluster (https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0#KIP-382:MirrorMaker2.0-RunningadedicatedMirrorMakercluster). The strimzi MirrorMaker 2.0 CRD and operator could deploy a dedicated cluster as an alternative to creating a Connect cluster. The advantages of a dedicated cluster:
Disadvantages:
Additional context
KIP-382 includes enhancements that are not yet in the Kafka codebase, such as a MirrorSinkConnector, a CheckpointSinkConnector and updates to Connect that obviate the need for the MirrorMaker 2.0 dedicated cluster. It is likely these enhancements will be in future Kafka releases.
KIP-382 also describes a 'legacy' mode which will run MirrorMaker 2.0 with a configuration that matches the behaviours of MirrorMaker 1.0. This has not yet been delivered into Kafka. When that is delivered into Kafka and MirrorMaker 1.0 is deprecated, the current strimzi MirrorMaker CRD/operator can either be converted to use MirrorMaker 2.0 legacy mode, can be deprecated itself, or can be merged with the MirrorMaker 2.0 CRD/operator, so that strimzi has just one MirrorMaker CRD/operator.