strimzi / strimzi-kafka-operator

Apache Kafka® running on Kubernetes
https://strimzi.io/
Apache License 2.0

[Enhancement] MirrorMaker 2.0 operator #2285

Closed ajborley closed 4 years ago

ajborley commented 4 years ago

Is your feature request related to a problem? Please describe.

The Kafka 2.4 release will introduce a new version of MirrorMaker (under KIP 382 https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0) which includes multiple improvements on MirrorMaker 1.0, but is also very different. It would be great to have a strimzi CRD and operator that can deploy and manage a MirrorMaker 2.0 solution.

Describe the solution you'd like

A new strimzi CRD and operator for MirrorMaker 2.0 that will deploy a dedicated Kafka Connect cluster and run a defined set of MirrorMaker 2.0 connectors. This could be achieved by combining code from the current strimzi Connect and Connector operators into a single operator that can only deploy the MirrorMaker 2.0 connectors.

An example CR:

apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaMirrorMaker2
metadata:
  name: my-mm2
spec:
  clusters:
  - alias: "prague-cluster"
    bootstrapServers: "12.34.56.78:9092"
    tls:
      trustedCertificates:
      - secretName: prague-cluster-cluster-ca-cert
        certificate: ca-prague.crt
  - alias: "london-cluster"
    bootstrapServers: "98.76.54.32:9092"
    tls:
      trustedCertificates:
      - secretName: london-cluster-cluster-ca-cert
        certificate: ca-london.crt
  - alias: "paris-cluster"
    bootstrapServers: "99.88.77.66:9092"
    tls:
      trustedCertificates:
      - secretName: paris-cluster-cluster-ca-cert
        certificate: ca-paris.crt
  connect: 
    version: 2.4.0
    cluster: "prague-cluster"
    replicas: 3
  mirrors:
  - sourceCluster: "london-cluster"
    targetCluster: "prague-cluster"
    sourceConnector:
      tasksMax: 2
      config:
        replication.factor: 1
        offset-syncs.topic.replication.factor: 1
        sync.topic.acls.enabled: "false"    
    checkpointConnector:
      tasksMax: 2
    heartbeatConnector:
      tasksMax: 1
    topics: ".*"
    groups: ".*"
  - sourceCluster: "paris-cluster"
    targetCluster: "prague-cluster"
    sourceConnector:
      tasksMax: 2
      config:
        replication.factor: 1
        offset-syncs.topic.replication.factor: 1
        sync.topic.acls.enabled: "false"    
    checkpointConnector:
      tasksMax: 2    
    topics: "SALES"
    groups: ".*" 

Describe alternatives you've considered

As MirrorMaker 2.0 is a set of connectors, strimzi could simply document how to use the existing connect and connector operators to set up MirrorMaker 2.0. However, this would not provide the ease-of-use of a MirrorMaker 2.0 operator.
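For illustration, this documentation-only alternative might look roughly like the following KafkaConnector resource applied to an existing Connect cluster. This is only a sketch: the resource name and the Connect cluster name `my-connect` are placeholders, and it assumes the Connect cluster is managed by the connector operator and has the MirrorMaker 2.0 connector classes available.

```yaml
apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaConnector
metadata:
  name: london-to-prague-source        # placeholder name
  labels:
    strimzi.io/cluster: my-connect     # placeholder: an existing KafkaConnect resource
spec:
  class: org.apache.kafka.connect.mirror.MirrorSourceConnector
  tasksMax: 2
  config:
    # Aliases and addresses reuse the values from the example CR above
    source.cluster.alias: "london-cluster"
    target.cluster.alias: "prague-cluster"
    source.cluster.bootstrap.servers: "98.76.54.32:9092"
    target.cluster.bootstrap.servers: "12.34.56.78:9092"
    topics: ".*"
```

A similar resource would be needed for each MirrorCheckpointConnector and MirrorHeartbeatConnector, which is the repetition a dedicated operator would remove.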

An alternative strimzi MirrorMaker 2.0 CRD/operator could require an existing Connect cluster and be used only to deploy the MirrorMaker 2.0 connectors to it. This would be a simpler CR, as only the connector config is required, and would allow a Connect cluster to be used for both mirroring and any other connectors. However, it seems unlikely that users would want to run a Connect cluster for multiple purposes, particularly as MirrorMaker 2.0 requires a high level of privileges for admin operations and topic writes.

MirrorMaker 2.0 can also be run as a dedicated cluster without the requirement for a Connect cluster (https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0#KIP-382:MirrorMaker2.0-RunningadedicatedMirrorMakercluster). The strimzi MirrorMaker 2.0 CRD and operator could deploy a dedicated cluster as an alternative to creating a Connect cluster. The advantages of a dedicated cluster:

Disadvantages:

Additional context

KIP-382 includes enhancements that are not yet in the Kafka codebase, such as a MirrorSinkConnector, a CheckpointSinkConnector and updates to Connect that obviate the need for the MirrorMaker 2.0 dedicated cluster. It is likely these enhancements will be in future Kafka releases.

KIP-382 also describes a 'legacy' mode which will run MirrorMaker 2.0 with a configuration that matches the behaviour of MirrorMaker 1.0. This has not yet been delivered into Kafka. When it is delivered and MirrorMaker 1.0 is deprecated, the current strimzi MirrorMaker CRD/operator can be converted to use MirrorMaker 2.0 legacy mode, deprecated itself, or merged with the MirrorMaker 2.0 CRD/operator, so that strimzi has just one MirrorMaker CRD/operator.

scholzj commented 4 years ago

Some thoughts on the proposed CR/CRD:

ajborley commented 4 years ago

@scholzj - thanks for the comments :)

I think we should have the version and replicas fields directly under the .spec to keep things more in sync with other CRDs. In general I would maybe not use the connect: path at all and keep these things on the higher level.

OK, fair enough. The idea was to keep all the connect configuration separate from the mirrors and clusters sections, happy to rearrange this though.
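As a rough sketch of that rearrangement, the original example could be flattened along the following lines, with version and replicas directly under .spec and no connect: block. The exact layout is an assumption at this point; the connectCluster field name is borrowed from the minimal example further down, and the detailed connector config is left out for brevity.

```yaml
apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaMirrorMaker2
metadata:
  name: my-mm2
spec:
  version: 2.4.0                     # moved up from spec.connect.version
  replicas: 3                        # moved up from spec.connect.replicas
  connectCluster: "prague-cluster"   # replaces spec.connect.cluster
  clusters:
  - alias: "prague-cluster"
    bootstrapServers: "12.34.56.78:9092"
  - alias: "london-cluster"
    bootstrapServers: "98.76.54.32:9092"
  mirrors:
  - sourceCluster: "london-cluster"
    targetCluster: "prague-cluster"
    sourceConnector:
      tasksMax: 2
    topics: ".*"
```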

The configuration seems to be quite detailed (e.g. the task numbers etc.). I assume these are kind of expert options which will have some suitable defaults? If yes, could you also share an example of a minimal resource ... i.e. one without any detailed configuration which is not mandatory / has defaults?

Sure. The following creates a Connect cluster connected to the "prague-cluster" Kafka cluster and a MirrorSourceConnector to mirror a single topic from "london-cluster" to "prague-cluster":

apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaMirrorMaker2
metadata:
  name: my-mm2
spec:
  clusters:
  - alias: "prague-cluster"
    bootstrapServers: "12.34.56.78:9092"
  - alias: "london-cluster"
    bootstrapServers: "98.76.54.32:9092"
  connectCluster: "prague-cluster"
  mirrors:
  - sourceCluster: "london-cluster"
    targetCluster: "prague-cluster"
    sourceConnector: {}
    topics: "mytopic"

Note here that the single item in the mirrors list is optional - it would also be valid to start up a MirrorMaker2 connect cluster with no connectors running. Also note that topics is optional and defaults to an empty string, meaning that no topics would be mirrored by the MirrorSourceConnector.

Will users easily understand things such as whether they need to enable the heartbeatConnector or checkpointConnector?

The 3 connectors (MirrorSourceConnector, MirrorHeartbeatConnector and MirrorCheckpointConnector) in MirrorMaker 2.0 have specific behaviours that are explained in the Kafka 2.4 docs and are useful in different mirroring scenarios. MirrorSourceConnector mirrors the topic records and topic configuration. MirrorCheckpointConnector creates a map of source/target consumer group offsets so that consumers can failover to the target cluster. MirrorHeartbeatConnector regularly sends a heartbeat record to a target cluster topic and can be useful for monitoring and failover when there are chains of clusters that are mirroring to each other.

scholzj commented 4 years ago

@ajborley Thanks for the explanation.

@tombentley @ppatierno Any comments from your side?

tombentley commented 4 years ago

I'm +1 on the overall idea.

I would like to see what the status of the proposed CR would look like.

Using a Secret reference which points to a pkcs12 truststore, rather than individual certificates, would make the YAML less verbose in the case where multiple certs need to be trusted.

I'm also wondering exactly how spec.clusters.tls.trustedCertificates would work wrt certificate renewal. I don't want to complexify things, but it would be nice if the operator noticed when that secret changed and reconfigured the connect cluster to trust those added certs (and not trust removed certs too, of course). I'm not 100% certain whether the machinery we already have would do that.

What if a Kafka cluster uses SCRAM SHA authentication?

I know the KafkaConnector CRD doesn't support it yet, but how would we pause a connector? Presumably something like

  - sourceCluster: "paris-cluster"
    targetCluster: "prague-cluster"
    sourceConnector:
      paused: true
      tasksMax: 2
      config:
        replication.factor: 1
        offset-syncs.topic.replication.factor: 1
        sync.topic.acls.enabled: "false" 

?

scholzj commented 4 years ago

I think the TLS and Authentication sections should mirror what we already have in other resources. The renewals are a good point, but again we should do them everywhere and keep that separate from this effort.
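As a sketch of what mirroring the existing resources could mean for a SCRAM-SHA cluster, a clusters entry might reuse the tls and authentication blocks as they appear in the KafkaConnect resource. Placing them under each clusters item is an assumption, and the user and Secret names are placeholders.

```yaml
  clusters:
  - alias: "london-cluster"
    bootstrapServers: "98.76.54.32:9092"
    tls:
      trustedCertificates:
      - secretName: london-cluster-cluster-ca-cert
        certificate: ca-london.crt
    authentication:
      type: scram-sha-512
      username: my-mm2-user          # placeholder user name
      passwordSecret:
        secretName: my-mm2-user      # placeholder Secret holding the password
        password: password           # key within that Secret
```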

scholzj commented 4 years ago

This has been done and released in 0.17.0.