strimzi / strimzi-kafka-operator

Apache Kafka® running on Kubernetes
https://strimzi.io/
Apache License 2.0

How to mirror data without decompressing in Mirrormaker 2 #3963

Closed eazhilan-nagarajan closed 3 years ago

eazhilan-nagarajan commented 3 years ago

Hi,

Hope you're having a good day! Let me come straight to the question at hand. We have two Kafka clusters, an active one and a passive one. We use MirrorMaker 2 to copy topic data from the active cluster to the passive one, and this is all working fine.

FYI, the data in the topics is compressed by the producers, so we want MirrorMaker 2 to copy the compressed data across clusters as-is, without decompressing and recompressing it.

  - alias: my-passive-cluster
    authentication:
      passwordSecret:
        password: password
        secretName: passive-cluster-secret
      type: scram-sha-512
      username: user-1
    bootstrapServers: my-passive-cluster.com:443
    config:
      config.storage.replication.factor: 3
      offset.storage.replication.factor: 3
      status.storage.replication.factor: 3
      producer.compression.type: gzip

I used the above config, which compresses the data produced to the target (passive) cluster, but the data is still decompressed first while mirroring.

I just wanted to know whether there is something like a shallow copy that tells MirrorMaker 2 to copy the data as-is from the source cluster topics without decompressing it.

Thanks, Eazhilan

scholzj commented 3 years ago

So, do you know if this is supported in MirrorMaker 2 itself? Strimzi really just orchestrates MM2. So while the idea makes sense to me, if MM2 does not support it, it will need to be implemented there.

@ajborley @tombentley Any idea whether this is supported today in MM2?

eazhilan-nagarajan commented 3 years ago

Thanks for the reply. I found a wiki page that describes this for MirrorMaker 1: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330#Kafkamirroring(MirrorMaker)-Shallowiterationandproducercompression(Kafka0.7)

Unfortunately, I couldn't find any similar documentation for MM2. I also found a couple of JIRA issues saying it's not implemented yet (https://issues.apache.org/jira/browse/KAFKA-732, https://issues.apache.org/jira/browse/KAFKA-845).

scholzj commented 3 years ago

Interesting that it was basically removed from MM1.

If MM2 does not support it, there is nothing Strimzi can change on its side; it would first need to be implemented in Kafka itself. I'm afraid I do not know enough about MM2 and Kafka Connect internals to say how easy or hard that would be.

tombentley commented 3 years ago

I don't believe it is supported. In fact I don't think the consumer API has a way to say "gimme the compressed bytes", which would be a prerequisite.
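
To illustrate the point about the consumer API, here is a minimal, self-contained sketch in plain Python. It does not use any Kafka client; `compress_batch`, `consume_batch`, `mirror_record_level` and `mirror_shallow` are hypothetical names made up for this example, and JSON plus gzip merely stand in for Kafka's record batch format. It only models the idea that a mirror built on a record-level consumer receives decompressed records and therefore has to compress them again, whereas a shallow copy would forward the stored bytes untouched.

```python
import gzip
import json


def compress_batch(records):
    """Model a producer: serialize a batch of records and gzip it."""
    payload = json.dumps(records).encode("utf-8")
    return gzip.compress(payload)


def consume_batch(stored):
    """Model a record-level consumer: it only hands back decompressed records."""
    return json.loads(gzip.decompress(stored).decode("utf-8"))


def mirror_record_level(stored):
    """Mirror via consumer + producer: decompress first, then compress again."""
    records = consume_batch(stored)
    return compress_batch(records)


def mirror_shallow(stored):
    """Hypothetical shallow copy: forward the compressed bytes unchanged."""
    return stored


# A small batch of repetitive records, as it might be stored on the source cluster.
records = [{"key": f"k{i}", "value": "payload " * 20} for i in range(100)]
source = compress_batch(records)

mirrored = mirror_record_level(source)
raw_size = len(json.dumps(records).encode("utf-8"))

print("uncompressed bytes:", raw_size)
print("compressed bytes on source:", len(source))
print("compressed bytes after mirroring:", len(mirrored))
```

In this model both paths leave the target with the same records; the difference is that the record-level path spends CPU on one decompression and one compression per batch, which is the extra work the mirror has to do when no shallow mode exists.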

eazhilan-nagarajan commented 3 years ago

Ahh! It seems we will have to give the MM2 cluster additional resources to keep it running smoothly, since we want the mirrored data to be compressed. That is OK with me, but the main problem is that the amount of data transferred over the network is not going to be any less.

Thanks for the replies and giving me clarity on this!