hpgrahsl / kafka-connect-mongodb

**Unofficial / Community** Kafka Connect MongoDB Sink Connector -> integrated 2019 into the official MongoDB Kafka Connector here: https://www.mongodb.com/kafka-connector
Apache License 2.0

Post Processor Chain not working #135

Closed moaazabir closed 2 years ago

moaazabir commented 2 years ago

Hello @hpgrahsl,

I was trying kafka-connect-mongodb with a post processor chain. Below is the configuration I used:

{
    "name": "mdb-sink-debezium-cdc",
    "config": {
        "connector.class": "at.grahsl.kafka.connect.mongodb.MongoDbSinkConnector",

        "topics": "customers",
        "mongodb.connection.uri": "mongodb://xxx:27017/inventory?authSource=admin&w=1&journal=true",
        "mongodb.database": "inventory",
        "mongodb.change.data.capture.handler.operations": "c,r,u,d",
        "name": "mdb-sink-debezium-cdc",
        "mongodb.collection": "customers",
        "key.converter": "io.confluent.connect.avro.AvroConverter",
        "key.converter.schema.registry.url": "http://kubernetes.docker.internal:8081",
        "value.converter": "io.confluent.connect.avro.AvroConverter",
        "value.converter.schema.registry.url": "http://kubernetes.docker.internal:8081",
        "mongodb.change.data.capture.handler": "at.grahsl.kafka.connect.mongodb.cdc.debezium.rdbms.RdbmsHandler",

        "mongodb.document.id.strategy": "at.grahsl.kafka.connect.mongodb.processor.id.strategy.PartialValueStrategy",

        "mongodb.post.processor.chain": "at.grahsl.kafka.connect.mongodb.processor.DocumentIdAdder,at.grahsl.kafka.connect.mongodb.processor.BlacklistKeyProjector,at.grahsl.kafka.connect.mongodb.processor.BlacklistValueProjector",

        "mongodb.key.projection.type": "blacklist",
        "mongodb.key.projection.list": "first_name",
        "mongodb.value.projection.type": "blacklist",
        "mongodb.value.projection.list": "first_name",

        "errors.log.enable": true
    }
}

Even though I have used both BlacklistKeyProjector and BlacklistValueProjector, I am still getting first_name in MongoDB. I have also tried the renamer, but it has no effect either. Is there something I am missing?

hpgrahsl commented 2 years ago

Hi @moaazabir,

Thanks for reaching out and for your interest in the MongoDB sink connector. From what I can see in your configuration, you are running the connector in CDC mode: "mongodb.change.data.capture.handler": "at.grahsl.kafka.connect.mongodb.cdc.debezium.rdbms.RdbmsHandler". With this option set, defining and applying a post processor chain is not supported. CDC events are intended to be processed in a 1:1 fashion, since this is the most common way of dealing with CDC payloads in the wild.

That said, if you really want to remove a field from the CDC payload, you would need to configure SMTs (single message transforms), which let you pre-process records before they are handed to the sink connector itself. There are several ready-made SMTs to choose from, and the one you might want to apply is ReplaceField: https://docs.confluent.io/platform/current/connect/transforms/replacefield.html#replacefield
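As a rough sketch of what that could look like, the properties below would be merged into the "config" object of the connector definition. The alias dropFirstName is just an illustrative name, and the "exclude" property assumes a recent Kafka Connect version (older versions call it "blacklist"). Keep in mind that ReplaceField only acts on top-level fields of the record value, so whether it actually reaches first_name depends on where that field sits in your CDC event structure. Treat this as a starting point to verify against your own records rather than a confirmed fix.

```json
{
    "transforms": "dropFirstName",
    "transforms.dropFirstName.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
    "transforms.dropFirstName.exclude": "first_name"
}
```

The same transform exists for record keys as ReplaceField$Key, in case the field also needs to be dropped there.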

Hope this helps you achieve your requirements.

Finally, please note that this project was discontinued 3 years ago because it was integrated into the official MongoDB connector for Apache Kafka. This means that if you have questions or want to report issues, please do so in the official repository (https://github.com/mongodb/mongo-kafka). Thanks for your consideration and all the best with your use case / project.

moaazabir commented 2 years ago

Hello @hpgrahsl,

Thanks for the information. It really helped me understand the CDC connectors better. I also got really good information from your older comments on other issues.

hpgrahsl commented 2 years ago

You're welcome! Then I think it's fine if I close this for now, but feel free to ask further questions in the official repo https://github.com/mongodb/mongo-kafka any time.