ibm-messaging / kafka-connect-mq-sink

This repository contains a Kafka Connect sink connector for copying data from Apache Kafka into IBM MQ.
Apache License 2.0

Message persistence behavior #18

Closed SAshish19 closed 5 years ago

SAshish19 commented 5 years ago

Hello,

I was trying to understand the behavior of this connector when it comes to message persistence. Take a case where the connector gets a message from a Kafka topic and, just as it is about to deliver the message to MQ, it goes down (the connector process gets killed). Would we lose that message, since it was picked up from Kafka but never delivered to IBM MQ?

katheris commented 5 years ago

Hi @SAshish19, Kafka works slightly differently from IBM MQ in that when a message is read it is not removed from Kafka. It remains on the Kafka cluster for other consumers to read. The connector stores an offset in Kafka to track its progress in reading messages. In the scenario you are describing, if the connector goes down it will not yet have pushed the offset to Kafka. This means that once the connector is back up and running it will re-read the message from Kafka and try to send it to IBM MQ again.

Does that answer your question?

SAshish19 commented 5 years ago

Hi @katheris. Thanks for the response. When does the connector move the offset at the Kafka end?

1. Read the message from the Kafka topic/partition ---> push the offset to the Kafka topic/partition since the read operation is complete ---> start pushing the message to MQ.

OR

2. Read the message from the Kafka topic/partition ---> send the message to IBM MQ ---> once the message is delivered to MQ, push the offset to the Kafka topic/partition.

AndrewJSchofield commented 5 years ago

This is really a question of how Kafka Connect itself works. For a sink connector, it works in batches, and the messages are flushed (committed) to MQ before the Kafka offset is updated. Messages could still be duplicated in some retry situations, I'm sure: for example, the MQ commit succeeds, the connector crashes before the Kafka offsets are updated, and on restart it re-sends the same batch that is already in MQ.

SAshish19 commented 5 years ago

Duplication of messages can be as hazardous as losing them. I guess this sink connector should be used carefully, in business scenarios that are not negatively affected by duplication.

Thanks a lot for bringing up this point and clarifying the doubt.

AndrewJSchofield commented 5 years ago

Duplication of messages is always a possibility with Kafka Connect. It will always retry in preference to dropping messages and sometimes that can result in duplication.
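One common way to tolerate at-least-once delivery (a general pattern, not a feature of this connector) is to make the downstream consumer idempotent, for example by de-duplicating on a unique message id. A minimal sketch, assuming each message carries such an id:

```python
# Illustrative pattern: an idempotent consumer that processes each message
# id at most once, so redelivered copies from connector retries are ignored.

seen_ids = set()                   # ids already processed (in practice, durable storage)
processed = []

def handle(message_id, payload):
    """Process a message, skipping duplicates from retries."""
    if message_id in seen_ids:
        return                     # duplicate delivery: safely ignore
    seen_ids.add(message_id)
    processed.append(payload)

handle("id-1", "order created")
handle("id-1", "order created")    # redelivery after a retry: no effect
```

With this pattern, at-least-once delivery from the connector becomes effectively exactly-once processing at the consumer, at the cost of tracking seen ids.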