Closed SAshish19 closed 5 years ago
Hi @SAshish19, Kafka works slightly differently from IBM MQ in that when a message is read it is not removed from Kafka. It remains on the Kafka cluster for other consumers to read. The connector stores an offset in Kafka to track its progress in reading messages. In the scenario you are describing, if the connector goes down it should not have pushed the offset to Kafka yet. This means that once the connector is back up and running it will re-read the message from Kafka and try to send it to IBM MQ again.
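That read-doesn't-remove behaviour can be sketched with a simplified model (this is not the real Kafka client API, just an illustration of the offset semantics):

```python
# Simplified model: records stay in the log after being read; a consumer
# only advances by committing an offset back to the cluster.
class Log:
    def __init__(self):
        self.records = []          # append-only; reads never remove records

    def append(self, record):
        self.records.append(record)

    def read_from(self, offset):
        return self.records[offset:]

class Consumer:
    def __init__(self, log):
        self.log = log
        self.committed_offset = 0  # durable progress marker stored in Kafka

    def poll(self):
        return self.log.read_from(self.committed_offset)

    def commit(self, offset):
        self.committed_offset = offset

log = Log()
for m in ["a", "b", "c"]:
    log.append(m)

consumer = Consumer(log)
batch = consumer.poll()            # reads ["a", "b", "c"]
# Crash before commit(): the committed offset is still 0, so a restarted
# consumer reads the same records again -- nothing was removed from the log.
restarted = Consumer(log)
assert restarted.poll() == ["a", "b", "c"]
```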
Does that answer your question?
Hi @katheris. Thanks for the response. When does the connector move the offset at the Kafka end?
1. Read the message from the Kafka topic/partition → push the offset at the Kafka topic/partition, since the read operation is complete → start pushing the message to MQ.
OR
2. Read the message from the Kafka topic/partition → send the message to IBM MQ → once the message is delivered to MQ, push the offset at the Kafka topic/partition.
This is really a question of how Kafka Connect itself works. For a sink connector, it works in batches and the messages are flushed (committed) to MQ before the Kafka offset is updated. Messages could be duplicated in some retry situations, I'm sure (for example: the MQ commit succeeds, the connector crashes before the Kafka offsets are updated, and on restart it resends the same batch that is already in MQ).
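The ordering described above (flush to MQ first, commit the Kafka offset second) can be sketched like this; `run_sink_cycle` is a hypothetical simplification, not the connector's actual code:

```python
# Sketch of the sink ordering: deliver the batch to MQ, then commit the
# Kafka offset. A crash between the two steps causes the batch to be
# redelivered on restart (duplicates), but never lost.
def run_sink_cycle(records, committed_offset, mq, crash_after_flush=False):
    batch = records[committed_offset:]
    mq.extend(batch)                      # step 1: flush batch to MQ
    if crash_after_flush:
        return committed_offset           # crashed: offset NOT advanced
    return committed_offset + len(batch)  # step 2: commit new offset

records = ["m1", "m2"]
mq = []
# First cycle crashes after the MQ put but before the offset commit.
offset = run_sink_cycle(records, 0, mq, crash_after_flush=True)
# On restart the connector re-reads from the old offset and resends.
offset = run_sink_cycle(records, offset, mq)
assert mq == ["m1", "m2", "m1", "m2"]     # duplicated, not lost
assert offset == 2
```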
Duplication of messages can be as hazardous as losing them. I guess this sink connector should be used with care, only in business scenarios where duplication does not have a negative impact.
Thanks a lot for bringing up this point and clarifying the doubt.
Duplication of messages is always a possibility with Kafka Connect. It will always retry in preference to dropping messages and sometimes that can result in duplication.
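One common mitigation for that at-least-once behaviour (an assumption on my part, not a feature of this connector) is to make the downstream consumer idempotent, e.g. by tracking message IDs it has already processed:

```python
# Idempotent processing sketch: skip the side effect for any message ID
# that has already been seen, so redeliveries are harmless.
def process(seen_ids, msg_id, payload, apply):
    if msg_id in seen_ids:
        return False               # duplicate: skip the side effect
    apply(payload)
    seen_ids.add(msg_id)
    return True

seen = set()
out = []
process(seen, "id-1", "hello", out.append)
process(seen, "id-1", "hello", out.append)  # redelivered duplicate
assert out == ["hello"]                     # applied exactly once
```

In practice the "seen" set would need to be durable (and bounded), which is why deduplication is usually pushed to the consuming application rather than the connector.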
Hello,
I was trying to understand the behavior of this connector when it comes to message persistence. Let's take a case where the connector gets a message from a Kafka topic and, just as it is about to deliver the message to MQ, it goes down (the connector process gets killed). Would we lose that message, since it was picked up from Kafka but not delivered to IBM MQ?