salesforce / mirus

Mirus is a cross data-center data replication tool for Apache Kafka
BSD 3-Clause "New" or "Revised" License

Default consumer offset reset policy - earliest #73

Open srolija opened 3 years ago

srolija commented 3 years ago

By default, the Kafka consumer has its auto.offset.reset policy set to latest. However, it looks like this connector does the reverse: unless the policy is configured explicitly, it starts from earliest.

https://github.com/salesforce/mirus/blob/4b482446400116c0d705949ec1e3c561c501f369/src/main/java/com/salesforce/mirus/MirusSourceTask.java#L161-L167

That value is read from the consumer-prefixed properties: https://github.com/salesforce/mirus/blob/4b482446400116c0d705949ec1e3c561c501f369/src/main/java/com/salesforce/mirus/config/SourceConfig.java#L51
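To make the behaviour described above concrete, here is a minimal Java sketch. It is not the code at the linked lines. It assumes that the connector copies every connector property starting with a consumer. prefix into the source consumer's configuration, stripping the prefix, and that it fills in earliest only when auto.offset.reset is absent. The class name, method name, and exact prefix are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the defaulting behaviour discussed in this issue.
// Not the Mirus source: the "consumer." prefix and all names are assumptions.
class OffsetResetDefault {

    static final String CONSUMER_PREFIX = "consumer.";
    static final String AUTO_OFFSET_RESET = "auto.offset.reset";

    /** Builds the source consumer config from consumer-prefixed connector properties. */
    static Map<String, String> consumerProperties(Map<String, String> connectorConfig) {
        Map<String, String> consumerConfig = new HashMap<>();
        for (Map.Entry<String, String> e : connectorConfig.entrySet()) {
            if (e.getKey().startsWith(CONSUMER_PREFIX)) {
                // "consumer.bootstrap.servers" becomes "bootstrap.servers", and so on.
                consumerConfig.put(e.getKey().substring(CONSUMER_PREFIX.length()), e.getValue());
            }
        }
        // A plain KafkaConsumer defaults to "latest"; here "earliest" is applied
        // only when the user has not set the policy explicitly.
        consumerConfig.putIfAbsent(AUTO_OFFSET_RESET, "earliest");
        return consumerConfig;
    }

    public static void main(String[] args) {
        Map<String, String> unset = Map.of("consumer.bootstrap.servers", "source:9092");
        Map<String, String> explicit = Map.of(
                "consumer.bootstrap.servers", "source:9092",
                "consumer.auto.offset.reset", "latest");

        System.out.println(consumerProperties(unset).get(AUTO_OFFSET_RESET));    // earliest
        System.out.println(consumerProperties(explicit).get(AUTO_OFFSET_RESET)); // latest
    }
}
```

Under these assumptions, setting consumer.auto.offset.reset=latest explicitly in the connector configuration gives the standard consumer behaviour, while leaving it unset yields earliest.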

Would it make sense to make the connector's default match the standard consumer default, to keep it consistent with normal consumer groups?

pdavidson100 commented 3 years ago

Thanks for your suggestion @srolija! The Mirus Source Connector is primarily intended for high-reliability data replication use cases, so our defaults are chosen to minimize the risk of data loss at all costs, even at the expense of an increased risk of duplicate data. By using the earliest policy, we guarantee that all available data is replicated, even in exceptional circumstances. This policy covers us when new topics are added to the topic regex and, importantly, also in the rare instances where a Kafka bug causes the current offset to become invalid and the consumer offsets are reinitialized. We have occasionally seen this, particularly in earlier Kafka releases, and using latest in that situation would certainly lead to data loss.
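The trade-off can be illustrated with a toy model of a single partition. This is a hypothetical sketch, not Kafka or Mirus code, and the names are illustrative. It assumes the consumer has no valid committed offset (because the topic was just matched by the regex, or because the offset became invalid) and counts which retained records are skipped or re-read under each policy.

```java
// Hypothetical model of one source partition; not Kafka or Mirus code.
class ResetPolicyModel {

    enum ResetPolicy { EARLIEST, LATEST }

    /** Where a consumer starts when it has no valid committed offset. */
    static long resetPosition(ResetPolicy policy, long logStartOffset, long logEndOffset) {
        return policy == ResetPolicy.EARLIEST ? logStartOffset : logEndOffset;
    }

    /** Records in [replicatedUpTo, start) are never read: data loss. */
    static long lostRecords(long start, long replicatedUpTo) {
        return Math.max(0, start - replicatedUpTo);
    }

    /** Records in [start, replicatedUpTo) are read a second time: duplicates. */
    static long duplicateRecords(long start, long replicatedUpTo) {
        return Math.max(0, replicatedUpTo - start);
    }

    public static void main(String[] args) {
        long logStart = 0, logEnd = 1_000; // offsets retained in the source partition
        long replicatedUpTo = 700;         // replication progress before the offset was lost

        for (ResetPolicy policy : ResetPolicy.values()) {
            long start = resetPosition(policy, logStart, logEnd);
            System.out.printf("%s: start=%d lost=%d duplicates=%d%n",
                    policy, start,
                    lostRecords(start, replicatedUpTo),
                    duplicateRecords(start, replicatedUpTo));
        }
    }
}
```

Running it prints "EARLIEST: start=0 lost=0 duplicates=700" and "LATEST: start=1000 lost=300 duplicates=0". For a newly matched topic, replicatedUpTo equals logStart, so in this model LATEST skips every record that already exists in the partition.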

srolija commented 3 years ago

I completely get it.

Would it then make sense to at least document where the connector's defaults differ from those of a standard consumer group? I'm asking because we had an issue where it started replicating an enormous topic, and based on the docs we didn't realize it uses a non-default configuration.

pdavidson100 commented 3 years ago

Yes, that does make sense. We should update the docs.