Open srolija opened 3 years ago
Thanks for your suggestion @srolija! The Mirus Source Connector is primarily intended for high-reliability data replication use cases, so our defaults are selected to minimize the risk of data loss at all costs, even at the expense of an increased risk of duplicate data. By using the `earliest` policy we guarantee that all available data is replicated, even in exceptional circumstances. This policy covers us when new topics are added to the topic regex and, importantly, also in rare instances where a Kafka bug causes the current offset to become invalid and the consumer offsets are reinitialized. This is something we have occasionally seen, particularly in earlier Kafka releases, and using `latest` in that situation would certainly lead to data loss.
Completely get it.
Would it then make sense just to note the changes from the default consumer group? Asking because we had an issue where it started replicating an enormous topic, and based on the docs we didn't understand that it has a non-default configuration.
Yes, that does make sense. We should update the docs.
By default, the Kafka consumer has its `auto.offset.reset` policy configured to `latest`. But it looks like the implementation in this connector is reversed: unless it is configured otherwise, it will start with `earliest`. https://github.com/salesforce/mirus/blob/4b482446400116c0d705949ec1e3c561c501f369/src/main/java/com/salesforce/mirus/MirusSourceTask.java#L161-L167
And that value is extracted from the consumer-prefixed settings: https://github.com/salesforce/mirus/blob/4b482446400116c0d705949ec1e3c561c501f369/src/main/java/com/salesforce/mirus/config/SourceConfig.java#L51
Would it make sense to make the default the same as the normal connector to keep it consistent with normal consumer groups?
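
To make the behavior described above concrete, here is a minimal Java sketch of how a connector could resolve the effective `auto.offset.reset` value. It is not the Mirus code from the linked files. The class name, the method names, and the `consumer.` prefix are assumptions made for illustration. Only the two defaults come from this thread: `earliest` for the connector and `latest` for a plain Kafka consumer.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not the actual Mirus implementation. */
public class OffsetResetDefaults {
    // Assumed prefix for consumer overrides inside the connector config.
    public static final String CONSUMER_PREFIX = "consumer.";
    public static final String OFFSET_RESET_KEY = "auto.offset.reset";
    // Connector-side default discussed in this thread.
    public static final String CONNECTOR_DEFAULT = "earliest";

    /** Collects the consumer-prefixed entries and strips the prefix. */
    public static Map<String, String> consumerProps(Map<String, String> connectorConfig) {
        Map<String, String> props = new HashMap<>();
        for (Map.Entry<String, String> entry : connectorConfig.entrySet()) {
            String key = entry.getKey();
            if (key.startsWith(CONSUMER_PREFIX)) {
                props.put(key.substring(CONSUMER_PREFIX.length()), entry.getValue());
            }
        }
        return props;
    }

    /** Returns the explicit setting if present, otherwise the connector default. */
    public static String effectiveOffsetReset(Map<String, String> connectorConfig) {
        return consumerProps(connectorConfig).getOrDefault(OFFSET_RESET_KEY, CONNECTOR_DEFAULT);
    }
}
```

Under these assumptions, a config with no override resolves to `earliest`, while a config that sets the prefixed key to `latest` resolves to `latest`, which would match standard consumer behavior.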