Open JamieSplitit opened 1 year ago
No it does not support log based replication currently. I had planned to add it, but the log in mongodb is a capped collection, meaning historical changes are purged as-to not grown unboundedly. This means, for us at least since our mongo logs did not go back far enough (not even close), it was a no value addition. Furthermore, even if they did, replication key based replication is significantly simpler and we mandate an updatedAt type of key on almost every collection.
To properly implement log based replication, you would ideally need to have a fallback method in the event you get the exception that your historical starting point is not found in the oplog. Given that error, you would need to start replicating the documents based on the document id (creation) timestamp and once fully caught up, could shift to log based. The added complexity for this is not on my radar though it may not be too bad. I am open to working with someone on a PR if a community member wants to contribute it/start it.
Will leave issue open.
Thanks, for the explanation.
In that case, can I suggest that REPLICATION_LOG_BASED
be removed from the line I quoted above in order to give the user a more explicit exception when LOG_BASED
is used for a stream with this tap in the config?
Cheers
Hey z3z1ma, I was loooking to use this tap for LOG_BASED replication as this line seems to indicate it is intended to be supported: https://github.com/z3z1ma/tap-mongodb/blob/ec26f07fe09696cb364ab8974fcb9f6820cce8cd/tap_mongodb/collection.py#L161
However, it seems that is not actually supported as every stream seems to require a replication key as far as I can tell? However, I may be missing something.
So - is this tap intended to support LOG_BASED replication? If so, how do I implement it in the meltano.yaml?
Thanks!