z3z1ma / tap-mongodb

Tap for mongodb emphasizing simplicity and leveraging Meltano SDK
MIT License
10 stars 4 forks source link

LOG_BASED replication seemingly allowed but not supported #2

Open JamieSplitit opened 1 year ago

JamieSplitit commented 1 year ago

Hey z3z1ma, I was loooking to use this tap for LOG_BASED replication as this line seems to indicate it is intended to be supported: https://github.com/z3z1ma/tap-mongodb/blob/ec26f07fe09696cb364ab8974fcb9f6820cce8cd/tap_mongodb/collection.py#L161

However, it seems that is not actually supported as every stream seems to require a replication key as far as I can tell? However, I may be missing something.

So - is this tap intended to support LOG_BASED replication? If so, how do I implement it in the meltano.yaml?

Thanks!

z3z1ma commented 1 year ago

No it does not support log based replication currently. I had planned to add it, but the log in mongodb is a capped collection, meaning historical changes are purged as-to not grown unboundedly. This means, for us at least since our mongo logs did not go back far enough (not even close), it was a no value addition. Furthermore, even if they did, replication key based replication is significantly simpler and we mandate an updatedAt type of key on almost every collection.

To properly implement log based replication, you would ideally need to have a fallback method in the event you get the exception that your historical starting point is not found in the oplog. Given that error, you would need to start replicating the documents based on the document id (creation) timestamp and once fully caught up, could shift to log based. The added complexity for this is not on my radar though it may not be too bad. I am open to working with someone on a PR if a community member wants to contribute it/start it.

Will leave issue open.

JamieSplitit commented 1 year ago

Thanks, for the explanation.

In that case, can I suggest that REPLICATION_LOG_BASED be removed from the line I quoted above in order to give the user a more explicit exception when LOG_BASED is used for a stream with this tap in the config?

Cheers