stripe-archive / mosql

MongoDB → PostgreSQL streaming replication
MIT License

Very large import, losing changes? #59

Closed: AndreaCrotti closed this issue 7 years ago

AndreaCrotti commented 10 years ago

I was looking at the code in streamer.rb and it made me wonder: suppose you're running with tailing enabled and the initial import takes a couple of days. Since the oplog is a capped collection, what happens if some of the changes are overwritten before MoSQL gets to reading the oplog?

It would seem to me that:

  @streamer.import

  unless options[:skip_tail]
    @streamer.optail
  end

would solve this issue. What do you think?
nelhage commented 10 years ago

Yes, this is a potential issue, and MoSQL really should at least detect it and abort where possible.

I'd have to think pretty hard about what happens with updates and deletes if you tail in a different thread -- for instance, if you see an update to an object before you've imported it, what do you do?

Given that Mongo's own replication has this same property, I suspect we'd be better off just detecting and aborting, instead of trying to be too clever about it.
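
As a minimal sketch of that detect-and-abort idea (this is not mosql's actual code; it assumes the modern mongo Ruby gem, the connection details are placeholders, and run_initial_import is a hypothetical stand-in for @streamer.import): record the newest oplog timestamp before importing, then refuse to tail if that entry has been overwritten by the time the import finishes.

  require 'mongo'

  # Connect to the 'local' database, where the replica-set oplog lives
  # (host and options are placeholders, not mosql's real configuration).
  client = Mongo::Client.new(['localhost:27017'], database: 'local')
  oplog  = client['oplog.rs']

  # Remember the newest oplog entry before the import starts.
  start_ts = oplog.find.sort('$natural' => -1).limit(1).first['ts']

  run_initial_import # hypothetical stand-in for @streamer.import

  # A capped collection drops its oldest entries first. If the oldest
  # surviving entry is now newer than the timestamp we recorded, every
  # operation in between was overwritten during the import and is gone.
  oldest_ts = oplog.find.sort('$natural' => 1).limit(1).first['ts']
  if oldest_ts.seconds > start_ts.seconds
    abort 'oplog rolled over during the initial import; aborting'
  end

  # Otherwise it is safe to start tailing from start_ts.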

AndreaCrotti commented 10 years ago

Yeah, it's not an easy problem. It would at least be nice to have an idea of how large the oplog would need to be, and roughly at how many documents, before this becomes an issue.
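
As a rough back-of-the-envelope illustration (the numbers here are invented): what matters is the oplog's time window versus the import's duration. If the primary writes about 1 GB of oplog per hour and the initial import takes 48 hours, the oplog has to be at least around 48 GB, or entries will be overwritten before tailing starts; the document count matters only insofar as it drives those two rates.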

nelhage commented 10 years ago

One thing you can look at is the output of db.getReplicationInfo() in the mongo shell; that will give you statistics on the length of time preserved in the oplog. MoSQL needs to be able to do the initial import in that amount of time or less in order not to lose data.
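
For reference, db.getReplicationInfo() reports (among other fields) tFirst, tLast, and a timeDiff giving the seconds of history the oplog currently holds. The same window can be computed from any driver; here is a sketch using the mongo Ruby gem (connection details are assumptions, not mosql's configuration):

  require 'mongo'

  client = Mongo::Client.new(['localhost:27017'], database: 'local')
  oplog  = client['oplog.rs']

  first_ts = oplog.find.sort('$natural' => 1).limit(1).first['ts']
  last_ts  = oplog.find.sort('$natural' => -1).limit(1).first['ts']

  # Span of history the capped oplog currently holds, in hours.
  window_hours = (last_ts.seconds - first_ts.seconds) / 3600.0
  puts "oplog covers ~#{window_hours.round(1)} hours of writes"

If the initial import is expected to take longer than that window under a comparable write load, the oplog should be resized upward before starting.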