Closed AndreaCrotti closed 10 years ago
Because the problem is that the id in Mongo is a UUID not just a long auto incrementing, I would rather not use that as Primary Key in Postgres.
The alternative I suppose could be that if something fails I do a --reimport, or if use tail-from with the timestamp when it stopped, does it make sense?
I noticed now this
WARN MoSQL: `mongodb://django:django@localhost:27017/' is not a replset.
WARN MoSQL: Will run the initial import, then stop.
WARN MoSQL: Pass `--skip-tail' to suppress this warning
So maybe thinking that the two things are correlated.. So if it's not a replset then it will not tail? To test it locally I need to create a replica set then?
MoSQL needs a node configured as --replSet
in order to tail. You don't actually need a multiple-node cluster, but MoSQL relies on reading the MongoDB oplog in order to replicate data, which isn't written out unless the Mongo server is configured as a replset node.
MoSQL requires that your collection map include a mapping for the _id
field in order to properly replicate changes and to de-duplicate identical records. Since _id
is the primary key on the MongoDB side, we can't tell whether a record is a duplicate or a new record unless we also preserve the _id
field onto the Postgres side
Ok thanks now I understood how it works, for production we have a replica set I just have to configure one also locally to test it..
About the _id yes fair enough makes sense, but the question is, can I map the MongoDB id to a field that is not the primary key in PostgreSQL? Would it still be able to de-duplicate identical records?
Because I would really like to avoid having a primary key an UUID, Django doesn't like so much if you mess around with the primary keys..
I did some experiment myself and unfortunately I think the mapping does not work if it's not the primary key.
I just added this mapping
- couch_id:
:source: _id
:type: UUID
And couch_id to the Postgres model, but if I import twice it still duplicates the rows. In theory now however it could work, because it has all the information needed to understand that the row has already been inserted right?
By the way I found out the trick, the field I map to does not have to be a primary key as long it's unique and not null.
Maybe this would be nice to be documented somewhere since it's not so evident otherwise, thanks!
Great, glad it's working out! I'll close this for now; I'd probably consider a PR adding documentation, but I won't likely get around to writing any just yet.
Why if I run the command in this way
mosql -c comments.yaml --mongo mongodb://django:django@localhost:27017/ --sql postgres://django:django@localhost/garage -v -v -v --no-drop-tables
It quits when it's done importing? Should it not run forever reading the oplog?
Is it not possible to let it run forever tailing the oplog (until stopped)?
If I run the migration with a schema as below multiple times, my records get duplicated.
This is maybe because I am not specifying an ID so I'm letting Postgres create new records with new ids. But is it not possible to make it only create new record if not there already in another way?
I just want to make sure that if I have to run it twice I don't get duplicate data..
Thanks