ozwillo / ozwillo-datacore

Ozwillo Datacore is a Cloud of shared Open Linked Data. Its goal is cross-business data collaboration and integration. By linking data from different business together, it allows creating value by developing new Ozwillo services on top of it.
http://www.ozwillo.com
GNU Affero General Public License v3.0

In prod, mongo primary replica set member changes #57

Open mdutoo opened 9 years ago

mdutoo commented 9 years ago

(Brought back by @silently on 20150323)

This can be checked in the mongo client by running rs.conf().
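
A minimal check from the mongo shell, connected to any member (output of course depends on the actual configuration):

rs.conf()    // lists the configured replica set members and their settings
rs.status().members.forEach(function (m) {
  print(m.name + " : " + m.stateStr);   // shows which member is currently PRIMARY
});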

This is not a problem for the Datacore, since its Java MongoDB driver handles failover and uses the right new primary:
http://stackoverflow.com/questions/21841064/mongodb-java-client-automatic-failover-failing
http://docs.mongodb.org/ecosystem/drivers/java-replica-set-semantics/

Though it can cause problems in clients that don't support failover or are badly configured, such as when using robomongo to list collections:

5:12:26 PM: nextSafe(): { $err: "Invalid ns [.$cmd]", code: 16256 }

Quick solution if you want to revert it anyway: on the new primary, in the mongo client, run rs.stepDown(). Usually the "right" primary will be elected, probably because the only other node is farther away (BV).
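
A sketch of that revert, assuming the mongo shell is connected to the unwanted primary:

rs.stepDown(120)   // step down and refuse re-election for 120 seconds (default is 60)
rs.status()        // then check that the expected member has been elected primary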

The probable cause is that taking VM snapshots to back up data causes network micro-cuts that trigger a new primary election; VM snapshots should therefore not be taken on the primary (which is currently the case, as confirmed by @silently).

TODO

(this is a good backup method; however, in the long term we'll still have to think about adding a better one, e.g. MMS...)

silently commented 9 years ago

I would like @tbroyer 's feedback regarding "tell IPGarde not to snapshot primary anymore". As discussed by email, it's fine with me.

tbroyer commented 9 years ago

+cc @jpoittevin

jpoittevin commented 9 years ago

I also think it's a good idea to back up a secondary node. I don't really know the snapshot technology IPGarde uses for the Veeam snapshots, but it probably induces an overhead, even if a slight one.

silently commented 9 years ago

Perfect, everyone agrees!

silently commented 9 years ago

@mdutoo let's say the primary M1 really goes down and the (backed-up) secondary M2 becomes primary. If we're not notified that M1 is down, then during the next M2 backup M3 could be elected primary, but it is in the other data center.

So I would prefer to have the alert on M1 before asking IPGarde to remove the backup.

jpoittevin commented 9 years ago

Could the backup process launch a pre-script that steps down the node to be backed up a few seconds before the backup, to ensure the node supposed to be the primary really is the primary during the backup process?

mdutoo commented 9 years ago

@silently I'll let you put @jpoittevin's idea to IPGarde. But even if it's possible, the backup would fail if the node to be backed up is down. That can be solved by refining the pre-script idea: back up all nodes, but abort a given node's backup before it starts if the pre-script says that node is currently primary (see the sketch below). So as long as there's at least one secondary, we'll have a backup.
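
A minimal sketch of such a pre-script, assuming the backup tool can run a command on each node and abort that node's snapshot on a non-zero exit code (the file name and wiring are hypothetical, e.g. mongo --quiet is_not_primary.js):

// is_not_primary.js
var state = db.isMaster();
if (state.ismaster) {
  print("this member is PRIMARY, aborting its backup");
  quit(1);   // non-zero exit code tells the backup tool to skip this node
}
print("this member is a secondary, backup can proceed");
quit(0);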

But if that can't be done, merely backing up a given secondary (or both) is not worse than today, where the primary is backed up every time (not only in some error cases). And since most changes of primary are likely caused by VM-snapshotting-induced network micro-cuts, that would even be much better.

And all these solutions would be further improved if we were alerted of primary changes, allowing us to change it back in addition to doing error analysis. As I've said, this can be done with MMS.

jpoittevin commented 9 years ago

mmmhh, I also suggest to leave nodes up :P

silently commented 9 years ago

OK, I am going to ask IPGarde about the script thing, but as I've said, I think having the monitoring (MMS or whatever) in place is important before establishing conditional backups.

tbroyer commented 9 years ago

In light of recent issues on the Kernel, and the absence of a dedicated and skilled (no offense intended) ops team, may I bring back the idea of using a PaaS? (more details on the Kernel issues by mail later today)

silently commented 9 years ago

added to the agenda of next Monday's meeting

silently commented 9 years ago

For follow-up: the machine holding the primary is not backed up anymore; we still have needs regarding monitoring/alerting.

mdutoo commented 8 years ago

This also makes the Puppet deployment on the original primary fail, since it expects that node to still be primary, though the deployment actually succeeds and is only prevented from starting again. So I've just rs.stepDown()'d the erroneous primary so that the Puppet deployment is OK this time.

mdutoo commented 8 years ago

The best way is probably to always snapshot only the same replica and prevent it from ever becoming master this way: https://docs.mongodb.com/v2.6/tutorial/configure-secondary-only-replica-set-member/

This snapshotted replica should probably be the Bonneville one, because preventing it from becoming master is a good thing in itself: if that happened, it would slow down the whole replica set, that node being the only Datacore replica in its datacenter.
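
A sketch of that secondary-only setup from the linked tutorial, assuming the Bonneville member is at index 2 of rs.conf() (the index must be checked against the real configuration):

cfg = rs.conf()
cfg.members[2].priority = 0   // priority 0: this member can never be elected primary
rs.reconfig(cfg)

A priority-0 member still replicates data normally and can vote in elections; it just never becomes primary, which also guarantees we always snapshot the same node.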