jayjanssen / Percona-Pacemaker-Resource-Agents

Pacemaker HA Resource Agents by Percona

Need a way for PRM to work with xtrabackup's --safe-slave-backup option #3

Open jayjanssen opened 12 years ago

jayjanssen commented 12 years ago

http://www.percona.com/doc/percona-xtrabackup/innobackupex/innobackupex_option_reference.html#cmdoption-innobackupex--safe-slave-backup

Xtrabackup repeatedly stops the slave here until Slave_open_temp_tables reaches 0. This conflicts with PRM, which automatically restarts the slave whenever it stops. I can't find a way in Pacemaker to stop monitoring on a given node while leaving the resources running.

lefred commented 12 years ago

Maybe you can modify the resource monitoring operation during the process by adding:

on-fail="ignore" ?

y-trudeau commented 12 years ago

The best approach would be to monitor the xtrabackup pid and verify that the pid really belongs to xtrabackup. In that case, if the slave is stopped, ignore the error. I don't see any pid-file option for xtrabackup; maybe that is something that could be added. Does that make sense?
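
For illustration, the pid check described above could look roughly like the following. This is only a sketch under assumptions: it is written in Python rather than as part of the PRM agent, it relies on the Linux `/proc/<pid>/cmdline` layout, and the pid file itself is hypothetical, since xtrabackup is not confirmed here to offer a pid-file option. The function names `pid_is_xtrabackup` and `backup_running` are made up for this example.

```python
import os

# Executable names treated as "a backup is running" (an assumption for this sketch).
BACKUP_NAMES = ("xtrabackup", "innobackupex")


def pid_is_xtrabackup(pid, proc_root="/proc"):
    """Return True if `pid` exists and its command line looks like xtrabackup."""
    try:
        with open(os.path.join(proc_root, str(pid), "cmdline"), "rb") as fh:
            raw = fh.read()
    except OSError:
        # No such process, or not readable: treat as "not a backup".
        return False
    # /proc/<pid>/cmdline holds the arguments separated by NUL bytes.
    argv = [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]
    # Look at the first two entries so "perl /usr/bin/innobackupex" also matches.
    return any(os.path.basename(arg) in BACKUP_NAMES for arg in argv[:2])


def backup_running(pid_file, proc_root="/proc"):
    """Read a pid from a (hypothetical) pid file and verify it is xtrabackup."""
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return False
    return pid_is_xtrabackup(pid, proc_root)
```

A monitor operation could call `backup_running(...)` when it finds the slave stopped and, if it returns True, ignore the error instead of restarting the slave. Verifying the command line guards against a stale pid file whose pid has been reused by another process.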

jayjanssen commented 12 years ago

Maybe, but it is quite complex and depends on a fix in xtrabackup. If there's a way to simply put a single node into maintenance mode without stopping its services, that would be a handy thing to have in general for PRM. But perhaps this is a Pacemaker limitation.

jnewland commented 12 years ago

I've actually added a 'monitor_mysql' node attribute that, if present, causes the mysql monitoring check to return success. Combined with another 'xtrabackup' attribute that the xtrabackup script sets, and some location constraints, I can be sure that we don't elect a node that is running a backup as master.

y-trudeau commented 12 years ago

There are two ways of doing this: either the xtrabackup script sets a crm attribute, which needs to be transient to avoid a pengine run, or the mysql ocf script simply reacts to the presence of a file. The xtrabackup script would touch a file that would prevent the monitor ops from restarting the slave (and set master_score to 0). It would then be the responsibility of the script to remove the file. Should we also set a timeout? I could stat the touched file and, if now() - creation time is larger than the timeout, ignore the file. Thoughts?
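
As a rough illustration of the flag-file variant with a timeout, here is a minimal sketch. It assumes Python rather than the actual ocf shell script, and the function name `backup_flag_active` is invented for this example. It also uses the file's modification time instead of its creation time, because `touch` updates the mtime and Linux does not reliably expose a creation time through `stat`.

```python
import os
import time


def backup_flag_active(flag_path, timeout_s, now=None):
    """Return True if the flag file exists and is not older than `timeout_s` seconds.

    A missing file, or a file older than the timeout, means the flag is ignored.
    """
    try:
        mtime = os.stat(flag_path).st_mtime
    except FileNotFoundError:
        return False
    if now is None:
        now = time.time()
    # Stale flag (for example, the backup script died without cleaning up): ignore it.
    return (now - mtime) <= timeout_s
```

In the monitor operation, a True result would mean "a backup is in progress: do not restart the slave, and keep this node from being promoted". The timeout covers the case where the backup script exits without removing the file.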