Btw: I was able to reproduce the behaviour by running a loop of vgscan/pvs on the active node while rebooting the other one. Once cman was started again on the rebooted node, the vgscan/pvs commands hung on the active one. The only solution was to kill all LVM commands on the active node and fence the other node again. lcmc is not the cause of the problem, but if you have many disks it can trigger the same behaviour. I'm also going to open a bug at Red Hat for this.
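For reference, a loop along these lines on the active node is enough to trigger it (just a rough sketch; any repeated vgscan/pvs invocation should do):

```sh
# Run on the active node while the other node reboots; once cman
# comes back up on the rebooted node, these commands start to hang.
while true; do
    vgscan > /dev/null 2>&1
    pvs    > /dev/null 2>&1
done
```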
I will disable the periodic LVM scans in LCMC. It will lose the ability to detect outside changes, but it isn't worth it.
See here for the Red Hat bug report concerning clvmd: https://bugzilla.redhat.com/show_bug.cgi?id=970192
I've disabled the periodic LVM scans in LCMC 1.5.5. It now scans once at startup and then only when you make LVM changes in LCMC.
I don't know if it is a clvmd issue or not, but I observed the following:
When rebooting a node (via a fencing method, or just the reboot command), lcmc showed it offline just fine. But when the node came back up (this is a 2-node cluster with lots of disks and resources configured), clvmd never finished starting. I got the dreaded hang in vgscan on the freshly booted node, and no matter what we tried, we couldn't get it to work (after 10 reboots of that node without success, we were about to reboot the whole cluster). And of course, since vgscan hangs, no gfs2 partitions get mounted either.

It seems (not 100% sure) to be due to LVM commands on the remaining running node: clvmd apparently wants to finish any in-flight LVM command before it starts accepting connections from another node, but since lcmc was running, pvs commands were being executed all the time (and they take some time). So I followed my hunch, stopped lcmc, fenced the node again and bingo: all is well.

I don't really point the finger at lcmc, but maybe the polling of disks isn't needed until a command is executed that actually needs the information (on demand, e.g. when looking at partition info, creating drbd devices and such). See the recovery sketch below for what got us out of the hang.
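For anyone hitting the same hang, the recovery that worked for us boils down to something like this (a sketch assuming a cman/fenced based cluster; `<nodename>` is a placeholder for the stuck node):

```sh
# 1. Stop lcmc so it stops issuing periodic pvs/vgscan commands.
# 2. Kill any hung LVM commands on the active node.
pkill -9 vgscan
pkill -9 pvs
# 3. Fence the stuck node again so it can rejoin cleanly.
fence_node <nodename>
```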