olopez32 / ganeti

Automatically exported from code.google.com/p/ganeti

--maintain-node-health does not work on two node cluster #745

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What software version are you running? Please provide the output of
"gnt-cluster --version", "gnt-cluster version", and "hspace --version".

gnt-cluster (ganeti v2.9.5) 2.9.5

Software version: 2.9.5
Internode protocol: 2090000
Configuration format: 2090000
OS api version: 20
Export interface: 0
VCS version: v2.9.5

What distribution are you using?

ubuntu 12.04.4 LTS

What steps will reproduce the problem?
1. on a two node cluster enable --maintain-node-health
2. set second node offline

What is the expected output? What do you see instead?

ganeti-watcher should stop instances and disable drbd devices.

Please provide any additional information below.

ganeti-watcher just shows a warning:

2014-02-25 10:42:23,062: ganeti-watcher pid=17690 nodemaint:146 WARNING Inconsistent replies, not doing anything

i also get this on a second two node cluster with the same os and ganeti versions.

see discussion -> https://groups.google.com/forum/#!topic/ganeti/oSqvQJ8oysg

Original issue reported on code.google.com by hei...@googlemail.com on 4 Mar 2014 at 7:16

GoogleCodeExporter commented 9 years ago
I just tried this on a newly-installed test cluster with 2.9.5 and I was unable 
to reproduce: no WARNING is shown in the logs.

Could you check whether it is visible on a newly-installed cluster and report the 
exact sequence of steps to reproduce the bug?

Original comment by mtart...@google.com on 4 Mar 2014 at 12:38

GoogleCodeExporter commented 9 years ago
i'm sorry but both are production clusters and i cannot re-create a cluster on 
those machines.

if there is anything else i can do to debug this issue let me know.

is your test cluster a two node cluster? did you try to run ganeti-watcher -d on 
the offline node? this is what i did to get this warning message.

Original comment by hei...@googlemail.com on 4 Mar 2014 at 2:03

GoogleCodeExporter commented 9 years ago
Ah! I hadn't realized that the error message was supposed to appear on the node 
marked as offline!

Yes, the warning is indeed visible in watcher.log on the offline node, and 
running "xm list" on that node (on a xen cluster) shows that the instance is 
indeed still running.

Bug confirmed, thanks for the report.

Original comment by mtart...@google.com on 4 Mar 2014 at 2:14