olopez32 / ganeti

Automatically exported from code.google.com/p/ganeti

Node randomly stops responding once instance is started #12

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I was able to get Ganeti installed on a pair of nodes under Debian.  The
systems will boot, xen starts up and the instance will start on the master
node.  The drbd subsystem kept the nodes in sync.  I could move the master
and the instance without problem.

Recently the master node "disappears", but only after the instance is started.

I start up the instance, and it boots up properly. Sometime shortly after that, the entire system stops responding to any remote traffic. It cannot connect to anything remote, and no external machine can reach it via ping, etc.

The network interfaces don't appear to change. Every interface still has its address, and the bridge is up and running. Even the packet counters on the eth0 device keep incrementing as though packets are going in and out, but none of the remote machines see anything.

Checking the iptables, there is a rule for the bridge device (inserted by
ganeti), and nothing else.

Any suggestions?  Anyone seen this before?

Original issue reported on code.google.com by Craig.Ho...@gmail.com on 9 Nov 2007 at 10:34

GoogleCodeExporter commented 9 years ago
No, we haven't seen this before. However, since Ganeti itself doesn't touch iptables, I think what happens is this: (a) you defined an IP address for the instance, and (b) Xen uses that address to modify iptables in a way that breaks networking.

If (a) is true, can you please check whether removing the instance's IP address from Ganeti fixes the problem? (gnt-instance modify -i none my_instance)

If (a) is not true, could you show the iptables rule?

thanks,
iustin

Original comment by iust...@gmail.com on 9 Nov 2007 at 11:25

GoogleCodeExporter commented 9 years ago
Unfortunately, "gnt-instance modify -i none my_instance" didn't fix things.

I'm attaching the kernel log so you can see where things went wrong. It covers everything from boot up to the point where the system's network goes away.

I started the instance, the DRBD file systems on the two physical nodes began syncing, and then everything stopped responding at:

Nov 18 11:50:53 vpn1 kernel: device vif1.0 entered promiscuous mode

vpn1 is the master node, vpn2 is the slave.

I'm going to completely re-install and see if things work a little better next time...

Original comment by Craig.Ho...@gmail.com on 18 Nov 2007 at 5:42

GoogleCodeExporter commented 9 years ago
The iptables rule (via iptables -L -n) that shows up on vpn1 after starting the
vpnclient instance is:

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     0    --  0.0.0.0/0            0.0.0.0/0           PHYSDEV match --physdev-in vif1.0

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Original comment by Craig.Ho...@gmail.com on 18 Nov 2007 at 5:53

GoogleCodeExporter commented 9 years ago
What network interface does your node use?
The bnx2 module has problems with Xen bridged networking. With bnx2 versions prior to 1.5c, if you start xend using the bridge network-script, the network stops responding and the node loses its network completely. This problem is caused by the interaction between the bnx2 driver and Xen. The solution is to install a more recent driver.

I have documentation about this in Brazilian Portuguese (pt_BR), but it may still help; just follow the commands:
http://guialivre.governoeletronico.gov.br/mediawiki/index.php/DocumentacaoRedeBnx2

Original comment by gnu...@gmail.com on 11 Dec 2007 at 5:57

GoogleCodeExporter commented 9 years ago
Hi,

Is this still a problem?

regards,
iustin

Original comment by iust...@gmail.com on 22 Jan 2008 at 12:50

GoogleCodeExporter commented 9 years ago
Closing the bug as there has been no response on it. Feel free to re-open it if it's still an issue.

Original comment by iust...@gmail.com on 3 Jul 2008 at 6:23