termie / nova-migration-demo

Nova is a cloud computing fabric controller (the main part of an IaaS system). It is written in Python.
http://openstack.org/projects/compute/
Apache License 2.0

nova-network sometimes crashes with bad state #245

Open termie opened 13 years ago

termie commented 13 years ago

We've run into a problem with nova-network (with both bzr655 and bzr669) where it crashes with the traceback below. Upstart unhelpfully restarts it, and it dies again. I've put in a workaround that traps this error and skips the entry, which seems to right the system once it works through the rabbitmq backlog that has built up. (In our case, that was 90K events, only half of which have been processed over the last 90 minutes.)

(nova.root): TRACE: Traceback (most recent call last):
(nova.root): TRACE:   File "/usr/bin/nova-network", line 44, in <module>
(nova.root): TRACE:     service.serve()
(nova.root): TRACE:   File "/usr/lib/pymodules/python2.6/nova/service.py", line 231, in serve
(nova.root): TRACE:     x.start()
(nova.root): TRACE:   File "/usr/lib/pymodules/python2.6/nova/service.py", line 81, in start
(nova.root): TRACE:     self.manager.init_host()
(nova.root): TRACE:   File "/usr/lib/pymodules/python2.6/nova/network/manager.py", line 470, in init_host
(nova.root): TRACE:     super(VlanManager, self).init_host()
(nova.root): TRACE:   File "/usr/lib/pymodules/python2.6/nova/network/manager.py", line 129, in init_host
(nova.root): TRACE:     self._on_set_network_host(ctxt, network['id'])
(nova.root): TRACE:   File "/usr/lib/pymodules/python2.6/nova/network/manager.py", line 579, in _on_set_network_host
(nova.root): TRACE:     self.driver.update_dhcp(context, network_id)
(nova.root): TRACE:   File "/usr/lib/pymodules/python2.6/nova/network/linux_net.py", line 296, in update_dhcp
(nova.root): TRACE:     f.write(get_dhcp_hosts(context, network_id))
(nova.root): TRACE:   File "/usr/lib/pymodules/python2.6/nova/network/linux_net.py", line 279, in get_dhcp_hosts
(nova.root): TRACE:     hosts.append(_host_dhcp(fixed_ip_ref))
(nova.root): TRACE:   File "/usr/lib/pymodules/python2.6/nova/network/linux_net.py", line 369, in _host_dhcp
(nova.root): TRACE:     return "%s,%s.%s,%s" % (instance_ref['mac_address'],
(nova.root): TRACE: TypeError: 'NoneType' object is unsubscriptable
(nova.root): TRACE:


Imported from Launchpad using lp2gh.

termie commented 13 years ago

(by kost-isi) Hi,

I am also seeing this error running an ubuntu image or the ttylinux image using FlatDHCP...

2011-02-15 16:49:52,956 ERROR nova.root [-] Exception during message handling
(nova.root): TRACE: Traceback (most recent call last):
(nova.root): TRACE:   File "/usr/lib/pymodules/python2.6/nova/rpc.py", line 192, in receive
(nova.root): TRACE:     rval = node_func(context=ctxt, **node_args)
(nova.root): TRACE:   File "/usr/lib/pymodules/python2.6/nova/network/manager.py", line 418, in allocate_fixed_ip
(nova.root): TRACE:     self.driver.update_dhcp(context, network_ref['id'])
(nova.root): TRACE: TypeError: 'NoneType' object is unsubscriptable
(nova.root): TRACE:
2011-02-15 16:49:52,957 ERROR nova.rpc [-] Returning exception 'NoneType' object is unsubscriptable to caller
2011-02-15 16:49:52,957 ERROR nova.rpc [-] ['Traceback (most recent call last):\n', ' File "/usr/lib/pymodules/python2.6/nova/rpc.py", line 192, in receive\n rval = node_func(context=ctxt, **node_args)\n', ' File "/usr/lib/pymodules/python2.6/nova/network/manager.py", line 418, in allocate_fixed_ip\n self.driver.update_dhcp(context, network_ref[\'id\'])\n', "TypeError: 'NoneType' object is unsubscriptable\n"]

termie commented 13 years ago

(by ttx) Apparently in some cases db.network_get_associated_fixed_ips returns a FixedIp with fixed_ip_ref['instance']=None... but I don't understand this code enough to tell if that's a normal use case that should be supported in get_dhcp_hosts, or if we should find the root cause.
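
To make the failure mode concrete, here is a minimal sketch of the "skip the entry" approach, assuming the fixed IP rows behave like dictionaries. It is not the actual linux_net.py code: `build_dhcp_hosts`, `host_dhcp_line`, the `hostname` and `address` keys, and the `novalocal` domain are hypothetical stand-ins; only the `"%s,%s.%s,%s"` format and the `mac_address` key come from the traceback above. Whether skipping is the right fix, or only hides the root cause, is the open question here.

```python
def host_dhcp_line(fixed_ip_ref, domain="novalocal"):
    """Format one DHCP hosts entry, or return None if no instance is attached.

    The keys other than 'mac_address' and the default domain are assumptions
    made for illustration.
    """
    instance_ref = fixed_ip_ref.get("instance")
    if instance_ref is None:
        # This is the case that raises TypeError in the traceback:
        # subscripting None with ['mac_address'].
        return None
    return "%s,%s.%s,%s" % (
        instance_ref["mac_address"],
        instance_ref["hostname"],
        domain,
        fixed_ip_ref["address"],
    )


def build_dhcp_hosts(fixed_ip_refs):
    """Join the entries for all fixed IPs that have an instance."""
    lines = []
    for ref in fixed_ip_refs:
        line = host_dhcp_line(ref)
        if line is None:
            continue  # skip the bad row instead of crashing the service
        lines.append(line)
    return "\n".join(lines)
```

A real workaround would probably also log each skipped address so the inconsistent rows can be tracked down later.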

termie commented 13 years ago

(by edina-varga) I have a similar problem:

(nova.root): TRACE: Traceback (most recent call last):
(nova.root): TRACE:   File "/usr/bin/nova-network", line 44, in <module>
(nova.root): TRACE:     service.serve()
(nova.root): TRACE:   File "/usr/lib/pymodules/python2.6/nova/service.py", line 231, in serve
(nova.root): TRACE:     x.start()
(nova.root): TRACE:   File "/usr/lib/pymodules/python2.6/nova/service.py", line 81, in start
(nova.root): TRACE:     self.manager.init_host()
(nova.root): TRACE:   File "/usr/lib/pymodules/python2.6/nova/network/manager.py", line 467, in init_host
(nova.root): TRACE:     super(VlanManager, self).init_host()
(nova.root): TRACE:   File "/usr/lib/pymodules/python2.6/nova/network/manager.py", line 125, in init_host
(nova.root): TRACE:     self._on_set_network_host(ctxt, network['id'])
(nova.root): TRACE:   File "/usr/lib/pymodules/python2.6/nova/network/manager.py", line 568, in _on_set_network_host
(nova.root): TRACE:     network_ref)
(nova.root): TRACE:   File "/usr/lib/pymodules/python2.6/nova/network/linux_net.py", line 169, in ensure_vlan_bridge
(nova.root): TRACE:     interface = ensure_vlan(vlan_num)
(nova.root): TRACE:   File "/usr/lib/pymodules/python2.6/nova/network/linux_net.py", line 178, in ensure_vlan
(nova.root): TRACE:     _execute("sudo vconfig set_name_type VLAN_PLUS_VID_NO_PAD")
(nova.root): TRACE:   File "/usr/lib/pymodules/python2.6/nova/network/linux_net.py", line 327, in _execute
(nova.root): TRACE:     return utils.execute(cmd, *args, **kwargs)
(nova.root): TRACE:   File "/usr/lib/pymodules/python2.6/nova/utils.py", line 147, in execute
(nova.root): TRACE:     cmd=cmd)
(nova.root): TRACE: ProcessExecutionError: Unexpected error while running command.
(nova.root): TRACE: Command: sudo vconfig set_name_type VLAN_PLUS_VID_NO_PAD
(nova.root): TRACE: Exit code: 1
(nova.root): TRACE: Stdout: ''
(nova.root): TRACE: Stderr: 'sudo: no tty present and no askpass program specified\n'

termie commented 13 years ago

(by itoumsn) Hi Edina, folks,

I think the issue Edina reported is different from the original issue reported by Narayan and Kost. If 'requiretty' is set in /etc/sudoers and nova-network is started from an init script at boot time, we'll see the message that Edina got:

"(nova.root): TRACE: Stderr: 'sudo: no tty present and no askpass program"

So, could you check your /etc/sudoers, Edina?

From man sudoers:

requiretty: If set, sudo will only run when the user is logged in to a real tty. When this flag is set, sudo can only be run from a login session and not via other means such as cron(8) or cgi-bin scripts. This flag is off by default.
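
As a rough aid for that check, the sketch below scans the text of a sudoers file for Defaults lines that enable requiretty. It is a simplified reading of the sudoers syntax, not a full parser: `requiretty_lines` is a hypothetical helper, it treats any line starting with `#` as a comment, and it does not follow include directives.

```python
import re

# Matches "Defaults", "Defaults:user", "Defaults@host", etc., followed by options.
_DEFAULTS = re.compile(r"^\s*Defaults\S*\s+(.*)$")


def requiretty_lines(sudoers_text):
    """Return (line_number, line) pairs for Defaults lines that enable requiretty."""
    hits = []
    for number, line in enumerate(sudoers_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or comment (simplification, see lead-in)
        match = _DEFAULTS.match(stripped)
        if match is None:
            continue
        options = [opt.strip() for opt in match.group(1).split(",")]
        # "!requiretty" disables the flag, so only the bare name counts.
        if "requiretty" in options:
            hits.append((number, stripped))
    return hits
```

For example, one could pass it the contents of /etc/sudoers read as root; an empty result suggests requiretty is not the cause, at least in the main file.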

termie commented 13 years ago

(by ttx) Yes, the issue from Edina should be split to another bug, since it's not the same issue.

termie commented 13 years ago

(by itoumsn) Hi,

BTW, are these issues (Narayan's and Kost's) still reproducible in trunk? If so, I would like more information.

In particular, I'm wondering whether 'nova-manage create network' was executed successfully before starting nova-network and, if so, which fixed_range, num_networks and network_size were used. The output of 'nova-manage network list' and 'nova-manage fixed list' would also help, assuming the issue is reproducible using the trunk ppa...

termie commented 13 years ago

(by narayan-desai) I never had the ability to reproduce this bug on demand. It appeared that there was some bad content, in either the network database or the network-bound rabbitmq messages, that caused the code to raise the traceback and that eventually flushed itself out.

I'm in the process of upgrading our version of nova now; that should be done in the next few days hopefully. We will see if it still occurs then. Though, even in that case, I wouldn't necessarily be convinced that the issue was gone. -nld

(In reply to Masanori Itoh's message of Mon, Apr 4, 2011 at 12:52 PM, quoted in the previous comment.)

You received this bug notification because you are a direct subscriber of the bug. https://bugs.launchpad.net/bugs/719004

Title:  nova-network crashes with bad data

To unsubscribe from this bug, go to: https://bugs.launchpad.net/nova/+bug/719004/+subscribe

termie commented 13 years ago

(by mark-msi) I just encountered this earlier tonight. It definitely is related to data in the networks table and fixed_ips not matching correctly. I've been screwing around with different network managers and deleting and recreating networks and this was the end result.

To fix it, I deleted everything from the networks and fixed_ips tables and then recreated the network with nova-manage and everything was fine.

Mark
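
One way to look for this kind of mismatch before wiping the tables is to list fixed IPs that point at a network row that no longer exists. The sketch below uses an in-memory SQLite database with a deliberately simplified, hypothetical schema; only the table names `networks` and `fixed_ips` come from the comment above, and the real nova tables have more columns and may live in a different database engine.

```python
import sqlite3


def find_orphaned_fixed_ips(conn):
    """Return (address, network_id) rows whose network_id has no networks row."""
    query = """
        SELECT f.address, f.network_id
        FROM fixed_ips AS f
        LEFT JOIN networks AS n ON n.id = f.network_id
        WHERE f.network_id IS NOT NULL AND n.id IS NULL
        ORDER BY f.address
    """
    return conn.execute(query).fetchall()


def demo():
    """Build a tiny example database and run the check against it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE networks (id INTEGER PRIMARY KEY, cidr TEXT);
        CREATE TABLE fixed_ips (address TEXT, network_id INTEGER);
        INSERT INTO networks VALUES (1, '10.0.0.0/24');
        INSERT INTO fixed_ips VALUES ('10.0.0.2', 1);
        -- Left over from a network that was deleted and recreated.
        INSERT INTO fixed_ips VALUES ('10.0.1.2', 7);
    """)
    return find_orphaned_fixed_ips(conn)
```

Any rows returned would be candidates for cleanup; an empty list would suggest the mismatch lies elsewhere, for example in the instance association.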

termie commented 13 years ago

(by narayan-desai) That is interesting; when I had the problem, I added exception handling for the error (and ignored it) and eventually the system righted itself. I got the impression that there was some bad temporary state. -nld

(In reply to Mark Nelson's message of Wed, Apr 20, 2011 at 3:05 AM, quoted in the previous comment.)
