termie / nova-migration-demo

Nova is a cloud computing fabric controller (the main part of an IaaS system). It is written in Python.
http://openstack.org/projects/compute/
Apache License 2.0

More graceful error needed when using flat mode and --flat_injected with incompatible guests #382

Open termie opened 13 years ago

termie commented 13 years ago

When I started an instance, the following error occurred:

2010-11-21 21:31:51-0800 -: ERROR instance instance-2147483647: Failed to spawn
2010-11-21 21:31:51-0800 [-] Traceback (most recent call last):
2010-11-21 21:31:51-0800 [-]   File "/usr/lib/pymodules/python2.6/nova/compute/manager.py", line 135, in run_instance
2010-11-21 21:31:51-0800 [-]     yield self.driver.spawn(instance_ref)
2010-11-21 21:31:51-0800 [-]   File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 891, in _inlineCallbacks
2010-11-21 21:31:51-0800 [-]     result = result.throwExceptionIntoGenerator(g)
2010-11-21 21:31:51-0800 [-]   File "/usr/lib/python2.6/dist-packages/twisted/python/failure.py", line 338, in throwExceptionIntoGenerator
2010-11-21 21:31:51-0800 [-]     return g.throw(self.type, self.value, self.tb)
2010-11-21 21:31:51-0800 [-]   File "/usr/lib/pymodules/python2.6/nova/virt/libvirt_conn.py", line 329, in spawn
2010-11-21 21:31:51-0800 [-]     yield self._create_image(instance, xml)
2010-11-21 21:31:51-0800 [-]   File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 891, in _inlineCallbacks
2010-11-21 21:31:51-0800 [-]     result = result.throwExceptionIntoGenerator(g)
2010-11-21 21:31:51-0800 [-]   File "/usr/lib/python2.6/dist-packages/twisted/python/failure.py", line 338, in throwExceptionIntoGenerator
2010-11-21 21:31:51-0800 [-]     return g.throw(self.type, self.value, self.tb)
2010-11-21 21:31:51-0800 [-]   File "/usr/lib/pymodules/python2.6/nova/virt/libvirt_conn.py", line 466, in _create_image
2010-11-21 21:31:51-0800 [-]     execute=execute)
2010-11-21 21:31:51-0800 [-]   File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 893, in _inlineCallbacks
2010-11-21 21:31:51-0800 [-]     result = g.send(result)
2010-11-21 21:31:51-0800 [-]   File "/usr/lib/pymodules/python2.6/nova/compute/disk.py", line 147, in inject_data
2010-11-21 21:31:51-0800 [-]     yield _inject_net_into_fs(net, tmpdir, execute=execute)
2010-11-21 21:31:51-0800 [-] ProcessExecutionError: Unexpected error while running command.
2010-11-21 21:31:51-0800 [-] Command: sudo tee /var/lib/nova/tmp/tmpuaJWRG/etc/network/interfaces
2010-11-21 21:31:51-0800 [-] Exit code: 1
2010-11-21 21:31:51-0800 [-] Stdout: '# This file describes the network interfaces available on your system\n# and how to activate them. For more information, see interfaces(5).\n\n# The loopback network interface\nauto lo\niface lo inet loopback\n\n# The primary network interface\nauto eth0\niface eth0 inet static\n address 10.0.0.2\n netmask 255.255.255.240\n broadcast 10.0.0.15\n gateway 10.0.0.1\n dns-nameservers 8.8.4.4\n\n\n'
2010-11-21 21:31:51-0800 [-] Stderr: 'tee: /var/lib/nova/tmp/tmpuaJWRG/etc/network/interfaces: No such file or directory\n'

After reading the source, I think the problem is in "/usr/lib/pymodules/python2.6/nova/compute/disk.py", in the method _inject_net_into_fs(), which reads as follows:

def _inject_net_into_fs(net, fs, execute=None):
    netfile = os.path.join(os.path.join(os.path.join(
            fs, 'etc'), 'network'), 'interfaces')
    yield execute('sudo tee %s' % netfile, net)

The etc/network directory is never created before the tee command runs, so tee fails with "No such file or directory". The directory therefore needs to be created before tee is invoked.
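The failure is easy to reproduce outside nova: any write to a path whose parent directory is missing fails with ENOENT, the same "No such file or directory" that tee reported. The Python 3 sketch below uses a temporary directory as a stand-in for the mounted guest image and plain Python file I/O in place of `sudo tee`; `write_interfaces` is a hypothetical helper, not part of nova.

```python
import errno
import os
import tempfile


def write_interfaces(fs_root, content, create_parent):
    """Hypothetical helper: write etc/network/interfaces under fs_root.

    Returns None on success, or the errno of the failed write.
    """
    netfile = os.path.join(fs_root, "etc", "network", "interfaces")
    if create_parent:
        # Equivalent of `mkdir -p`: an already existing directory is not an error.
        os.makedirs(os.path.dirname(netfile), exist_ok=True)
    try:
        with open(netfile, "w") as f:
            f.write(content)
    except OSError as exc:
        return exc.errno
    return None


with tempfile.TemporaryDirectory() as root:
    # Image without etc/network: the write fails just as tee did.
    print(write_interfaces(root, "auto lo\n", create_parent=False) == errno.ENOENT)  # True
    # Creating the parent directory first lets the write succeed.
    print(write_interfaces(root, "auto lo\n", create_parent=True))  # None
```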


Imported from Launchpad using lp2gh.

termie commented 13 years ago

(by guanxiaohua2k6) The version of nova is 2011.1~bzr397-0ubuntu0ppa1~maverick1.

I have also attached a patch to fix the bug. Please review it.

termie commented 13 years ago

(by soren) I'm not entirely sure what to think about this.

If /etc/network doesn't exist, you're not dealing with a Debian-based distro, so the injected network configuration won't work anyway. Just creating /etc/network might mask the problem, but you'll still end up with an instance you can't access.

termie commented 13 years ago

(by guanxiaohua2k6) First, this bug is related to https://bugs.launchpad.net/nova/+bug/678393.

After I installed nova on multiple machines and created networks using "nova-manage network create ...", I ran into bug 678393. So I manually updated the bridge column of the networks table. When I then tried to start an instance again, it failed with the messages described in this bug.

I have read the source of disk.py, and it is clear that the directory "/var/lib/nova/tmp/tmpuaJWRG/etc/network/" is not created in _inject_net_into_fs(). That is what causes the subsequent tee command to fail.

By contrast, _inject_key_into_fs(), defined just above it, creates the corresponding directory before running tee. For reference, here is that code:

def _inject_key_into_fs(key, fs, execute=None):
    sshdir = os.path.join(os.path.join(fs, 'root'), '.ssh')
    yield execute('sudo mkdir -p %s' % sshdir)  # existing dir doesn't matter
    yield execute('sudo chown root %s' % sshdir)
    yield execute('sudo chmod 700 %s' % sshdir)
    keyfile = os.path.join(sshdir, 'authorized_keys')
    yield execute('sudo tee -a %s' % keyfile, '\n' + key.strip() + '\n')
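A minimal sketch of the approach proposed here, applying the `mkdir -p` step from `_inject_key_into_fs()` to the network file. This is not the attached patch, which is not reproduced in this thread. It is synchronous Python 3 for illustration, whereas the nova code yields Twisted deferreds, and `recording_execute` is a hypothetical stand-in for nova's `execute` helper that records commands instead of running them. It removes the tee error only; it does not make the written file meaningful to a guest that never reads /etc/network/interfaces.

```python
import os


def inject_net_into_fs(net, fs, execute):
    """Hypothetical sketch: network injection following the key-injection pattern."""
    netdir = os.path.join(fs, 'etc', 'network')
    # Create the directory first, as the key injection does for root/.ssh.
    execute('sudo mkdir -p %s' % netdir)  # existing dir doesn't matter
    netfile = os.path.join(netdir, 'interfaces')
    # The configuration text is passed as the command's stdin, as in the original.
    execute('sudo tee %s' % netfile, net)


calls = []


def recording_execute(cmd, process_input=None):
    # Hypothetical stub: record the command and its input instead of running it.
    calls.append((cmd, process_input))


inject_net_into_fs('auto lo\n', '/tmp/guest', recording_execute)
for cmd, _ in calls:
    print(cmd)
# prints:
# sudo mkdir -p /tmp/guest/etc/network
# sudo tee /tmp/guest/etc/network/interfaces
```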

termie commented 13 years ago

(by soren) 2010/11/22 guan 678395@bugs.launchpad.net:

> First, this bug is related to https://bugs.launchpad.net/nova/+bug/678393.

> After I installed nova on multiple machines and created networks using "nova-manage network create ...", I ran into bug 678393. So I manually updated the bridge column of the networks table. When I then tried to start an instance again, it failed with the messages described in this bug.

This is a completely separate issue. Let's keep them separate.

> I have read the source of disk.py, and it is clear that the directory "/var/lib/nova/tmp/tmpuaJWRG/etc/network/" is not created in _inject_net_into_fs(). That is what causes the subsequent tee command to fail.

I understand.

> By contrast, _inject_key_into_fs(), defined just above it, creates the corresponding directory before running tee. For reference, here is that code:

I realise. The key injection code is compatible with every Linux distro that I know of. The location of SSH keys is widely agreed upon. The location and format of network configuration is not; it varies with the Linux distro. If /etc/network doesn't already exist, it means that the image does not contain a Debian-derived distribution, so injecting a network configuration that only works on Debian-derived distributions will not help you at all. So, I'm not sure what to do about this problem.

We can either create /etc/network first and inject a network configuration there that will not work, or we can simply not attempt to write the network configuration if the directory doesn't exist. That would also remove the error, but would still result in a non-functional network.
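The second option, skipping injection when the directory is absent, could look like the following hypothetical sketch (Python 3, synchronous, with a caller-supplied `execute` standing in for nova's helper). It assumes the mounted image's /etc is readable by the nova process so that `os.path.isdir` can see it, and it returns whether anything was written so the caller can decide how to surface a skipped injection.

```python
import logging
import os
import tempfile

LOG = logging.getLogger(__name__)


def inject_net_if_supported(net, fs, execute):
    """Hypothetical variant: inject only when the guest already has etc/network.

    Returns True if the configuration was written, False if it was skipped.
    """
    netdir = os.path.join(fs, 'etc', 'network')
    if not os.path.isdir(netdir):
        # No etc/network: probably not a Debian-derived guest, so an
        # interfaces file would not be read anyway. Skip instead of failing.
        LOG.warning('%s not found; skipping network injection', netdir)
        return False
    execute('sudo tee %s' % os.path.join(netdir, 'interfaces'), net)
    return True


calls = []
with tempfile.TemporaryDirectory() as guest:
    # Without etc/network the injection is skipped and nothing is executed.
    print(inject_net_if_supported('auto lo\n', guest, lambda *a: calls.append(a)))  # False
    os.makedirs(os.path.join(guest, 'etc', 'network'))
    # With the directory present, the tee command is issued.
    print(inject_net_if_supported('auto lo\n', guest, lambda *a: calls.append(a)))  # True
```

The trade-off is the one described above: the error disappears, but the instance may still boot without working networking, which is why the return value matters.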

The really short summary is that the network injection code is a (IMO dreadful) hack, and how to deal with its failure conditions isn't obvious. I'm not saying it shouldn't be handled. I just don't know how.

Soren Hansen
Ubuntu Developer http://www.ubuntu.com/
OpenStack Developer http://www.openstack.org/

termie commented 13 years ago

(by ttx) The combination "Flat mode + --flat_injected + guests not supporting /etc/network/" is not supported. I agree we could fail more gracefully in that case, but that won't make it magically work. If you use guests that don't support /etc/network/interfaces, you should probably run your network node with --flat_injected=false or use another network mode.
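A more graceful failure could be a check that runs before any injection and raises an error naming the unsupported combination and the suggested workaround. In this hypothetical Python 3 sketch, `NetworkInjectionError` and `check_net_injection_supported` are invented names, and the `--flat_injected` value is passed in as an argument rather than read from nova's flags.

```python
import os
import tempfile


class NetworkInjectionError(Exception):
    """Hypothetical exception for a guest that cannot use injected networking."""


def check_net_injection_supported(fs, flat_injected):
    """Fail early with an actionable message instead of a tee traceback."""
    if not flat_injected:
        return  # nothing will be injected, so any guest is acceptable
    netdir = os.path.join(fs, 'etc', 'network')
    if not os.path.isdir(netdir):
        raise NetworkInjectionError(
            'Guest image has no /etc/network directory, so an injected '
            '/etc/network/interfaces would not be used. Run the network '
            'node with --flat_injected=false or use another network mode.')


with tempfile.TemporaryDirectory() as guest:
    try:
        check_net_injection_supported(guest, flat_injected=True)
    except NetworkInjectionError as exc:
        print('cannot spawn:', exc)
```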

If you agree, I'll rename this bug so that it's about more graceful error handling.