termie / nova-migration-demo

Nova is a cloud computing fabric controller (the main part of an IaaS system). It is written in Python.
http://openstack.org/projects/compute/
Apache License 2.0

Rebooting instance doesn't restore mounted volume #319

Open termie opened 13 years ago

termie commented 13 years ago

Tested on Revision No 925.

Steps to reproduce:

1) Run one VM instance
2) Attach a volume to the VM instance
3) SSH to the VM instance, mount the volume and log out from SSH
4) Reboot the VM instance
5) SSH to the VM instance again and try to mount the volume. The mount fails with the error message:

{{{
Could not stat /dev/vdb --- No such file or directory

The device apparently does not exist; did you specify it correctly?
}}}

6) euca-describe-volumes still shows that the volume is attached to the VM instance and is in use:

{{{
root@ubuntu-openstack-single-server:/home/tpatil# euca-describe-volumes
VOLUME  vol-00000001  1  nova  in-use (admin, ubuntu-openstack-single-server, i-00000002[ubuntu-openstack-single-server], /dev/vdb)  2011-04-02T00:48:20Z
}}}

7) If I try to detach the volume, it gives the following error message in nova-compute.log:

{{{
2011-04-01 17:59:04,743 ERROR nova [-] Exception during message handling
(nova): TRACE: Traceback (most recent call last):
(nova): TRACE:   File "/home/tpatil/nova/nova/rpc.py", line 190, in _receive
(nova): TRACE:     rval = node_func(context=ctxt, **node_args)
(nova): TRACE:   File "/home/tpatil/nova/nova/exception.py", line 120, in _wrap
(nova): TRACE:     return f(*args, **kw)
(nova): TRACE:   File "/home/tpatil/nova/nova/compute/manager.py", line 105, in decorated_function
(nova): TRACE:     function(self, context, instance_id, *args, **kwargs)
(nova): TRACE:   File "/home/tpatil/nova/nova/compute/manager.py", line 779, in detach_volume
(nova): TRACE:     volume_ref['mountpoint'])
(nova): TRACE:   File "/home/tpatil/nova/nova/exception.py", line 120, in _wrap
(nova): TRACE:     return f(*args, **kw)
(nova): TRACE:   File "/home/tpatil/nova/nova/virt/libvirt_conn.py", line 405, in detach_volume
(nova): TRACE:     raise exception.NotFound(_("No disk at %s") % mount_device)
(nova): TRACE: NotFound: No disk at vdb
(nova): TRACE:
}}}
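For context, the NotFound at the bottom of the traceback comes from the libvirt driver checking whether the requested mount device still exists in the running domain's definition before detaching. A minimal sketch of that check (the helper name and XML layout here are illustrative, not the actual nova code):

```python
import xml.etree.ElementTree as ET


class NotFound(Exception):
    """Stand-in for nova's exception.NotFound."""


def find_disk(domain_xml, mount_device):
    """Return the <disk> element whose <target dev=...> matches mount_device.

    If the guest was rebooted and the domain was recreated without the
    volume, the device is gone from the XML and NotFound is raised --
    which is exactly the "No disk at vdb" failure in the traceback.
    """
    root = ET.fromstring(domain_xml)
    for disk in root.findall('./devices/disk'):
        target = disk.find('target')
        if target is not None and target.get('dev') == mount_device:
            return disk
    raise NotFound('No disk at %s' % mount_device)
```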


Imported from Launchpad using lp2gh.

termie commented 13 years ago

(by itoumsn) Hi Tushar,

You are using KVM on Ubuntu, right? Also, did you reboot the instance from inside the guest OS? I mean, not using euca-reboot-instances.

I have a feeling that the root cause of this issue is a KVM problem. If the issue is reproducible, please collect the following information on the compute node before and after rebooting the instance:

{{{
virsh dumpxml VM_NAME
}}}

I guess the device you attached to your VM vanished from the VM configuration after the guest OS reboot. In that case, all we can really do is log the exception and clean up the database, I think.
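To check this hypothesis, one could diff the device lists from the two `virsh dumpxml` captures. A small sketch, assuming the before/after XML has been saved as strings (the element paths follow the standard libvirt domain format):

```python
import xml.etree.ElementTree as ET


def attached_devices(domain_xml):
    """Return the set of disk target devices (e.g. {'vda', 'vdb'})
    declared in a `virsh dumpxml` capture."""
    root = ET.fromstring(domain_xml)
    return {t.get('dev')
            for t in root.findall('./devices/disk/target')
            if t.get('dev')}


def vanished_devices(before_xml, after_xml):
    """Devices present before the reboot but missing afterwards."""
    return attached_devices(before_xml) - attached_devices(after_xml)
```

If the attached volume really vanished from the configuration, the second call would report something like `{'vdb'}`.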

-Masanori

termie commented 13 years ago

(by itoumsn) BTW, if you used euca-reboot-instances on a KVM-based system, the issue comes back to the nova side. At the moment, libvirt does not support rebooting KVM instances, and the current implementation of RebootInstance looks like the following.

trunk/nova/virt/libvirt_conn.py

{{{
473     def reboot(self, instance):
474         self.destroy(instance, False)   # DESTROY ONCE
475         xml = self.to_xml(instance)
}}}

One idea could be calling virsh dumpxml on the instance to be rebooted and updating the above xml here.

{{{
476         self.firewall_driver.setup_basic_filtering(instance)
477         self.firewall_driver.prepare_instance_filter(instance)
478         self._conn.createXML(xml, 0)    # CREATE AGAIN, AND THERE IS NO CODE TO RE-ATTACH EBSs.
479         self.firewall_driver.apply_instance_filter(instance)
480
481         timer = utils.LoopingCall(f=None)
482
483         def _wait_for_reboot():
484             try:
485                 state = self.get_info(instance['name'])['state']
486                 db.instance_set_state(context.get_admin_context(),
487                                       instance['id'], state)
488                 if state == power_state.RUNNING:
489                     LOG.debug(_('instance %s: rebooted'), instance['name'])
490                     timer.stop()
491             except Exception, exn:
492                 LOG.exception(_('_wait_for_reboot failed: %s'), exn)
493                 db.instance_set_state(context.get_admin_context(),
494                                       instance['id'],
495                                       power_state.SHUTDOWN)
496                 timer.stop()
497
498         timer.f = _wait_for_reboot
499         return timer.start(interval=0.5, now=True)
}}}
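One way to act on that idea: between the destroy and the createXML above, re-inject a <disk> entry for every volume the database still shows as attached. A hedged sketch (the `(source_path, target_dev)` pairs and the helper name are assumptions for illustration, not the actual nova API; the element layout is the usual libvirt block-device form):

```python
import xml.etree.ElementTree as ET


def inject_volume_disks(domain_xml, volumes):
    """Add a virtio <disk> entry for each (source_path, target_dev) pair.

    `volumes` would come from the database records that
    euca-describe-volumes still reports as in-use, so a recreated
    domain keeps the devices the guest had before the reboot.
    """
    root = ET.fromstring(domain_xml)
    devices = root.find('devices')
    existing = {t.get('dev') for t in devices.findall('disk/target')}
    for source_path, target_dev in volumes:
        if target_dev in existing:
            continue  # already declared in the generated XML, skip
        disk = ET.SubElement(devices, 'disk',
                             {'type': 'block', 'device': 'disk'})
        ET.SubElement(disk, 'source', {'dev': source_path})
        ET.SubElement(disk, 'target', {'dev': target_dev, 'bus': 'virtio'})
    return ET.tostring(root).decode()
```

The patched XML would then be passed to `self._conn.createXML(xml, 0)` in place of the bare `to_xml()` output.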

-Masanori

termie commented 13 years ago

(by itoumsn) Hi,

I wrote an ultimately STUPID workaround fix on this issue and linked my branch here. Actually, this issue is a problematic one if we want to resolve it in an elegant way, I think.

Anyway, it seems to work on my Ubuntu 10.10 box at least.

Tushar, if you are testing a volume driver other than iSCSI, or a multi-node nova installation, could you try the patch below?

lp:~itoumsn/nova/lp747922

Thanks,

termie commented 13 years ago

(by itoumsn) I will volunteer as the assignee of this issue until someone with a more elegant resolution appears...

-Masanori

termie commented 13 years ago

(by tpatil) I tested using your branch lp:~itoumsn/nova/lp747922 and it seems to be working as expected now.

After rebooting the instance, I see the volume is still attached and I can see all files intact.

Thank you.

termie commented 13 years ago

(by itoumsn) Hi Tushar,

Thanks for testing. :) I will post a merge request soon after the cactus release.

Thanks, Masanori