termie / nova-migration-demo

Nova is a cloud computing fabric controller (the main part of an IaaS system). It is written in Python.
http://openstack.org/projects/compute/
Apache License 2.0
2 stars 0 forks source link

libvirt error handling is poor #562

Open termie opened 13 years ago

termie commented 13 years ago

The error handling in libvirt_conn is not great; there's a lot of catching of generic exceptions and assuming that any error is the error we think it's likely to be.

I hit this one hard when testing my 'don't erase all instances on host reboot patch'. For each instance, it called get_info, assuming that if this throws an exception that means the VM has been deleted. It then marks the instance deleted in the DB.

but libvirt wasn't started. (I don't know why, something was screwy with my upstart). Every get_info failed with a connection error, and every instance was removed. Not good.

I'm changing the generic "except:" handler to something like this:

try: virt_dom = self._conn.lookupByName(instancename) except libvirt.libvirtError as e: LOG.warning(("Error from libvirt during lookup: %s") % e) if e.get_error_code() == libvirt.VIR_ERR_UNKNOWNHOST: raise exception.NotFound(("Instance %s not found") % instance_name)

TODO(justinsb): How do I raise this and keep the stack trace?

        raise e

(Though I haven't tested that code yet)

We probably need to go through every catch block in libvirt_conn and do something similar. We could probably use a helper function here to make this less painful, though it might be less readable.


Imported from Launchpad using lp2gh.

termie commented 13 years ago

(by ewanmellor) Just "raise" without the "e" will reraise the current exception and keep the stack trace intact.

-----Original Message----- From: bounces@canonical.com [mailto:bounces@canonical.com] On Behalf Of justinsb Sent: 24 March 2011 05:04 To: Ewan Mellor Subject: [Bug 741468] [NEW] libvirt error handling is poor

Public bug reported:

The error handling in libvirt_conn is not great; there's a lot of catching of generic exceptions and assuming that any error is the error we think it's likely to be.

I hit this one hard when testing my 'don't erase all instances on host reboot patch'. For each instance, it called get_info, assuming that if this throws an exception that means the VM has been deleted. It then marks the instance deleted in the DB.

but libvirt wasn't started. (I don't know why, something was screwy with my upstart). Every get_info failed with a connection error, and every instance was removed. Not good.

I'm changing the generic "except:" handler to something like this:

try: virt_dom = self._conn.lookupByName(instancename) except libvirt.libvirtError as e: LOG.warning(("Error from libvirt during lookup: %s") % e) if e.get_error_code() == libvirt.VIR_ERR_UNKNOWNHOST: raise exception.NotFound(("Instance %s not found") % instance_name)

TODO(justinsb): How do I raise this and keep the stack

trace? raise e

(Though I haven't tested that code yet)

We probably need to go through every catch block in libvirt_conn and do something similar. We could probably use a helper function here to make this less painful, though it might be less readable.

\ Affects: nova Importance: Undecided Status: New

You received this bug notification because you are subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/741468

Title: libvirt error handling is poor

Status in OpenStack Compute (Nova): New

Bug description: The error handling in libvirt_conn is not great; there's a lot of catching of generic exceptions and assuming that any error is the error we think it's likely to be.

I hit this one hard when testing my 'don't erase all instances on host reboot patch'. For each instance, it called get_info, assuming that if this throws an exception that means the VM has been deleted. It then marks the instance deleted in the DB.

but libvirt wasn't started. (I don't know why, something was screwy with my upstart). Every get_info failed with a connection error, and every instance was removed. Not good.

I'm changing the generic "except:" handler to something like this:

try: virt_dom = self._conn.lookupByName(instancename) except libvirt.libvirtError as e: LOG.warning(("Error from libvirt during lookup: %s") % e) if e.get_error_code() == libvirt.VIR_ERR_UNKNOWNHOST: raise exception.NotFound(("Instance %s not found") % instance_name)

TODO(justinsb): How do I raise this and keep the stack

trace? raise e

(Though I haven't tested that code yet)

We probably need to go through every catch block in libvirt_conn and do something similar. We could probably use a helper function here to make this less painful, though it might be less readable.

termie commented 13 years ago

(by justin-fathomdb) Thanks Ewan. I've been told that eventlet breaks that though...?

I did find a 3-arg raise command, but it was a little bit too much Python for me to get my head around.

termie commented 13 years ago

(by blamar) From what I understand eventlet comes into play only if you explicitly call it during error handling (as in https://bugs.launchpad.net/glance/+bug/726742).

In this case I think the no-arg raise should work, since sys.exc_clear() isn't ever called from eventlet.

termie commented 13 years ago

(by rconradharris)

From what I understand eventlet comes into play only if you explicitly call it during error handling

Exactly right.

I did find a 3-arg raise command

I initially used a 3-arg raise in Glance to work around the problem. It worked beautifully, just one problem though, it's not pep-8 compat. After a few weeks of suffering though complaints, Jaypipes just made this change:

termie commented 13 years ago

(by justin-fathomdb) Thanks for the explanation all.

I always like to think of the future evolution of the code, so while this might be safe today:

except libvirt.libvirtError as e: LOG.warning(_("Error from libvirt during lookup: %s") % e) raise

...it sounds like if we plugged in a centralized logging solution that fired off HTTP messages using eventlet (or even wrote to files using eventlet?), that this would no longer be safe.

But having seen the alternative (thanks Rick & Brian for the info) I think maybe it's better just to live with the no-arg throw... If we ever plug in an logger that clears exceptions, we can probably put the responsibility for making sure that the exceptions aren't swallowed on that code.


For anyone reading this bug that's a little confused, this discussion was about my TODO note. The "catch-all" libvirt error handling is still problematic.