Closed: jcourtois closed this issue 9 years ago.
Reproduced in Lab 02. Instances are stuck in 'creating' and 'deleting'.
This issue is related to https://github.com/rcbops/ansible-lxc-rpc/issues/99 and should be resolved by PR https://github.com/rcbops/ansible-lxc-rpc/pull/101.
Testing the latest deployment in IAD lab 1. The suite was cleaning up about 8 volumes very rapidly (perhaps a minute or two after creating them) and it triggered another freeze. :|
Seeing very similar issue, with an additional detail that I don't remember noticing before. If I try to manually delete any of my volumes using lvremove inside the cinder container, I get this:
root@573972-cinder01_cinder_volumes_container-7454dcdb:~# lvremove /dev/mapper/cinder--volumes-volume--73584646--91f4--4651--b3a6--f46ee352fe50
Do you really want to remove and DISCARD active logical volume volume-73584646-91f4-4651-b3a6-f46ee352fe50? [y/n]: y
device-mapper: remove ioctl on failed: Device or resource busy
(the "device-mapper: remove ioctl on failed: Device or resource busy" message repeated 24 more times)
Unable to deactivate cinder--volumes-volume--73584646--91f4--4651--b3a6--f46ee352fe50 (252:5)
Unable to deactivate logical volume "volume-73584646-91f4-4651-b3a6-f46ee352fe50"
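The busy errors above usually mean something still holds the device-mapper node open (an iSCSI export, a lingering zeroing pass, an attached VM). A diagnostic sketch, assuming standard device-mapper tooling: on the affected node you would run `dmsetup info <name>` and look at the "Open count" field; an open count above 0 explains lvremove's "Device or resource busy". The helper below just parses that field, demonstrated against a hypothetical sample shaped like real `dmsetup info` output rather than a live system:

```shell
# Sketch: extract the "Open count" from `dmsetup info`-style output on stdin.
dm_open_count() {
  awk -F':[[:space:]]*' '/^Open count/ {print $2}'
}

# Hypothetical sample modeled on this issue's volume (not captured output):
sample='Name:              cinder--volumes-volume--73584646--91f4--4651--b3a6--f46ee352fe50
State:             ACTIVE
Open count:        1'
printf '%s\n' "$sample" | dm_open_count
```

On a live node you would follow this up with `fuser` or `lsof` on the `/dev/mapper` path to find the holding process.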
Here are some logs from cinder-volumes.
A couple of questions:
Alright, so the issue did resolve itself; whatever was locking up LVM let go. I added a few more lines to https://gist.github.com/jcourtois/dd49918a88e4d99cb323. As for your questions:
- This is a new install with the latest code branch.
- The 'deleting' state for the seven or so affected volumes lasted about 25 minutes, after which they were all deleted within about a one-minute window (roughly 5-10 seconds per volume).
- These were compute integration tests, so there were probably VMs attached, but I can't say for sure.
- Since the issue resolved itself, I can no longer check.
Testing is still underway. Since this resolved itself in a reasonable amount of time, I'll close this issue again. If it happens again I'll reopen.
This is likely simply a result of the volume having zeros written over it once the delete is executed; that process takes time and holds a lock while zeroing.
Let us know if this crops up again.
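On the zeroing theory: cinder's LVM driver wipes volumes on delete by default, and that wipe is what takes the time. The options below exist in cinder.conf (verify the names and accepted values against your release before relying on them); setting volume_clear to none trades the secure wipe for fast deletes:

```ini
[DEFAULT]
# How to wipe LVM volumes on delete: 'zero' (default), 'shred', or 'none'.
volume_clear = none
# Wipe only the first N MiB of the volume instead of all of it (0 = all).
volume_clear_size = 0
```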
Of course it figures that when I stopped testing for the weekend, my last few cinder volumes would exhibit this behavior. I have 3 volumes that have been "deleting" since Saturday night.
Bonus: cinder-volumes has a stacktrace.
Can you execute another delete against the same volume and let us know if it succeeds? It seems that the volume was in a locked state.
Which volume/snapshot and using the cinder api or lvremove?
Root problem? From the kernel logs:
Sep 22 19:31:28 569058-cinder01 kernel: [ 12.570914] type=1400 audit(1411414288.192:137): apparmor="DENIED" operation="mount" info="failed type match" error=-13 profile="lxc-openstack" name="/run/cgmanager/fs/none,name=systemd/" pid=6385 comm="cgmanager" fstype="cgroup" srcname="none,name=systemd" flags="rw"
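If the AppArmor denial is suspected, one quick check is how often that DENIED event fires against the lxc-openstack profile. A minimal sketch (the grep pattern is my own, and the kernel log path and message format vary by distro); it's demonstrated here against the log line quoted above rather than a live host:

```shell
# Count AppArmor mount denials for a given profile in a kernel log file.
# Usage: count_mount_denials <logfile> <profile-name>
count_mount_denials() {
  grep -c "apparmor=\"DENIED\" operation=\"mount\".*profile=\"$2\"" "$1"
}

# Demo against the line from this issue, written to a temp file.
log=$(mktemp)
cat > "$log" <<'EOF'
Sep 22 19:31:28 569058-cinder01 kernel: [   12.570914] type=1400 audit(1411414288.192:137): apparmor="DENIED" operation="mount" info="failed type match" error=-13 profile="lxc-openstack" name="/run/cgmanager/fs/none,name=systemd/" pid=6385 comm="cgmanager" fstype="cgroup" srcname="none,name=systemd" flags="rw"
EOF
count_mount_denials "$log" lxc-openstack
rm -f "$log"
```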
The issue appears to be reproduced again in the lab where we changed the change_profile parameter in /etc/apparmor.d/abstractions/lxc/start-container to 'unconfined'. :fallen_leaf:
Seeing this again in SAT6. In particular, after taking a snapshot of an LVM volume and deleting the snapshot, deleting the volume results in it getting stuck in the 'deleting' state.
@git-harry mentioned that this was a known issue in cinder. @git-harry, does the gist above help you track down this issue?
Some additional info: https://gist.github.com/jameswthorne/62453bc79b9a9342acaf
This is going to be fixed upstream and is being tracked here: https://bugs.launchpad.net/cinder/+bug/1191960
@mancdaz @claco
Trying to delete a cinder volume in IAD, it goes from 'active' to 'deleting' but never makes it to 'deleted'. 12 hours later, looking at lvs and lvdisplay, it seems that the volume staged for deletion has not been deleted and is still sitting there in a suspended state. No stack traces noted.
https://gist.github.com/jcourtois/1470b0e24a14205eb592