OK - will take a look. I have not yet had this experience and have always gotten the (misspelled) "Unable to get Exlusive Lock" error. The code does try some cleanups after certain operations; perhaps it is being too aggressive in cleaning up the mount/lock after a failed operation, or there could be other issues or a regression.
@CBR09 Thanks for the report, I was able to reproduce it - looking into why it is being so aggressive on cleanup and into a possible fix.
It appears that the docker daemon (or possibly the volume helper lib) is calling the Unmount API after a failed Mount. I'm not sure when this change occurred, but I will have to change the way Unmount operates, since I currently assume you really want to unlock and unmount when you call that method.
Yes, maybe you could add a check in the Unmount function: if the rbd image has an exclusive lock, do nothing. What I really want is that if an rbd image is being used by a container, no one else can use that image.
@CBR09 unfortunately it is not that simple.
The Unmount API method is being called - this means the docker engine wants me to unmount, unmap and unlock the image in question. This is called even when the lock is exclusive, as is always the case.
I'm not sure how to tell this case apart from the legitimate case of calling Unmount, so I will have to do more research into a possible fix. I think the reason I have never run into this is that 1) I never ran the test you provided (three containers on the same node trying to use the same device), since 2) I run in a cluster with hundreds of nodes and rarely is a job spun up on the same node right away.
@porcupie Yes, thanks for all you have done. I've tried running multiple containers using the same device (on the same or different nodes) to check the lock mechanism. I want to be safe when using rbd-docker-plugin in my environment.
When I ran containerA and containerB with the same device on the same node, I noticed that although rbd-docker-plugin unmounted and unlocked the image, containerA still seemed to work as normal (reading/writing the device). I'm not sure how this issue affects that situation?
Unrelated question: have you used rbd-docker-plugin in production? I'm looking for a suitable plugin for my environment.
Thanks
Looks like Docker 1.10 has added an ID parameter to the Mount and Unmount API calls, so I should be able to use those to keep track of mount and unmount mappings per rbd image and per-client usage (?)
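A minimal sketch of what per-image reference counting keyed on that ID could look like - the tracker type and field names here are made up for illustration, not taken from driver.go:

```go
package main

import (
	"fmt"
	"sync"
)

// mountTracker keeps a set of caller IDs per RBD image name, so the real
// umount/unmap/unlock only happens when the last caller releases the image.
// The ID would be the per-caller identifier Docker 1.10+ sends with
// Mount/Unmount requests; the struct and method names are illustrative only.
type mountTracker struct {
	mu     sync.Mutex
	owners map[string]map[string]struct{} // image name -> set of caller IDs
}

func newMountTracker() *mountTracker {
	return &mountTracker{owners: make(map[string]map[string]struct{})}
}

// addOwner records that caller id has mounted the image and reports whether
// this is the first caller (i.e. the device actually needs map+mount+lock).
func (t *mountTracker) addOwner(image, id string) (first bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.owners[image] == nil {
		t.owners[image] = make(map[string]struct{})
	}
	first = len(t.owners[image]) == 0
	t.owners[image][id] = struct{}{}
	return first
}

// removeOwner drops caller id and reports whether the image has no owners
// left (i.e. it is now safe to umount, unmap and unlock the device).
func (t *mountTracker) removeOwner(image, id string) (last bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.owners[image], id)
	return len(t.owners[image]) == 0
}

func main() {
	tr := newMountTracker()
	fmt.Println(tr.addOwner("foo", "container-a"))    // true: first user, do the real mount
	fmt.Println(tr.addOwner("foo", "container-b"))    // false: already mounted
	fmt.Println(tr.removeOwner("foo", "container-a")) // false: still in use
	fmt.Println(tr.removeOwner("foo", "container-b")) // true: last user, do the real unmount
}
```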
> Unrelated question: have you used rbd-docker-plugin in production? I'm looking for a suitable plugin for my environment.
@CBR09 We are using this plugin for some Proof of Concept workloads (small low-use databases, Dockerized Jenkins servers that need to save storage config to disk, etc.) but nothing that could be considered production scale.
We do always explicitly stop the singleton containers that use RBD before spinning them up again; that is simply part of the Operations playbook for those services. The locking is kept as an emergency fallback just in case, but we try not to rely on it. We had issues previously with the go-ceph implementation of locking via the C API, so we have always been a bit wary of this feature.
> Looks like Docker 1.10 has added an ID parameter to the Mount and Unmount API calls, so I should be able to use those to keep track of mount and unmount mappings per rbd image.
Glad to hear that. I hope you can implement and release it!
> We are using this plugin for some Proof of Concept workloads (small low-use databases, Dockerized Jenkins servers that need to save storage config to disk, etc.) but nothing that could be considered production scale.
> We do always explicitly stop the singleton containers that use RBD before spinning them up again; that is simply part of the Operations playbook for those services.
Thanks for sharing.
@CBR09 Hello - I added a small check to see if rbd unmap is returning 'exit status 16' (tested on CentOS 7.2 with docker 1.9 and docker 1.11) and, if so, to bail out early from the duplicated Unmount calls. Perhaps this can be a workaround for now?
Please try v1.2.2 and let me know if this fixes the bug on your side. Thanks
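Roughly, the workaround boils down to something like the following simplified sketch (not the actual driver.go code; the helper name is made up), which treats exit status 16 (EBUSY) from rbd unmap as "another container still owns this device" and returns early instead of continuing the cleanup:

```go
package main

import (
	"fmt"
	"log"
	"os/exec"
)

// unmapDevice runs `rbd unmap` on the given device. If the kernel reports the
// device as busy (exit status 16, EBUSY), we treat this as a duplicated
// Unmount for an image that is still in use and bail out early instead of
// continuing to remove the mapping and lock. Illustrative helper only.
func unmapDevice(device string) error {
	out, err := exec.Command("rbd", "unmap", device).CombinedOutput()
	if err == nil {
		return nil
	}
	if exitErr, ok := err.(*exec.ExitError); ok && exitErr.ExitCode() == 16 {
		log.Printf("WARN: %s busy (exit status 16), early exit from Unmount: %s", device, out)
		return nil // leave the mapping, mount and lock alone; another container owns them
	}
	return fmt.Errorf("rbd unmap %s failed: %v (%s)", device, err, out)
}

func main() {
	if err := unmapDevice("/dev/rbd/docker/foo"); err != nil {
		log.Fatal(err)
	}
}
```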
@porcupie : Thanks for your release, I will check it as soon as possible.
@porcupie Sorry for my late reply, I tried it today and it works. Thank you so much.
Hi @porcupie, today I checked rbd-docker-plugin again, and there is still an issue: the umount still happens.
```
root@ceph:~# rbd showmapped
id pool   image snap device
3  docker foo   -    /dev/rbd3
root@ceph:~# cat /proc/self/mountinfo | grep rbd
57 22 249:0 / /var/lib/docker-volumes/rbd/docker/foo rw,relatime - xfs /dev/rbd3 rw,attr2,inode64,sunit=8192,swidth=8192,noquota
root@ceph:~# docker run -it --volume-driver=rbd --volume foo:/mnt/foo ceph/base bash
docker: Error response from daemon: VolumeDriver.Mount: Unable to get Exclusive Lock.
root@ceph:~# rbd showmapped
id pool   image snap device
3  docker foo   -    /dev/rbd3
root@ceph:~# cat /proc/self/mountinfo | grep rbd
root@ceph:~#
```
Logs
```
2016/06/16 09:59:06 driver.go:349: INFO: API Mount({foo map[]})
2016/06/16 09:59:06 driver.go:735: DEBUG: parseImagePoolNameSize: "foo": ["foo" "" "" "foo" "" ""]
2016/06/16 09:59:06 driver.go:1171: DEBUG: sh CMD: &{"/usr/bin/rbd" ["rbd" "--pool" "docker" "--conf" "/etc/ceph/ceph.conf" "--id" "docker" "info" "foo"] [] "" [] %!q(*syscall.SysProcAttr= ) %!q(*os.Process= ) " " %!q(bool=false) [] [] [] [] %!q(chan error= )}
2016/06/16 09:59:06 driver.go:1171: DEBUG: sh CMD: &{"/usr/bin/rbd" ["rbd" "--pool" "docker" "--conf" "/etc/ceph/ceph.conf" "--id" "docker" "lock" "add" "foo" "ceph"] [] "" [] %!q(*syscall.SysProcAttr= ) %!q(*os.Process= ) " " %!q(bool=false) [] [] [] [] %!q(chan error= )}
2016/06/16 09:59:06 driver.go:394: ERROR: locking RBD Image(foo): exit status 16
2016/06/16 09:59:07 driver.go:555: INFO: API Unmount({foo map[]})
2016/06/16 09:59:07 driver.go:735: DEBUG: parseImagePoolNameSize: "foo": ["foo" "" "" "foo" "" ""]
2016/06/16 09:59:07 driver.go:1171: DEBUG: sh CMD: &{"/bin/umount" ["umount" "/dev/rbd/docker/foo"] [] "" [] %!q(*syscall.SysProcAttr= ) %!q(*os.Process= ) " " %!q(bool=false) [] [] [] [] %!q(chan error= )}
2016/06/16 09:59:07 driver.go:1171: DEBUG: sh CMD: &{"/usr/bin/rbd" ["rbd" "--conf" "/etc/ceph/ceph.conf" "--id" "docker" "unmap" "/dev/rbd/docker/foo"] [] "" [] %!q(*syscall.SysProcAttr= ) %!q(*os.Process= ) " " %!q(bool=false) [] [] [] [] %!q(chan error= )}
2016/06/16 09:59:07 driver.go:607: ERROR: unmapping image device(/dev/rbd/docker/foo): exit status 16
2016/06/16 09:59:07 driver.go:611: WARN: unmap failed due to busy device, early exit from this Unmount request.
```
Yes, as CHANGELOG.md and the code itself mention, this is an unavoidable situation at the moment.
The system umount will return success even if the device is still busy. The only way we were able to hack this workaround is by checking the error message from rbd unmap, which must happen after the unmount. We are not necessarily able to re-mount to "fix" things, since we lack the necessary information at that point. In addition, in my limited tests the container was still fine and able to read and write to the mounted disk, as also seen with df from inside the container.
Will leave issue open in case we find a better fix.
Yes, thanks @porcupie, I will leave this open until there is a better solution.
Hi @porcupie and everyone. I created a container with the rbd test image, and rbd-docker-plugin created an exclusive lock for the test image. When I tried to create another container with the same rbd test image, docker told me "Unable to get Exlusive Lock"; when I tried again, docker told me "Unable to mount device." Everything seemed fine, but I saw that rbd-docker-plugin also unmounted the image and removed the exclusive lock. Is this a bug?
First, I created a container with the rbd test image and checked the lock list and mount point.
Then I switched terminals and tried to create a container with the same rbd test image two times; I got different logs, which is confusing.
And I didn't see the exclusive lock anymore.
rbd-docker-plugin log:
I think rbd-docker-plugin should check whether the device has an exclusive lock, or, when it gets the "locking RBD Image(test): exit status 16" error, rbd-docker-plugin should stop and notify the docker daemon.
Thanks
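To illustrate the behaviour suggested above, here is a rough sketch of a Mount path that, when it cannot take the lock, reports an error back to the docker daemon and leaves any existing mapping and mount alone - the helper names are hypothetical, and the rbd lock add invocation follows the command seen in the logs, not the plugin's actual functions:

```go
package main

import (
	"fmt"
	"os/exec"
)

// lockImage tries to take the exclusive advisory lock on an RBD image by
// shelling out to `rbd lock add`. Exit status 16 (EBUSY) means another
// client already holds the lock. Hypothetical helper, for illustration only.
func lockImage(pool, image, lockID string) error {
	out, err := exec.Command("rbd", "--pool", pool, "lock", "add", image, lockID).CombinedOutput()
	if err == nil {
		return nil
	}
	if exitErr, ok := err.(*exec.ExitError); ok && exitErr.ExitCode() == 16 {
		return fmt.Errorf("image %s/%s is locked by another client: %s", pool, image, out)
	}
	return fmt.Errorf("rbd lock add failed: %v (%s)", err, out)
}

// mountImage sketches the suggested Mount behaviour: if the lock cannot be
// taken, return the error to the Docker daemon immediately and perform no
// cleanup, so the container already using the image keeps its mount.
func mountImage(pool, image, lockID string) error {
	if err := lockImage(pool, image, lockID); err != nil {
		// No unmount/unmap here: the device may belong to another container.
		return fmt.Errorf("Unable to get Exclusive Lock: %v", err)
	}
	// ... map the device, mount the filesystem, hand the path back to Docker ...
	return nil
}

func main() {
	if err := mountImage("docker", "test", "ceph"); err != nil {
		fmt.Println("Mount error:", err)
	}
}
```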