thecodeteam / mesos-module-dvdi

Mesos Docker Volume Driver Isolator module
Apache License 2.0

hung on unmount operation #88

Open jdef opened 8 years ago

jdef commented 8 years ago

I created a task in Marathon, watched it come up, let it run for a few minutes, then suspended the app (which sets instances to zero). I ended up with a hung unmount operation on the slave the task was running on; the task appears as KILLED in Mesos. I have a single-slave cluster, and now I can't launch a new task that uses a (different) rexray volume. Attempting to do so results in a task stuck in STAGING, so this hung unmount operation is blocking somewhere.

logs:

Apr 02 19:23:11 ip-10-0-0-249.us-west-2.compute.internal mesos-slave[1129]: I0402 19:23:11.981403  1134 slave.cpp:3528] executor(1)@10.0.0.249:53756 exited
Apr 02 19:23:12 ip-10-0-0-249.us-west-2.compute.internal mesos-slave[1129]: I0402 19:23:12.070981  1136 containerizer.cpp:1608] Executor for container '05ca518e-1fe1-4fee-bc2f-abfbb0cb9fa9' has exited
Apr 02 19:23:12 ip-10-0-0-249.us-west-2.compute.internal mesos-slave[1129]: I0402 19:23:12.071041  1136 containerizer.cpp:1392] Destroying container '05ca518e-1fe1-4fee-bc2f-abfbb0cb9fa9'
Apr 02 19:23:12 ip-10-0-0-249.us-west-2.compute.internal mesos-slave[1129]: I0402 19:23:12.072430  1133 cgroups.cpp:2427] Freezing cgroup /sys/fs/cgroup/freezer/mesos/05ca518e-1fe1-4fee-bc2f-abfbb0cb9fa9
Apr 02 19:23:12 ip-10-0-0-249.us-west-2.compute.internal mesos-slave[1129]: I0402 19:23:12.073606  1132 cgroups.cpp:1409] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/05ca518e-1fe1-4fee-bc2f-abfbb0cb9fa9 after 1.126912ms
Apr 02 19:23:12 ip-10-0-0-249.us-west-2.compute.internal mesos-slave[1129]: I0402 19:23:12.074779  1139 cgroups.cpp:2445] Thawing cgroup /sys/fs/cgroup/freezer/mesos/05ca518e-1fe1-4fee-bc2f-abfbb0cb9fa9
Apr 02 19:23:12 ip-10-0-0-249.us-west-2.compute.internal mesos-slave[1129]: I0402 19:23:12.075850  1139 cgroups.cpp:1438] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/05ca518e-1fe1-4fee-bc2f-abfbb0cb9fa9 after 1.031936ms
Apr 02 19:23:12 ip-10-0-0-249.us-west-2.compute.internal mesos-slave[1129]: I0402 19:23:12.076619  1133 docker_volume_driver_isolator.cpp:356] rexray/test is being unmounted on cleanup()
Apr 02 19:23:12 ip-10-0-0-249.us-west-2.compute.internal mesos-slave[1129]: I0402 19:23:12.078660  1133 docker_volume_driver_isolator.cpp:362] Invoking /opt/mesosphere/bin/dvdcli unmount --volumedriver=rexray --volumename=test
Apr 02 19:23:12 ip-10-0-0-249.us-west-2.compute.internal rexray[931]: time="2016-04-02T19:23:12Z" level=info msg=vdm.Create driverName=docker moduleName=default-docker opts=map[] volumeName=test
Apr 02 19:23:12 ip-10-0-0-249.us-west-2.compute.internal rexray[931]: time="2016-04-02T19:23:12Z" level=info msg="initialized count" count=0 moduleName=default-docker volumeName=test
Apr 02 19:23:12 ip-10-0-0-249.us-west-2.compute.internal rexray[931]: time="2016-04-02T19:23:12Z" level=info msg="creating volume" driverName=docker moduleName=default-docker volumeName=test volumeOpts=map[]
Apr 02 19:23:12 ip-10-0-0-249.us-west-2.compute.internal rexray[931]: time="2016-04-02T19:23:12Z" level=info msg=sdm.GetVolume driverName=ec2 moduleName=default-docker volumeID= volumeName=test
Apr 02 19:23:13 ip-10-0-0-249.us-west-2.compute.internal rexray[931]: time="2016-04-02T19:23:13Z" level=info msg=vdm.Unmount driverName=docker moduleName=default-docker volumeID= volumeName=test
Apr 02 19:23:13 ip-10-0-0-249.us-west-2.compute.internal rexray[931]: time="2016-04-02T19:23:13Z" level=info msg="initialized count" count=0 moduleName=default-docker volumeName=test
Apr 02 19:23:13 ip-10-0-0-249.us-west-2.compute.internal rexray[931]: time="2016-04-02T19:23:13Z" level=info msg="unmounting volume" driverName=docker moduleName=default-docker volumeID= volumeName=test
Apr 02 19:23:13 ip-10-0-0-249.us-west-2.compute.internal rexray[931]: time="2016-04-02T19:23:13Z" level=info msg=sdm.GetVolume driverName=ec2 moduleName=default-docker volumeID= volumeName=test
Apr 02 19:23:13 ip-10-0-0-249.us-west-2.compute.internal rexray[931]: time="2016-04-02T19:23:13Z" level=info msg=sdm.GetVolumeAttach driverName=ec2 instanceID=i-8677785c moduleName=default-docker volumeID=vol-23fe679a
Apr 02 19:23:13 ip-10-0-0-249.us-west-2.compute.internal rexray[931]: time="2016-04-02T19:23:13Z" level=info msg=odm.GetMounts deviceName="/dev/xvdc" driverName=linux moduleName=default-docker mountPoint=
Apr 02 19:23:13 ip-10-0-0-249.us-west-2.compute.internal rexray[931]: time="2016-04-02T19:23:13Z" level=info msg=odm.Unmount driverName=linux moduleName=default-docker mountPoint="/var/lib/rexray/volumes/test"
Apr 02 19:23:13 ip-10-0-0-249.us-west-2.compute.internal rexray[931]: time="2016-04-02T19:23:13Z" level=info msg=sdm.DetachVolume driverName=ec2 instanceID= moduleName=default-docker runAsync=false volumeID=vol-23fe679a
Apr 02 19:23:13 ip-10-0-0-249.us-west-2.compute.internal rexray[931]: time="2016-04-02T19:23:13Z" level=info msg="waiting for volume detachment to complete" driverName=ec2 force=false moduleName=default-docker runAsync=false volumeID=vol-23fe679a
Apr 02 19:23:14 ip-10-0-0-249.us-west-2.compute.internal kernel: vbd vbd-51744: 16 Device in use; refusing to close

/proc/mounts:

core@ip-10-0-0-249 ~ $ cat /proc/mounts
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
devtmpfs /dev devtmpfs rw,nosuid,size=7688856k,nr_inodes=1922214,mode=755 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,nodev,mode=755 0 0
tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0
cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd 0 0
pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0
/dev/xvda9 / ext4 rw,relatime,data=ordered 0 0
/dev/xvda3 /usr ext4 ro,relatime 0 0
mqueue /dev/mqueue mqueue rw,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
tmpfs /media tmpfs rw,nosuid,nodev,noexec,relatime 0 0
systemd-1 /boot autofs rw,relatime,fd=32,pgrp=1,timeout=0,minproto=5,maxproto=5,direct 0 0
tmpfs /tmp tmpfs rw 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,relatime 0 0
xenfs /proc/xen xenfs rw,relatime 0 0
/dev/xvda6 /usr/share/oem ext4 rw,nodev,relatime,commit=600,data=ordered 0 0
/dev/xvda1 /boot vfat rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,errors=remount-ro 0 0
/dev/xvdb /var/lib ext4 rw,relatime,data=ordered 0 0
/dev/xvdc /tmp/test-rexray-volume ext4 rw,relatime,data=ordered 0 0
jdef commented 8 years ago

forcibly killing the dvdcli unmount process appears to unblock things, but is (obviously) not an ideal solution.

jdef commented 8 years ago

... and this appears to confuse the system, which now has two devices mounted at the same point in the filesystem:

/dev/xvda6 /usr/share/oem ext4 rw,nodev,relatime,commit=600,data=ordered 0 0
/dev/xvda1 /boot vfat rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,errors=remount-ro 0 0
/dev/xvdb /var/lib ext4 rw,relatime,data=ordered 0 0
/dev/xvdc /tmp/test-rexray-volume ext4 rw,relatime,data=ordered 0 0
/dev/xvdd /var/lib/rexray/volumes/jdef-test-vol ext4 rw,relatime,data=ordered 0 0
/dev/xvdd /tmp/test-rexray-volume ext4 rw,relatime,data=ordered 0 0
jdef commented 8 years ago

The second volume was created with these specs (it should have been a new volume, since it was not preexisting):

I0402 20:11:02.853814  1135 docker_volume_driver_isolator.cpp:524] DVDI_VOLUME_CONTAINERPATH(/tmp/test-rexray-volume) parsed from environment
I0402 20:11:02.853844  1135 docker_volume_driver_isolator.cpp:524] DVDI_VOLUME_NAME(jdef-test-vol) parsed from environment
I0402 20:11:02.853869  1135 docker_volume_driver_isolator.cpp:524] DVDI_VOLUME_DRIVER(rexray) parsed from environment
I0402 20:11:02.853893  1135 docker_volume_driver_isolator.cpp:524] DVDI_VOLUME_OPTS(size=100) parsed from environment
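(Aside: these DVDI_* values are read from the task environment, so a Marathon app can also supply them directly via its env block. A purely illustrative example, with values mirroring the log lines above:)

{
  "id": "hello-env",
  "cmd": "/usr/bin/tail -f /dev/null",
  "instances": 1,
  "cpus": 0.1,
  "mem": 32,
  "env": {
    "DVDI_VOLUME_NAME": "jdef-test-vol",
    "DVDI_VOLUME_DRIVER": "rexray",
    "DVDI_VOLUME_CONTAINERPATH": "/tmp/test-rexray-volume",
    "DVDI_VOLUME_OPTS": "size=100"
  }
}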
jdef commented 8 years ago

rexray config:

core@ip-10-0-0-249 ~ $ cat /etc/rexray/config.yml 

rexray:
  loglevel: info
  storageDrivers:
    - ec2
  volume:
    mount:
      preempt: true
    unmount:
      ignoreusedcount: true
clintkitson commented 8 years ago

@jdef I believe the "16 Device in use; refusing to close" message is the key here. mesos-module-dvdi does accounting to determine when it is appropriate to unmount a volume from the system: if multiple containers share a volume, it is the module's responsibility to track this and ensure an unmount (a host-level operation) does not occur until all containers are done using the volume.

The message at the bottom of the log shows that something is still using the device that is being detached. Is there any chance the device and/or mount path is being accessed by another process on the system? Can you capture the output of lsof filtered to the mounted paths?
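For context, a rough sketch of the kind of per-volume use counting described above (illustrative only; the real bookkeeping lives in docker_volume_driver_isolator.cpp and its names may differ):

// Hypothetical sketch: run the host-level unmount only when the last
// container using a volume is cleaned up.
#include <map>
#include <string>

class VolumeUsage {
public:
  void onPrepare(const std::string& volume) { ++counts_[volume]; }

  // Returns true when the host-level unmount (dvdcli unmount) should run.
  bool onCleanup(const std::string& volume) {
    auto it = counts_.find(volume);
    if (it == counts_.end() || --it->second > 0) {
      return false;  // other containers are still using this volume
    }
    counts_.erase(it);
    return true;  // last user gone; safe to unmount and detach
  }

private:
  std::map<std::string, int> counts_;  // volume name -> active container count
};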

jdef commented 8 years ago

@clintonskitson lsof results:

core@ip-10-0-0-142 ~ $ sudo lsof -V /tmp/test-rexray-volume
lsof: no file system use located: /tmp/test-rexray-volume
core@ip-10-0-0-142 ~ $ cat /proc/mounts|grep -e rexray
/dev/xvdc /tmp/test-rexray-volume ext4 rw,relatime,data=ordered 0 0
core@ip-10-0-0-142 ~ $ sudo lsof -V /dev/xvdc              
lsof: no file system use located: /dev/xvdc
core@ip-10-0-0-142 ~ $ ls /tmp/test-rexray-volume
hello
core@ip-10-0-0-142 ~ $ sudo lsof -V /tmp/test-rexray-volume/hello
lsof: no file use located: /tmp/test-rexray-volume/hello
clintkitson commented 8 years ago

@jdef Can you perform something like lsof -V | grep test-rexray-volume?

jdef commented 8 years ago

comes back empty

jdef commented 8 years ago

@clintonskitson i'm going to try with an explicit mount under /etc to see if that makes a difference

clintkitson commented 8 years ago

If it is empty, can you try an unmount of the path?

jdef commented 8 years ago

appears to succeed:

core@ip-10-0-0-142 ~ $ sudo umount /tmp/test-rexray-volume
core@ip-10-0-0-142 ~ $ cat /proc/mounts | grep -e rexray
core@ip-10-0-0-142 ~ $

clintkitson commented 8 years ago

Can you replicate the problem where it hung before? It would have hung because there were open files. Technically, since the container that had the volume mounted and was using the files is done, there should not be any files open. I am curious whether you could run lsof while it is hung, to see why the unmount process is blocking.

jdef commented 8 years ago

Well, I just tried to launch a new task with a new volume instance, but rexray is hanging while trying to mount it. This is the same slave on which I just manually ran the umount command.

Apr 04 20:23:25 ip-10-0-0-142.us-west-2.compute.internal rexray[906]: time="2016-04-04T20:23:25Z" level=info msg=vdm.Mount driverName=docker moduleName=default-docker newFsType= overwriteFs=false preempt=false volumeID= volumeName=jdef-test-vol13115
Apr 04 20:23:25 ip-10-0-0-142.us-west-2.compute.internal rexray[906]: time="2016-04-04T20:23:25Z" level=info msg="mounting volume" driverName=docker moduleName=default-docker newFsType= overwriteFs=false volumeID= volumeName=jdef-test-vol13115
Apr 04 20:23:25 ip-10-0-0-142.us-west-2.compute.internal rexray[906]: time="2016-04-04T20:23:25Z" level=info msg=sdm.GetVolume driverName=ec2 moduleName=default-docker volumeID= volumeName=jdef-test-vol13115
Apr 04 20:23:27 ip-10-0-0-142.us-west-2.compute.internal rexray[906]: time="2016-04-04T20:23:26Z" level=info msg=sdm.GetVolumeAttach driverName=ec2 instanceID=i-25bff2fd moduleName=default-docker volumeID=vol-9be4fb28
Apr 04 20:23:27 ip-10-0-0-142.us-west-2.compute.internal rexray[906]: time="2016-04-04T20:23:27Z" level=info msg=odm.Unmount driverName=linux moduleName=default-docker mountPoint="//var/lib/rexray/volumes/jdef-test-vol13115"
Apr 04 20:23:27 ip-10-0-0-142.us-west-2.compute.internal rexray[906]: time="2016-04-04T20:23:27Z" level=info msg=sdm.AttachVolume driverName=ec2 force=true instanceID=i-25bff2fd moduleName=default-docker runAsync=false volumeID=vol-9be4fb28
Apr 04 20:23:27 ip-10-0-0-142.us-west-2.compute.internal rexray[906]: time="2016-04-04T20:23:27Z" level=info msg="got next device name" driverName=ec2 moduleName=default-docker nextDeviceName="/dev/xvdc"
Apr 04 20:23:27 ip-10-0-0-142.us-west-2.compute.internal rexray[906]: time="2016-04-04T20:23:27Z" level=info msg="waiting for volume attachment to complete" driverName=ec2 force=true instanceID=i-25bff2fd moduleName=default-docker runAsync=false volumeID=vol-9be4fb28

The error I produced today was on a completely new cluster vs. the one I first reported, so it can be reproduced. The containers I'm spinning up aren't touching any files in the volume, so I'm not sure what would be holding a handle to any files on it.

clintkitson commented 8 years ago

@jdef Waiting for volume attachment to complete looks to be an EC2-side problem and might align with a bug they have in their Ubuntu OS image with the EBS driver. Can you reboot the instances and try again?

jdef commented 8 years ago

FWIW this is CoreOS, not Ubuntu. I've rebooted the slave. Once rebooted, the task launched just fine (tail -f /dev/null).

lsof -V did not show anything accessing the volume filesystem.

I killed the task, and dvdcli unmount is hanging again trying to unmount the volume. Running lsof -V still shows that nothing is accessing the volume filesystem.

jdef commented 8 years ago
core@ip-10-0-0-142 ~ $ uname -a
Linux ip-10-0-0-142.us-west-2.compute.internal 4.1.7-coreos-r1 #2 SMP Thu Nov 5 02:10:23 UTC 2015 x86_64 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz GenuineIntel GNU/Linux
jdef commented 8 years ago

To be clear, once dvdcli unmount hangs, further task launches (whether they use external volumes or not) are all blocked in TASK_STAGING and eventually time out.

clintkitson commented 8 years ago

What is the CoreOS rev or AMI? I will try to reproduce.

jdef commented 8 years ago

Looks like it's this AMI: CoreOS-stable-766.5.0-hvm

clintkitson commented 8 years ago

Can you review the devices that show up under /dev/disk/by-id?

Curious whether something is claiming block devices as they appear.

jdef commented 8 years ago

core@ip-10-0-0-142 ~ $ ls -laF /dev/disk/by-uuid/
total 0
drwxr-xr-x 2 root root 160 Apr 4 21:11 ./
drwxr-xr-x 7 root root 140 Apr 4 20:38 ../
lrwxrwxrwx 1 root root 11 Apr 4 20:38 01d6bc7e-0ab0-49f0-9e24-23ff5fc001c9 -> ../../xvda9
lrwxrwxrwx 1 root root 11 Apr 4 20:38 0321af0d-eff8-4f69-9f05-9f9f7bd3c7cf -> ../../xvda3
lrwxrwxrwx 1 root root 10 Apr 4 20:38 15797eed-93bd-43ab-a7ad-19e7e1fa0b31 -> ../../xvdb
lrwxrwxrwx 1 root root 10 Apr 4 21:11 6e2bac7e-8f12-428b-9a4a-38b599573190 -> ../../xvdc
lrwxrwxrwx 1 root root 11 Apr 4 20:38 CBC4-C7B4 -> ../../xvda1
lrwxrwxrwx 1 root root 11 Apr 4 20:38 ba2d8241-3460-4b9c-88d1-ffc493349f66 -> ../../xvda6

clintkitson commented 8 years ago

Ok looks normal, thanks.

jdef commented 8 years ago

So, it actually doesn't block indefinitely... the job I tried to start about 6 hours ago finally started about 20 minutes ago. The slave logs have this to say:

E0405 03:18:47.355356  3293 shell.hpp:93] Command '/opt/mesosphere/bin/dvdcli unmount --volumedriver=rexray --volumename=jdef-test-vol13115 ' failed; this is the output:
W0405 03:18:47.355484  3293 docker_volume_driver_isolator.cpp:386] /opt/mesosphere/bin/dvdcli unmount failed to execute on cleanup(), continuing on the assumption this volume was manually unmounted previously Failed to execute '/opt/mesosphere/bin/dvdcli unmount --volumedriver=rexray --volumename=jdef-test-vol13115 '; the command was either not found or exited with a non-zero exit status: 1
I0405 03:18:47.355914  3293 containerizer.cpp:666] Starting container 'bf8b669c-9623-43c2-9345-bf895f77059f' for executor 'adasdasdasdasd.e6874945-faad-11e5-bf0b-161c5a4ef132' of framework '15c5b4d7-9ae8-40f6-91a5-4bf40cb8c791-0000'
I0405 03:18:47.357314  3293 containerizer.cpp:1392] Destroying container 'bf8b669c-9623-43c2-9345-bf895f77059f'

Immediately after this log output, my (formerly) blocked non-volume-using task was finally launched. It actually hit TASK_FAILED, because it had been scheduled so long ago that Marathon had since killed it, but it finally caught up with the world, and Marathon was then able to launch it again successfully.

jdef commented 8 years ago

rexray had something to say at about the same time:

Apr 05 03:18:47 ip-10-0-0-142.us-west-2.compute.internal rexray[5498]: time="2016-04-05T03:18:47Z" level=error msg="/VolumeDriver.Unmount: error unmounting volume" error="AWS was not able to validate the provided access credentials (AuthFailure)"
Apr 05 03:18:47 ip-10-0-0-142.us-west-2.compute.internal mesos-slave[3175]: time="2016-04-05T03:18:47Z" level=error msg="Plugin Error: VolumeDriver.Unmount, {\"Error\":\"AWS was not able to validate the provided access credentials (AuthFailure)\"}\n"
Apr 05 03:18:47 ip-10-0-0-142.us-west-2.compute.internal rexray[5498]: time="2016-04-05T03:18:47Z" level=error msg="AWS was not able to validate the provided access credentials (AuthFailure)"
Apr 05 03:18:47 ip-10-0-0-142.us-west-2.compute.internal mesos-slave[3175]: E0405 03:18:47.355356  3293 shell.hpp:93] Command '/opt/mesosphere/bin/dvdcli unmount --volumedriver=rexray --volumename=jdef-test-vol13115 ' failed; this is the output:
Apr 05 03:18:47 ip-10-0-0-142.us-west-2.compute.internal mesos-slave[3175]: W0405 03:18:47.355484  3293 docker_volume_driver_isolator.cpp:386] /opt/mesosphere/bin/dvdcli unmount failed to execute on cleanup(), continuing on the assumption this volume was manually unmounted previously Failed to execute '/opt/mesosphere/bin/dvdcli unmount --volumedriver=rexray --volumename=jdef-test-vol13115 '; the command was either not found or exited with a non-zero exit status: 1
clintkitson commented 8 years ago

@jdef On a fresh system, can you try running the dvdcli commands manually and check the results?

Please replace rexray and test with appropriate names for your config. This should simulate the most basic usage.

/opt/bin/dvdcli mount --volumedriver=rexray --volumename=test
/opt/bin/dvdcli unmount --volumedriver=rexray --volumename=test

clintkitson commented 8 years ago

@jdef Can you also paste in the marathon application definition that you are using?

jdef commented 8 years ago

Apparently we're running a forked version of rexray that tries to use IAM credentials instead of credentials coded into the configuration file. Upon further review it seems likely that the library code responsible for EC2 auth is failing to refresh credentials upon token expiration. That said, it's not clear to me that the credentials issue is directly related to hanging unmount commands, especially when I terminate a task so closely to when it's created -- given how easy it is to reproduce the issue, I'm betting that the credentials have not yet expired.

The marathon spec I'm using depends on a development branch of marathon that's adding higher level support for external volumes (so this is subject to change prior to being merged into master):

{
  "id": "hello",
  "instances": 1,
  "cpus": 0.1,
  "mem": 32,
  "cmd": "/usr/bin/tail -f /dev/null",
  "container": {
    "type": "MESOS",
    "volumes": [
      {
        "containerPath": "/tmp/test-rexray-volume",
        "persistent": {
          "size": 100,
          "name": "jdef-test-vol13115",
          "provider": "external",
          "options": { "external/driver": "rexray" }
        },
        "mode": "RW"
      }
    ]
  },
  "upgradeStrategy": {
    "minimumHealthCapacity": 0,
    "maximumOverCapacity": 0
  }
}

To test the mount/unmount commands manually I'll need to spin up a fresh cluster and execute the commands before the initially detected IAM creds expire.

jdef commented 8 years ago

When I manually run the dvdcli mount/unmount commands, they work just fine with the IAM fixes @branden incorporated. I bumped into #89 and that's stalling further testing.

jdef commented 8 years ago

hm... IAM problems have been resolved and the hung unmount problem persists:

Apr 07 02:59:31 ip-10-0-1-57.us-west-2.compute.internal mesos-slave[1500]: I0407 02:59:31.814273  1542 slave.cpp:3528] executor(1)@10.0.1.57:47184 exited
Apr 07 02:59:31 ip-10-0-1-57.us-west-2.compute.internal mesos-slave[1500]: I0407 02:59:31.910516  1547 containerizer.cpp:1608] Executor for container '903b69ea-9656-44dc-bae3-a106bc954f9f' has exited
Apr 07 02:59:31 ip-10-0-1-57.us-west-2.compute.internal mesos-slave[1500]: I0407 02:59:31.910591  1547 containerizer.cpp:1392] Destroying container '903b69ea-9656-44dc-bae3-a106bc954f9f'
Apr 07 02:59:31 ip-10-0-1-57.us-west-2.compute.internal mesos-slave[1500]: I0407 02:59:31.911882  1541 cgroups.cpp:2427] Freezing cgroup /sys/fs/cgroup/freezer/mesos/903b69ea-9656-44dc-bae3-a106bc954f9f
Apr 07 02:59:31 ip-10-0-1-57.us-west-2.compute.internal mesos-slave[1500]: I0407 02:59:31.912988  1547 cgroups.cpp:1409] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/903b69ea-9656-44dc-bae3-a106bc954f9f after 1.054976ms
Apr 07 02:59:31 ip-10-0-1-57.us-west-2.compute.internal mesos-slave[1500]: I0407 02:59:31.914213  1542 cgroups.cpp:2445] Thawing cgroup /sys/fs/cgroup/freezer/mesos/903b69ea-9656-44dc-bae3-a106bc954f9f
Apr 07 02:59:31 ip-10-0-1-57.us-west-2.compute.internal mesos-slave[1500]: I0407 02:59:31.915292  1547 cgroups.cpp:1438] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/903b69ea-9656-44dc-bae3-a106bc954f9f after 1.035008ms
Apr 07 02:59:31 ip-10-0-1-57.us-west-2.compute.internal mesos-slave[1500]: I0407 02:59:31.915989  1545 docker_volume_driver_isolator.cpp:357] rexray/jdef-test-vol6249824 is being unmounted on cleanup()
Apr 07 02:59:31 ip-10-0-1-57.us-west-2.compute.internal mesos-slave[1500]: I0407 02:59:31.917794  1545 docker_volume_driver_isolator.cpp:363] Invoking /opt/mesosphere/bin/dvdcli unmount --volumedriver=rexray --volumename=jdef-test-vol6249824
Apr 07 02:59:31 ip-10-0-1-57.us-west-2.compute.internal rexray[905]: time="2016-04-07T02:59:31Z" level=info msg=vdm.Create driverName=docker moduleName=default-docker opts=map[] volumeName=jdef-test-vol6249824
Apr 07 02:59:31 ip-10-0-1-57.us-west-2.compute.internal rexray[905]: time="2016-04-07T02:59:31Z" level=info msg="initialized count" count=0 moduleName=default-docker volumeName=jdef-test-vol6249824
Apr 07 02:59:31 ip-10-0-1-57.us-west-2.compute.internal rexray[905]: time="2016-04-07T02:59:31Z" level=info msg="creating volume" driverName=docker moduleName=default-docker volumeName=jdef-test-vol6249824 volumeOpts=map[]
Apr 07 02:59:32 ip-10-0-1-57.us-west-2.compute.internal rexray[905]: time="2016-04-07T02:59:32Z" level=info msg=sdm.GetVolume driverName=ec2 moduleName=default-docker volumeID= volumeName=jdef-test-vol6249824
Apr 07 02:59:32 ip-10-0-1-57.us-west-2.compute.internal rexray[905]: time="2016-04-07T02:59:32Z" level=info msg=vdm.Unmount driverName=docker moduleName=default-docker volumeID= volumeName=jdef-test-vol6249824
Apr 07 02:59:32 ip-10-0-1-57.us-west-2.compute.internal rexray[905]: time="2016-04-07T02:59:32Z" level=info msg="initialized count" count=0 moduleName=default-docker volumeName=jdef-test-vol6249824
Apr 07 02:59:32 ip-10-0-1-57.us-west-2.compute.internal rexray[905]: time="2016-04-07T02:59:32Z" level=info msg="unmounting volume" driverName=docker moduleName=default-docker volumeID= volumeName=jdef-test-vol6249824
Apr 07 02:59:32 ip-10-0-1-57.us-west-2.compute.internal rexray[905]: time="2016-04-07T02:59:32Z" level=info msg=sdm.GetVolume driverName=ec2 moduleName=default-docker volumeID= volumeName=jdef-test-vol6249824
Apr 07 02:59:32 ip-10-0-1-57.us-west-2.compute.internal rexray[905]: time="2016-04-07T02:59:32Z" level=info msg=sdm.GetVolumeAttach driverName=ec2 instanceID=i-3e6f7ae4 moduleName=default-docker volumeID=vol-9482e22d
Apr 07 02:59:33 ip-10-0-1-57.us-west-2.compute.internal rexray[905]: time="2016-04-07T02:59:33Z" level=info msg=odm.GetMounts deviceName="/dev/xvdc" driverName=linux moduleName=default-docker mountPoint=
Apr 07 02:59:33 ip-10-0-1-57.us-west-2.compute.internal rexray[905]: time="2016-04-07T02:59:33Z" level=info msg=odm.Unmount driverName=linux moduleName=default-docker mountPoint="/var/lib/rexray/volumes/jdef-test-vol6249824"
Apr 07 02:59:33 ip-10-0-1-57.us-west-2.compute.internal rexray[905]: time="2016-04-07T02:59:33Z" level=info msg=sdm.DetachVolume driverName=ec2 instanceID= moduleName=default-docker runAsync=false volumeID=vol-9482e22d
Apr 07 02:59:33 ip-10-0-1-57.us-west-2.compute.internal rexray[905]: time="2016-04-07T02:59:33Z" level=info msg="waiting for volume detachment to complete" driverName=ec2 force=false moduleName=default-docker runAsync=false volumeID=vol-9482e22d
Apr 07 02:59:33 ip-10-0-1-57.us-west-2.compute.internal kernel: vbd vbd-51744: 16 Device in use; refusing to close
jdef commented 8 years ago

Using the filesystem/linux isolator in conjunction with this isolator fixes the problem. Without the filesystem/linux isolator, the host mount namespace is contaminated with the mounts added to the container; the core team says the host's /tmp was being polluted by the container when the filesystem/linux isolator was not enabled.

jdef commented 8 years ago

I've left this issue open because this dependency on the filesystem/linux isolator should be documented.
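For documentation purposes, a hedged example of what the agent flags might look like with both isolators enabled (the module name and file path below are assumptions based on a typical mesos-module-dvdi install; check the modules JSON you actually load):

mesos-slave \
  --modules=file:///opt/mesosphere/etc/mesos-slave-modules.json \
  --isolation=filesystem/linux,com_emccode_mesos_DockerVolumeDriverIsolator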

clintkitson commented 8 years ago

@jdef Doing some more testing on our side. It doesn't make sense to me that this would be required.

clintkitson commented 8 years ago

@jdef So the troubling thing here is that I believe I simulated the relevant options by supplying DVDI_VOLUME_NAME, DVDI_VOLUME_DRIVER, and DVDI_VOLUME_CONTAINERPATH. I did this on 0.27.0 and everything looks to be working fine.

So the question I am moving towards is whether some type of race condition was introduced in a newer rev of Mesos. The detach fails because things are not being cleaned up appropriately; if the cleanup logic changed in 0.28+, it may be interfering with our ability to properly remove mounts so the detach process can complete.

jdef commented 8 years ago

One of the symptoms is that when a container-path mount is created under /tmp (in our test system) within the container mount namespace, it propagates back to the host's mount namespace. The container mount namespace is destroyed on container teardown, but the mount still exists in the host mount namespace; rexray then tries to detach the volume and can't.

Our core team tells me this is because, without the filesystem/linux isolator, /tmp lives in a shared mount namespace. They also tell me there is no other valid workaround for this problem: a container mount namespace must be properly isolated to keep its mounts from propagating back to the host.

@clintonskitson are you saying that the DVD isolator already prevents such leakage?

/cc @jieyu

clintkitson commented 8 years ago

@cantbewong @jdef Having a mount path leak from a container to the host's mount namespace doesn't sound right. Is there any documentation online that discusses this? Can you try this on an Ubuntu 14.04-based host?

jieyu commented 8 years ago

Ubuntu 14.04 does not have this issue because by default, all mounts are private.

CentOS 7/Fedora 23/CoreOS will have this issue because, by default, all mounts are shared.

clintkitson commented 8 years ago

@cantbewong Is it possible that when we leverage the filesystem isolator code from Mesos, we aren't invoking it as a private mount? I could see this being the problem when DVDI_VOLUME_CONTAINERPATH is used.

cantbewong commented 8 years ago

Assuming the mount being discussed is the bind mount used to isolate a container:

The mount command is queued as follows:

prepareInfo.add_commands()->set_value("mount -n --rbind " + mountPoint + " " + containerPath);

No effort is made to specify private vs. shared, so if Red Hat and Ubuntu differ in their defaults, the private vs. shared state of the bind mount will differ.

I found documentation indicating that the Red Hat default is private: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/sect-Using_the_mount_Command-Mounting-Bind.html

The Fedora documentation refers to the kernel documentation, which also indicates a default of private: https://www.kernel.org/doc/Documentation/filesystems/sharedsubtree.txt

If this is the root cause of this issue, an explicit call to declare the mount private would be useful: mount --make-rprivate mount_point

Making this call should be harmless, other than adding the time it takes to execute, so I will put it in the next version of the isolator.
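A minimal sketch of how that could slot into the existing prepare() command queue, in the same style as the add_commands() call quoted above (untested; whether it actually stops the propagation depends on which mount namespace ends up running these commands):

// Create the bind mount, then explicitly mark it recursively private so it
// does not propagate to peer mount namespaces on distros whose mounts are
// shared by default.
prepareInfo.add_commands()->set_value(
    "mount -n --rbind " + mountPoint + " " + containerPath);
prepareInfo.add_commands()->set_value(
    "mount --make-rprivate " + containerPath);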

jdef commented 8 years ago

I think @jieyu expressed concerns that not all versions of mount support the same set of flags. We should probably determine whether the flag you're proposing is available on the OS versions that are meant to be supported.

cantbewong commented 8 years ago

@jdef can you provide a suggestion for a supported OS list?

jieyu commented 8 years ago

@jdef @cantbewong The problem with the make-rslave|make-rprivate approach is described in this Mesos ticket; it does not work properly on Ubuntu 14.04: https://issues.apache.org/jira/browse/MESOS-3806

clintkitson commented 8 years ago

The Docker solution here is a mount package they have that, I believe, calls the kernel directly via C. It might be worth a look at leveraging that as a long-term solution, versus calling a CLI that may continue to differ between OS distributions over time.
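For comparison, a minimal sketch of doing the same work through the mount(2) syscall instead of shelling out to the mount CLI (roughly the approach a kernel-calling mount package takes); illustrative only, with the function name invented here:

#include <sys/mount.h>
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>

// Bind-mount 'source' onto 'target', then mark the result recursively
// private, without invoking the mount binary.
static void bindMountPrivate(const std::string& source, const std::string& target) {
  if (::mount(source.c_str(), target.c_str(), nullptr, MS_BIND | MS_REC, nullptr) != 0) {
    throw std::runtime_error("bind mount failed: " + std::string(std::strerror(errno)));
  }
  if (::mount(nullptr, target.c_str(), nullptr, MS_PRIVATE | MS_REC, nullptr) != 0) {
    throw std::runtime_error("make-rprivate failed: " + std::string(std::strerror(errno)));
  }
}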

uschen commented 7 years ago

Same issue on CoreOS stable (1235.12.0):

Apr 15 00:11:57 10.1.21.50 rexray[1747]: time="2017-04-15T00:11:57Z" level=info msg="waiting for volume detachment to complete" driverName=ec2 force=false moduleName=default-docker runAsync=false volumeID=vol-xxxxx

docker ps etc. won't respond.