jdef opened this issue 8 years ago
I created a task in Marathon, watched it come up, let it run for a few min, then suspended the app (sets instances to zero). I ended up with a hung unmount op on the slave the task was running on. The task appears as KILLED in mesos. I have a single slave node cluster. Now I can't launch a new task trying to use a (different) rexray vol. Attempting to do so results in a task stuck in STAGING. So this hung unmount op is blocking somewhere.
logs:
/proc/mounts:
forcibly killing the dvdcli unmount process appears to unblock things, but is (obviously) not an ideal solution.
... and this appears to confuse the system, which now has mounted two devices to the same point in the filesystem:
/dev/xvda6 /usr/share/oem ext4 rw,nodev,relatime,commit=600,data=ordered 0 0
/dev/xvda1 /boot vfat rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,errors=remount-ro 0 0
/dev/xvdb /var/lib ext4 rw,relatime,data=ordered 0 0
/dev/xvdc /tmp/test-rexray-volume ext4 rw,relatime,data=ordered 0 0
/dev/xvdd /var/lib/rexray/volumes/jdef-test-vol ext4 rw,relatime,data=ordered 0 0
/dev/xvdd /tmp/test-rexray-volume ext4 rw,relatime,data=ordered 0 0
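(As an aside, a quick, hedged way to spot this double-mount state from /proc/mounts is to look for mount targets that appear more than once:)
# hedged sketch: print any mount point that has more than one device mounted on it
awk '{print $2}' /proc/mounts | sort | uniq -d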
second volume was created with these specs (should have been a new vol since it was not preexisting):
I0402 20:11:02.853814 1135 docker_volume_driver_isolator.cpp:524] DVDI_VOLUME_CONTAINERPATH(/tmp/test-rexray-volume) parsed from environment
I0402 20:11:02.853844 1135 docker_volume_driver_isolator.cpp:524] DVDI_VOLUME_NAME(jdef-test-vol) parsed from environment
I0402 20:11:02.853869 1135 docker_volume_driver_isolator.cpp:524] DVDI_VOLUME_DRIVER(rexray) parsed from environment
I0402 20:11:02.853893 1135 docker_volume_driver_isolator.cpp:524] DVDI_VOLUME_OPTS(size=100) parsed from environment
rexray config:
core@ip-10-0-0-249 ~ $ cat /etc/rexray/config.yml
rexray:
  loglevel: info
  storageDrivers:
  - ec2
volume:
  mount:
    preempt: true
  unmount:
    ignoreusedcount: true
@jdef I believe the "16 Device in use; refusing to close" message is the key here. The mesos-module-dvdi does accounting to determine when it is appropriate to unmount a volume from the system. The purpose here is that if there are multiple containers sharing a volume, it is the module's responsibility to track this and ensure an unmount (a host-level operation) does not occur until all containers are done using the volume.
The message at the bottom of the log shows that something is still using the device that we are trying to detach. Is there any chance that the device and/or mount path is being accessed by another process on the system? Can you capture the results of lsof, filtering for the mounted paths?
@clintonskitson lsof results:
core@ip-10-0-0-142 ~ $ sudo lsof -V /tmp/test-rexray-volume
lsof: no file system use located: /tmp/test-rexray-volume
core@ip-10-0-0-142 ~ $ cat /proc/mounts|grep -e rexray
/dev/xvdc /tmp/test-rexray-volume ext4 rw,relatime,data=ordered 0 0
core@ip-10-0-0-142 ~ $ sudo lsof -V /dev/xvdc
lsof: no file system use located: /dev/xvdc
core@ip-10-0-0-142 ~ $ ls /tmp/test-rexray-volume
hello
core@ip-10-0-0-142 ~ $ sudo lsof -V /tmp/test-rexray-volume/hello
lsof: no file use located: /tmp/test-rexray-volume/hello
@jdef Can you perform something like lsof -V | grep test-rexray-volume?
comes back empty
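(As a cross-check, fuser can sometimes report holders of a mount that lsof misses; a minimal sketch using the mount point and device shown above:)
# hedged sketch: list any processes holding the mounted filesystem or its device
sudo fuser -vm /tmp/test-rexray-volume
sudo fuser -vm /dev/xvdc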
@clintonskitson i'm going to try with an explicit mount under /etc to see if that makes a difference
If it is empty, can you try an unmount of the path?
appears to succeed:
core@ip-10-0-0-142 ~ $ sudo umount /tmp/test-rexray-volume
core@ip-10-0-0-142 ~ $ cat /proc/mounts | grep -e rexray
core@ip-10-0-0-142 ~ $
Can you replicate the problem where it hung before? It would have hung because there were open files. Technically, since the container that had the volume mounted and was using the files is done, there should not be any files open. I am curious to see if you could run lsof during this time to see why the unmount process is hanging.
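(One hedged way to capture that while dvdcli unmount is wedged, assuming the mount point and device from the earlier comments:)
# hedged sketch: sample open-file and mount state every few seconds during the hang
while true; do
  date
  sudo lsof -V /tmp/test-rexray-volume /dev/xvdc
  grep rexray /proc/mounts
  sleep 5
done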
well, i just tried to launch a new task with a new volume instance but rexray is hanging on trying to mount it. this is the same slave that I just manually ran the umount command on.
Apr 04 20:23:25 ip-10-0-0-142.us-west-2.compute.internal rexray[906]: time="2016-04-04T20:23:25Z" level=info msg=vdm.Mount driverName=docker moduleName=default-docker newFsType= overwriteFs=false preempt=false volumeID= volumeName=jdef-test-vol13115
Apr 04 20:23:25 ip-10-0-0-142.us-west-2.compute.internal rexray[906]: time="2016-04-04T20:23:25Z" level=info msg="mounting volume" driverName=docker moduleName=default-docker newFsType= overwriteFs=false volumeID= volumeName=jdef-test-vol13115
Apr 04 20:23:25 ip-10-0-0-142.us-west-2.compute.internal rexray[906]: time="2016-04-04T20:23:25Z" level=info msg=sdm.GetVolume driverName=ec2 moduleName=default-docker volumeID= volumeName=jdef-test-vol13115
Apr 04 20:23:27 ip-10-0-0-142.us-west-2.compute.internal rexray[906]: time="2016-04-04T20:23:26Z" level=info msg=sdm.GetVolumeAttach driverName=ec2 instanceID=i-25bff2fd moduleName=default-docker volumeID=vol-9be4fb28
Apr 04 20:23:27 ip-10-0-0-142.us-west-2.compute.internal rexray[906]: time="2016-04-04T20:23:27Z" level=info msg=odm.Unmount driverName=linux moduleName=default-docker mountPoint="//var/lib/rexray/volumes/jdef-test-vol13115"
Apr 04 20:23:27 ip-10-0-0-142.us-west-2.compute.internal rexray[906]: time="2016-04-04T20:23:27Z" level=info msg=sdm.AttachVolume driverName=ec2 force=true instanceID=i-25bff2fd moduleName=default-docker runAsync=false volumeID=vol-9be4fb28
Apr 04 20:23:27 ip-10-0-0-142.us-west-2.compute.internal rexray[906]: time="2016-04-04T20:23:27Z" level=info msg="got next device name" driverName=ec2 moduleName=default-docker nextDeviceName="/dev/xvdc"
Apr 04 20:23:27 ip-10-0-0-142.us-west-2.compute.internal rexray[906]: time="2016-04-04T20:23:27Z" level=info msg="waiting for volume attachment to complete" driverName=ec2 force=true instanceID=i-25bff2fd moduleName=default-docker runAsync=false volumeID=vol-9be4fb28
the error I produced today was on a completely new cluster vs the error I first reported. so it can be reproduced. the containers i'm spinning up aren't touching any files in the volume, so i'm not sure what would be holding a handle to any files on it.
@jdef "Waiting for volume attachment to complete" looks to be an EC2-based problem and might align with a bug they have in their Ubuntu OS image with the EBS driver. Can you reboot the instances and try again?
FWIW this is coreos, not Ubuntu. I've rebooted the slave. Once rebooted, the task launched just fine (tail -f /dev/null).
lsof -V did not show anything accessing the volume filesystem.
i killed the task, and dvdcli unmount is hanging again trying to unmount the volume. running lsof -V still shows that no one is accessing the volume filesystem.
core@ip-10-0-0-142 ~ $ uname -a
Linux ip-10-0-0-142.us-west-2.compute.internal 4.1.7-coreos-r1 #2 SMP Thu Nov 5 02:10:23 UTC 2015 x86_64 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz GenuineIntel GNU/Linux
to be clear, once dvdcli unmount hangs, further task launches (using ext vols, or not) are all blocked in TASK_STAGING and eventually time out.
What is the coreos rev or AMI? I will try and reproduce.
looks like this AMI CoreOS-stable-766.5.0-hvm
Can you review the devices that show up under /dev/disk/by-id? Curious if something is claiming block devices as they appear.
core@ip-10-0-0-142 ~ $ ls -laF /dev/disk/by-uuid/
total 0
drwxr-xr-x 2 root root 160 Apr 4 21:11 ./
drwxr-xr-x 7 root root 140 Apr 4 20:38 ../
lrwxrwxrwx 1 root root 11 Apr 4 20:38 01d6bc7e-0ab0-49f0-9e24-23ff5fc001c9 -> ../../xvda9
lrwxrwxrwx 1 root root 11 Apr 4 20:38 0321af0d-eff8-4f69-9f05-9f9f7bd3c7cf -> ../../xvda3
lrwxrwxrwx 1 root root 10 Apr 4 20:38 15797eed-93bd-43ab-a7ad-19e7e1fa0b31 -> ../../xvdb
lrwxrwxrwx 1 root root 10 Apr 4 21:11 6e2bac7e-8f12-428b-9a4a-38b599573190 -> ../../xvdc
lrwxrwxrwx 1 root root 11 Apr 4 20:38 CBC4-C7B4 -> ../../xvda1
lrwxrwxrwx 1 root root 11 Apr 4 20:38 ba2d8241-3460-4b9c-88d1-ffc493349f66 -> ../../xvda6
Ok looks normal, thanks.
so, it actually doesn't block indefinitely... the job I tried to start about 6 hours ago finally started about 20m ago. slave logs have this to say:
E0405 03:18:47.355356 3293 shell.hpp:93] Command '/opt/mesosphere/bin/dvdcli unmount --volumedriver=rexray --volumename=jdef-test-vol13115 ' failed; this is the output:
W0405 03:18:47.355484 3293 docker_volume_driver_isolator.cpp:386] /opt/mesosphere/bin/dvdcli unmount failed to execute on cleanup(), continuing on the assumption this volume was manually unmounted previously Failed to execute '/opt/mesosphere/bin/dvdcli unmount --volumedriver=rexray --volumename=jdef-test-vol13115 '; the command was either not found or exited with a non-zero exit status: 1
I0405 03:18:47.355914 3293 containerizer.cpp:666] Starting container 'bf8b669c-9623-43c2-9345-bf895f77059f' for executor 'adasdasdasdasd.e6874945-faad-11e5-bf0b-161c5a4ef132' of framework '15c5b4d7-9ae8-40f6-91a5-4bf40cb8c791-0000'
I0405 03:18:47.357314 3293 containerizer.cpp:1392] Destroying container 'bf8b669c-9623-43c2-9345-bf895f77059f'
immediately after this log output, my (formerly) blocked non-volume-using task was finally launched (actually it experienced a TASK_FAILED because it had started so long ago and marathon had since killed it, but ... it finally caught up to the world. marathon was then able to launch it again and that worked).
rexray had something to say about the same time:
Apr 05 03:18:47 ip-10-0-0-142.us-west-2.compute.internal rexray[5498]: time="2016-04-05T03:18:47Z" level=error msg="/VolumeDriver.Unmount: error unmounting volume" error="AWS was not able to validate the provided access credentials (AuthFailure)"
Apr 05 03:18:47 ip-10-0-0-142.us-west-2.compute.internal mesos-slave[3175]: time="2016-04-05T03:18:47Z" level=error msg="Plugin Error: VolumeDriver.Unmount, {\"Error\":\"AWS was not able to validate the provided access credentials (AuthFailure)\"}\n"
Apr 05 03:18:47 ip-10-0-0-142.us-west-2.compute.internal rexray[5498]: time="2016-04-05T03:18:47Z" level=error msg="AWS was not able to validate the provided access credentials (AuthFailure)"
Apr 05 03:18:47 ip-10-0-0-142.us-west-2.compute.internal mesos-slave[3175]: E0405 03:18:47.355356 3293 shell.hpp:93] Command '/opt/mesosphere/bin/dvdcli unmount --volumedriver=rexray --volumename=jdef-test-vol13115 ' failed; this is the output:
Apr 05 03:18:47 ip-10-0-0-142.us-west-2.compute.internal mesos-slave[3175]: W0405 03:18:47.355484 3293 docker_volume_driver_isolator.cpp:386] /opt/mesosphere/bin/dvdcli unmount failed to execute on cleanup(), continuing on the assumption this volume was manually unmounted previously Failed to execute '/opt/mesosphere/bin/dvdcli unmount --volumedriver=rexray --volumename=jdef-test-vol13115 '; the command was either not found or exited with a non-zero exit status: 1
@jdef With a fresh system, can you try running the dvdcli commands manually and check the results? Please replace rexray and test with appropriate names for your config. This should simulate the most basic usage.
/opt/bin/dvdcli mount --volumedriver=rexray --volumename=test
/opt/bin/dvdcli unmount --volumedriver=rexray --volumename=test
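(A hedged way to verify each step in between, using the same placeholder driver/volume names and the /var/lib/rexray/volumes/<name> mount path that appears in the rexray logs above:)
# hedged sketch: confirm the mount appears after dvdcli mount and disappears after dvdcli unmount
/opt/bin/dvdcli mount --volumedriver=rexray --volumename=test; echo "mount rc=$?"
grep rexray /proc/mounts    # expect an entry under /var/lib/rexray/volumes/test
/opt/bin/dvdcli unmount --volumedriver=rexray --volumename=test; echo "unmount rc=$?"
grep rexray /proc/mounts    # expect no output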
@jdef Can you also paste in the marathon application definition that you are using?
Apparently we're running a forked version of rexray that tries to use IAM credentials instead of credentials coded into the configuration file. Upon further review it seems likely that the library code responsible for EC2 auth is failing to refresh credentials upon token expiration. That said, it's not clear to me that the credentials issue is directly related to hanging unmount commands, especially when I terminate a task so closely to when it's created -- given how easy it is to reproduce the issue, I'm betting that the credentials have not yet expired.
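(For what it's worth, a hedged way to check whether the instance-profile credentials have rolled over or expired is to query the EC2 instance metadata service; the role name below is a placeholder:)
# hedged sketch: list the attached IAM role, then inspect that role's credential expiration time
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/<role-name> | grep Expiration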
The marathon spec I'm using depends on a development branch of marathon that's adding higher level support for external volumes (so this is subject to change prior to being merged into master):
{
  "id": "hello",
  "instances": 1,
  "cpus": 0.1,
  "mem": 32,
  "cmd": "/usr/bin/tail -f /dev/null",
  "container": {
    "type": "MESOS",
    "volumes": [
      {
        "containerPath": "/tmp/test-rexray-volume",
        "persistent": {
          "size": 100,
          "name": "jdef-test-vol13115",
          "provider": "external",
          "options": { "external/driver": "rexray" }
        },
        "mode": "RW"
      }
    ]
  },
  "upgradeStrategy": {
    "minimumHealthCapacity": 0,
    "maximumOverCapacity": 0
  }
}
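(For completeness, a hedged example of submitting that definition, assuming the Marathon API is reachable at marathon.example.com:8080 and the JSON above is saved as hello.json:)
curl -X POST http://marathon.example.com:8080/v2/apps \
  -H 'Content-Type: application/json' \
  -d @hello.json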
To test the mount/unmount commands manually I'll need to spin up a fresh cluster and execute the commands before the initially detected IAM creds expire.
when I manually run the dvdcli mount/unmount commands, they work just fine with the IAM fixes @branden incorporated. bumped into #89 and that's stalling further testing.
hm... IAM problems have been resolved and the hung unmount problem persists:
Apr 07 02:59:31 ip-10-0-1-57.us-west-2.compute.internal mesos-slave[1500]: I0407 02:59:31.814273 1542 slave.cpp:3528] executor(1)@10.0.1.57:47184 exited
Apr 07 02:59:31 ip-10-0-1-57.us-west-2.compute.internal mesos-slave[1500]: I0407 02:59:31.910516 1547 containerizer.cpp:1608] Executor for container '903b69ea-9656-44dc-bae3-a106bc954f9f' has exited
Apr 07 02:59:31 ip-10-0-1-57.us-west-2.compute.internal mesos-slave[1500]: I0407 02:59:31.910591 1547 containerizer.cpp:1392] Destroying container '903b69ea-9656-44dc-bae3-a106bc954f9f'
Apr 07 02:59:31 ip-10-0-1-57.us-west-2.compute.internal mesos-slave[1500]: I0407 02:59:31.911882 1541 cgroups.cpp:2427] Freezing cgroup /sys/fs/cgroup/freezer/mesos/903b69ea-9656-44dc-bae3-a106bc954f9f
Apr 07 02:59:31 ip-10-0-1-57.us-west-2.compute.internal mesos-slave[1500]: I0407 02:59:31.912988 1547 cgroups.cpp:1409] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/903b69ea-9656-44dc-bae3-a106bc954f9f after 1.054976ms
Apr 07 02:59:31 ip-10-0-1-57.us-west-2.compute.internal mesos-slave[1500]: I0407 02:59:31.914213 1542 cgroups.cpp:2445] Thawing cgroup /sys/fs/cgroup/freezer/mesos/903b69ea-9656-44dc-bae3-a106bc954f9f
Apr 07 02:59:31 ip-10-0-1-57.us-west-2.compute.internal mesos-slave[1500]: I0407 02:59:31.915292 1547 cgroups.cpp:1438] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/903b69ea-9656-44dc-bae3-a106bc954f9f after 1.035008ms
Apr 07 02:59:31 ip-10-0-1-57.us-west-2.compute.internal mesos-slave[1500]: I0407 02:59:31.915989 1545 docker_volume_driver_isolator.cpp:357] rexray/jdef-test-vol6249824 is being unmounted on cleanup()
Apr 07 02:59:31 ip-10-0-1-57.us-west-2.compute.internal mesos-slave[1500]: I0407 02:59:31.917794 1545 docker_volume_driver_isolator.cpp:363] Invoking /opt/mesosphere/bin/dvdcli unmount --volumedriver=rexray --volumename=jdef-test-vol6249824
Apr 07 02:59:31 ip-10-0-1-57.us-west-2.compute.internal rexray[905]: time="2016-04-07T02:59:31Z" level=info msg=vdm.Create driverName=docker moduleName=default-docker opts=map[] volumeName=jdef-test-vol6249824
Apr 07 02:59:31 ip-10-0-1-57.us-west-2.compute.internal rexray[905]: time="2016-04-07T02:59:31Z" level=info msg="initialized count" count=0 moduleName=default-docker volumeName=jdef-test-vol6249824
Apr 07 02:59:31 ip-10-0-1-57.us-west-2.compute.internal rexray[905]: time="2016-04-07T02:59:31Z" level=info msg="creating volume" driverName=docker moduleName=default-docker volumeName=jdef-test-vol6249824 volumeOpts=map[]
Apr 07 02:59:32 ip-10-0-1-57.us-west-2.compute.internal rexray[905]: time="2016-04-07T02:59:32Z" level=info msg=sdm.GetVolume driverName=ec2 moduleName=default-docker volumeID= volumeName=jdef-test-vol6249824
Apr 07 02:59:32 ip-10-0-1-57.us-west-2.compute.internal rexray[905]: time="2016-04-07T02:59:32Z" level=info msg=vdm.Unmount driverName=docker moduleName=default-docker volumeID= volumeName=jdef-test-vol6249824
Apr 07 02:59:32 ip-10-0-1-57.us-west-2.compute.internal rexray[905]: time="2016-04-07T02:59:32Z" level=info msg="initialized count" count=0 moduleName=default-docker volumeName=jdef-test-vol6249824
Apr 07 02:59:32 ip-10-0-1-57.us-west-2.compute.internal rexray[905]: time="2016-04-07T02:59:32Z" level=info msg="unmounting volume" driverName=docker moduleName=default-docker volumeID= volumeName=jdef-test-vol6249824
Apr 07 02:59:32 ip-10-0-1-57.us-west-2.compute.internal rexray[905]: time="2016-04-07T02:59:32Z" level=info msg=sdm.GetVolume driverName=ec2 moduleName=default-docker volumeID= volumeName=jdef-test-vol6249824
Apr 07 02:59:32 ip-10-0-1-57.us-west-2.compute.internal rexray[905]: time="2016-04-07T02:59:32Z" level=info msg=sdm.GetVolumeAttach driverName=ec2 instanceID=i-3e6f7ae4 moduleName=default-docker volumeID=vol-9482e22d
Apr 07 02:59:33 ip-10-0-1-57.us-west-2.compute.internal rexray[905]: time="2016-04-07T02:59:33Z" level=info msg=odm.GetMounts deviceName="/dev/xvdc" driverName=linux moduleName=default-docker mountPoint=
Apr 07 02:59:33 ip-10-0-1-57.us-west-2.compute.internal rexray[905]: time="2016-04-07T02:59:33Z" level=info msg=odm.Unmount driverName=linux moduleName=default-docker mountPoint="/var/lib/rexray/volumes/jdef-test-vol6249824"
Apr 07 02:59:33 ip-10-0-1-57.us-west-2.compute.internal rexray[905]: time="2016-04-07T02:59:33Z" level=info msg=sdm.DetachVolume driverName=ec2 instanceID= moduleName=default-docker runAsync=false volumeID=vol-9482e22d
Apr 07 02:59:33 ip-10-0-1-57.us-west-2.compute.internal rexray[905]: time="2016-04-07T02:59:33Z" level=info msg="waiting for volume detachment to complete" driverName=ec2 force=false moduleName=default-docker runAsync=false volumeID=vol-9482e22d
Apr 07 02:59:33 ip-10-0-1-57.us-west-2.compute.internal kernel: vbd vbd-51744: 16 Device in use; refusing to close
using the filesystem/linux isolator in conjunction with this isolator fixes the problem. without the filesystem/linux isolator, the host mountns is contaminated with the mounts added to the container. the core team says that the host's /tmp was being polluted by the container w/o the filesystem/linux isolator
I've left this issue open because this dependency on the filesystem/linux isolator should be documented
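(A hedged sketch of what that looks like on the agent command line; the module JSON path and the DVDI isolator name below follow the mesos-module-dvdi docs and may differ per install:)
# hedged sketch: enable the Linux filesystem isolator alongside the DVDI isolator
mesos-slave \
  --modules=file:///usr/lib/dvdi-mod.json \
  --isolation="filesystem/linux,com_emccode_mesos_DockerVolumeDriverIsolator"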
@jdef Doing some more testing on our side. To me this doesn't make sense that this would be required.
@jdef So the troubling thing here is that I believe I simulated relevant options by supplying DVDI_VOLUME_NAME, DVDI_VOLUME_DRIVER, DVDI_VOLUME_CONTAINERPATH. I did this from 0.27.0 and everything looks to be working fine.
So the question I am moving towards is whether there is some type of race condition introduced in a newer rev of Mesos. The failure of the detach is based on things not being cleaned up appropriately. If there was logic that changed on the cleanup side in 0.28+, then it may be interfering with our ability to properly remove mounts to enable the detach process to complete.
one of the symptoms is that when a container path mount is created under /tmp (in our test system) within the container mountns, it propagates back to the host's mountns. the container mountns is destroyed upon container teardown, but the mount still exists in the host mountns. rexray then tries to detach the volume but can't.
our core team tells me this is because without the linux/fs isolator, /tmp exists in a shared mountns. they also tell me that there is no other valid workaround for this problem -- that a container mountns must be properly isolated to avoid mounts propagating back to the host mountns.
@clintonskitson are you saying that DVD isolator already prevents such leakage?
/cc @jieyu
@cantbewong @jdef Having a mount path leak from a container into the host's mounts doesn't sound right. Is there any documentation online that discusses this? Can you try this on an Ubuntu 14.04 based host?
Ubuntu 14.04 does not have this issue because by default, all mounts are private.
Centos7/fedora23/coreos will have this issue because by default, all mounts are shared.
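(A hedged way to confirm which behavior a given host has, if util-linux findmnt is available:)
# hedged sketch: show the propagation flag (shared/private/slave) for every mount
findmnt -o TARGET,PROPAGATION
# or just for the root mount, which the bind mounts inherit from
findmnt -o PROPAGATION /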
@cantbewong Is it possible that when we leverage the filesystem isolator code from Mesos we aren't invoking it as a private mount? I could see this being the problem if DVDI_VOLUME_CONTAINERPATH is being invoked.
Assuming the mount being discussed is the bind mount used to isolate a container:
the mount command is queued as follows prepareInfo.add_commands()->set_value( "mount -n --rbind " + mountPoint + " " + containerPath);
No efforts are made to specify private vs shared so if RedHat and Ubuntu differ as to default, the private vs. shared state of the bind mount will differ.
I found documentation that indicates that the RedHat default is private https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/sect-Using_the_mount_Command-Mounting-Bind.html
The Fedora documentation refers to the kernel documentation which also indicates a default of private https://www.kernel.org/doc/Documentation/filesystems/sharedsubtree.txt
If this is the root cause of this issue, an explicit call to declare the mount private would be useful: mount --make-rprivate mount_point
Making this call should be harmless, other than adding the time it takes to execute, so I will put it in the next version of the isolator.
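(Putting the two pieces together, a hedged sketch of the proposed sequence, with the rexray mount point and container path borrowed from earlier in this thread:)
# hedged sketch: bind-mount the rexray mount point into the container path,
# then explicitly mark it (recursively) private so it cannot propagate back
# into the host mount namespace on shared-by-default distros
mount -n --rbind /var/lib/rexray/volumes/jdef-test-vol /tmp/test-rexray-volume
mount --make-rprivate /tmp/test-rexray-volume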
I think @jieyu expressed concerns that not all versions of mount supported the same set of flags. Should probably determine whether the flag you're proposing is available on the OS versions that should be supported.
@jdef can you provide a suggestion for a supported OS list?
@jdef @cantbewong the problem regarding the make-rslave|make-rprivate issue is described in this Mesos ticket: https://issues.apache.org/jira/browse/MESOS-3806. It does not work properly on Ubuntu 14.04.
The Docker solution here is within a mount package they have that calls the kernel via C, I believe. It might be worth a look to see about leveraging that as a long-term solution versus calling a CLI that may continue to differ between OS distros over time.
Same issue on coreos stable (1235.12.0):
Apr 15 00:11:57 10.1.21.50 rexray[1747]: time="2017-04-15T00:11:57Z" level=info msg="waiting for volume detachment to complete" driverName=ec2 force=false moduleName=default-docker runAsync=false volumeID=vol-xxxxx
docker ps etc. won't respond