jdef opened this issue 8 years ago.
I looked at the Marathon code and I agree that this is a good idea and should be feasible. Thanks for the input.
To add to @jdef's description, this problem is pretty severe: if any operation in the dvdi module blocks, all subsequent container launch/update/destroy operations are blocked, regardless of whether those containers use external volumes.
Fixing that might also require serializing dvdcli operations, because when you use Subprocess, the order in which dvdcli operations execute is non-deterministic. For instance, say you want to umount a volume, and then a new container arrives requesting the same volume. You expect the volume to end up mounted for the new container. However, because of the race, the umount can easily happen after the mount.
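One way to get both properties (callers never block, and operations run in the order they were requested) is a single-worker FIFO queue. The sketch below is a minimal, self-contained illustration using only the C++ standard library; SerialExecutor, runUnmountThenMount, and the volume name vol1 are hypothetical, and the lambdas only stand in for real dvdcli invocations. It is not how the dvdi module or libprocess actually does this.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: runs submitted jobs one at a time, in submission order,
// on a single background thread. Submitting returns a future immediately.
class SerialExecutor {
 public:
  SerialExecutor() : worker_([this] { run(); }) {}
  SerialExecutor(const SerialExecutor&) = delete;
  SerialExecutor& operator=(const SerialExecutor&) = delete;

  ~SerialExecutor() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_one();
    worker_.join();  // the worker drains any queued jobs before exiting
  }

  // Enqueue a job; the returned future becomes ready once the job has run.
  std::future<int> submit(std::function<int()> job) {
    auto task = std::make_shared<std::packaged_task<int()>>(std::move(job));
    std::future<int> result = task->get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      jobs_.push([task] { (*task)(); });
    }
    cv_.notify_one();
    return result;
  }

 private:
  void run() {
    for (;;) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
        if (jobs_.empty()) return;  // stopping and nothing left to do
        job = std::move(jobs_.front());
        jobs_.pop();
      }
      job();  // run outside the lock so submit() never waits on a job
    }
  }

  // Declaration order matters: worker_ must be initialized last.
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> jobs_;
  bool stopping_ = false;
  std::thread worker_;
};

// Usage sketch: an umount followed by a mount of the same hypothetical volume.
// The lambdas only record what "ran" instead of calling dvdcli.
std::vector<std::string> runUnmountThenMount() {
  std::vector<std::string> log;  // written only by the executor's worker thread
  SerialExecutor executor;
  std::future<int> unmount =
      executor.submit([&log] { log.push_back("unmount vol1"); return 0; });
  std::future<int> mount =
      executor.submit([&log] { log.push_back("mount vol1"); return 0; });
  unmount.get();
  mount.get();  // once this returns, both jobs have run, in submission order
  return log;
}
```

The trade-off is that a single hung command still stalls every later command in the same queue; a per-volume queue would confine that stall to operations on one volume, at the cost of more bookkeeping.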
@jdef, just for my understanding: what happens in the case of Docker-type workloads/containers? The specific case I am thinking of: if we mount the volume asynchronously and the task leaves the STAGING state, the application could start before the volume data is available and then error out because the data is not there.
Maybe I am misunderstanding how to use Subprocess and what its capabilities are.
@jieyu, how did we handle this scenario in the Docker volume isolator recently added to Mesos?
In reply to David vonThenen's comment of Wed, May 18, 2016 at 11:21 AM.
James DeFelice
Related to #88: if calls to os::shell to execute dvdcli hang or block for a significant amount of time, the task launch pipeline breaks down and tasks become stuck in STAGING. Part of the reason this happens is that the isolator module invokes potentially blocking operations synchronously from within the Mesos module API handlers. A better approach would be to invoke such commands asynchronously, for example by using Subprocess. The HDFS code in Mesos provides an example of this approach: https://github.com/apache/mesos/blob/4d2b1b793e07a9c90b984ca330a3d7bc9e1404cc/src/hdfs/hdfs.cpp#L53
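To illustrate the shape of that change, the handler starts the command and hands back a future instead of waiting for it. The sketch below is a rough, self-contained approximation under stated assumptions: std::async plus POSIX popen stands in for libprocess's Subprocess (it is not the Mesos API), and runCommandAsync, prepareVolume, and CommandResult are hypothetical names. An echo command replaces the real dvdcli command line, which is not shown in this thread.

```cpp
#include <sys/wait.h>

#include <array>
#include <cstdio>
#include <future>
#include <string>

// Hypothetical result type for a shell command such as a dvdcli invocation.
struct CommandResult {
  int status;          // exit status, or -1 if the command could not be run
  std::string output;  // captured stdout
};

// Runs `command` through /bin/sh on another thread and returns a future at once.
// std::async stands in for Subprocess here; it is NOT the libprocess API.
std::future<CommandResult> runCommandAsync(std::string command) {
  return std::async(std::launch::async, [command] {
    CommandResult result{-1, ""};
    FILE* pipe = popen(command.c_str(), "r");
    if (pipe == nullptr) {
      return result;
    }
    std::array<char, 256> buffer;
    while (fgets(buffer.data(), static_cast<int>(buffer.size()), pipe) != nullptr) {
      result.output += buffer.data();
    }
    int raw = pclose(pipe);  // wait status of the shell
    if (raw != -1 && WIFEXITED(raw)) {
      result.status = WEXITSTATUS(raw);
    }
    return result;
  });
}

// Hypothetical isolator-style handler: it starts the mount and returns a future
// rather than blocking, so the calling thread stays free for other containers.
std::future<CommandResult> prepareVolume(const std::string& volume) {
  // "echo" is a placeholder for a real dvdcli command line.
  return runCommandAsync("echo mounted " + volume);
}
```

One caveat of this stand-in: a future returned by std::async blocks in its destructor until the command finishes, so the caller has to keep the future and consume it later rather than dropping it. The command string is also passed to the shell unquoted, which is acceptable only for a demonstration.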