thecodeteam / mesos-module-dvdi

Mesos Docker Volume Driver Isolator module
Apache License 2.0

isolator should invoke potentially blocking operations async from module API handlers #92

Open jdef opened 8 years ago

jdef commented 8 years ago

Related to #88: if calls to os::shell to execute dvdcli hang or block for a significant amount of time, the task launch pipeline breaks down and tasks become stuck in STAGING. Part of the reason is that the isolator module invokes potentially blocking operations synchronously from within the Mesos module API handlers.

A better approach would be to invoke such commands asynchronously, for example by using Subprocess. The HDFS code in Mesos provides an example of this approach: https://github.com/apache/mesos/blob/4d2b1b793e07a9c90b984ca330a3d7bc9e1404cc/src/hdfs/hdfs.cpp#L53
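To make the idea concrete, here is a minimal sketch of the pattern, not the module's code. It does not use Mesos's Subprocess; as a stand-in it uses only the C++ standard library plus POSIX `popen`. The names `CommandResult`, `runBlocking` and `runAsync` are hypothetical. The point is only that the handler gets a future back right away and the blocking call happens on another thread.

```cpp
#include <cassert>
#include <cstdio>
#include <future>
#include <stdexcept>
#include <string>
#include <sys/wait.h>

// Result of one external command: exit status plus captured stdout.
// (Hypothetical type, for illustration only.)
struct CommandResult {
  int status;
  std::string output;
};

// Blocking helper: runs `command` through the shell and collects its stdout.
// This is the part that can hang if the external tool hangs.
CommandResult runBlocking(const std::string& command) {
  assert(!command.empty());
  FILE* pipe = popen(command.c_str(), "r");
  if (pipe == nullptr) {
    throw std::runtime_error("popen failed for: " + command);
  }
  std::string output;
  char buffer[256];
  while (fgets(buffer, static_cast<int>(sizeof(buffer)), pipe) != nullptr) {
    output += buffer;
  }
  int raw = pclose(pipe);
  // Map the wait status to an exit code; -1 means "did not exit normally".
  int status = (raw != -1 && WIFEXITED(raw)) ? WEXITSTATUS(raw) : -1;
  return CommandResult{status, output};
}

// Non-blocking entry point: starts the command on another thread and returns
// a future immediately, so the calling handler is not held up.
std::future<CommandResult> runAsync(const std::string& command) {
  return std::async(std::launch::async, runBlocking, command);
}
```

A caller would keep the returned future and act on the result once it is ready, e.g. `auto pending = runAsync("echo hello");`. Note that a future obtained from `std::async` blocks in its destructor, so discarding it immediately would make the call synchronous again.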

cantbewong commented 8 years ago

I looked at the Marathon code and I agree that this is a good idea and should be feasible. Thanks for the input.

jieyu commented 8 years ago

To add to @jdef's description, this problem is pretty severe. If any operation in the dvdi module blocks, ALL subsequent container launch/update/destroy operations will be BLOCKED, irrespective of whether the container is using an external volume or not.

Fixing that might involve serializing dvdcli operations, because when you use Subprocess the order in which dvdcli operations are executed is non-deterministic. For instance, say you have a volume you want to umount first, and then a new container comes in requesting the same volume. You expect that the volume will be mounted for the new container. However, due to the race, it's likely that the umount happens after the mount.
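One way to picture the serialization is a single worker thread draining a FIFO queue, so operations run off the handler thread but still in submission order. This is only a sketch under that assumption: `SerialExecutor` and `submit` are hypothetical names, and the real module would more likely build on libprocess than on `std::thread`.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <utility>

// Runs submitted operations one at a time, in submission order, on a single
// background thread. (Hypothetical class, for illustration only.)
class SerialExecutor {
public:
  SerialExecutor() : worker_([this] { run(); }) {}

  // Drains any remaining operations, then stops the worker.
  ~SerialExecutor() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    ready_.notify_one();
    worker_.join();
  }

  // Enqueues `operation` and returns immediately. The future becomes ready
  // once the operation has actually run.
  std::future<void> submit(std::function<void()> operation) {
    assert(operation != nullptr);
    std::packaged_task<void()> task(std::move(operation));
    std::future<void> done = task.get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push_back(std::move(task));
    }
    ready_.notify_one();
    return done;
  }

private:
  void run() {
    for (;;) {
      std::packaged_task<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
        if (queue_.empty()) return;  // stopping and nothing left to run
        task = std::move(queue_.front());
        queue_.pop_front();
      }
      task();  // executed outside the lock so submit() never waits on it
    }
  }

  std::mutex mutex_;
  std::condition_variable ready_;
  std::deque<std::packaged_task<void()>> queue_;
  bool stopping_ = false;
  std::thread worker_;  // declared last so the members above exist first
};
```

With this, an umount submitted before a mount of the same volume always runs first, even though neither call blocks the submitter. A single queue serializes every operation; keeping one queue per volume would be a possible refinement that lets unrelated volumes proceed in parallel.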

dvonthenen commented 8 years ago

@jdef, just for my understanding, what happens in the case of Docker-type workloads/containers? The specific case I am thinking about is this: if we mount the volume async, come out of the staging state, and the application comes up without the volume data being available, the application might error out because the data is not there.

Maybe I am misunderstanding how to use subprocess and what its capabilities are.

jdef commented 8 years ago

@jieyu how did we handle this scenario with the docker volume isolator recently added to mesos?
