Closed justinclayton closed 8 years ago
I will take a look at this.
Hi @justinclayton, I have been testing the binary in the 0.4.3 release and I haven't run into any issues. I am using a minimal configuration to eliminate any possible conflicts between the 0.28.1 and 0.28.2 versions. I have tried running the agent in both service and command line modes, and everything seems to work OK.
This is my minimal configuration:
nohup /usr/sbin/mesos-slave \
--master=zk://<replace with your zookeeper config>/mesos \
--containerizers=docker,mesos --work_dir=/tmp/mesos \
--modules=file:///usr/lib/dvdi-mod.json \
--isolation="com_emccode_mesos_DockerVolumeDriverIsolator" &
Would it be possible to stop all your agents and run a single agent on one of your agent nodes in command line mode (above) to see if this works for you? If that does work, continue running in command line mode but add back your other command line options one at a time until we hit the issue. Running in command line mode will also create a nohup.out file in your current working directory, which makes it easy to capture the crash. I have a feeling that the issue is caused by a behavior change in Mesos in one of the command line options you are using.
I have been looking at the diffs between 0.28.1 and 0.28.2, and it looks like there were significant changes, specifically in the Mesos Linux filesystem isolator. Continuing to look into this.
I think I see the issue, but I need to verify it. If this is what I think it is, I don't see how the 0.28.2 binary is working on your 0.28.1 configuration. It should be failing there with the same issue, provided the configuration flags on your 0.28.1 and 0.28.2 setups are the same.
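The "add options back until it breaks" procedure above can be scripted. This is only a rough sketch under some assumptions: the function names (`try_flags`, `bisect_flags`), the `bisect.out` log file, and the 30 second settle time are all made up for illustration, and "the agent is still running after the settle time" is used as a stand-in for "no crash". The base flags are copied from the minimal configuration above; the extra flags are whatever your own agent configuration adds.

```shell
#!/usr/bin/env bash
# Sketch: add extra agent flags back one at a time on top of the minimal
# configuration and report the first flag after which the agent exits.
# AGENT_BIN and SETTLE_SECONDS can be overridden from the environment.

AGENT_BIN="${AGENT_BIN:-/usr/sbin/mesos-slave}"
SETTLE_SECONDS="${SETTLE_SECONDS:-30}"   # assumed wait before declaring "no crash"

# Minimal configuration from this thread (fill in your ZooKeeper address).
BASE_FLAGS=(
  "--master=zk://<replace with your zookeeper config>/mesos"
  "--containerizers=docker,mesos"
  "--work_dir=/tmp/mesos"
  "--modules=file:///usr/lib/dvdi-mod.json"
  "--isolation=com_emccode_mesos_DockerVolumeDriverIsolator"
)

# Start the agent with the given flags, wait, and check that it is still alive.
# Returns 0 if the agent survived the settle time, 1 if it exited early.
try_flags() {
  "$AGENT_BIN" "$@" >> bisect.out 2>&1 &
  local pid=$!
  sleep "$SETTLE_SECONDS"
  if kill -0 "$pid" 2>/dev/null; then
    kill "$pid"
    wait "$pid" 2>/dev/null
    return 0
  fi
  return 1
}

# Add each extra flag (passed as arguments) to the base set, cumulatively,
# and stop at the first one after which the agent no longer stays up.
bisect_flags() {
  local active=("${BASE_FLAGS[@]}")
  local flag
  for flag in "$@"; do
    active+=("$flag")
    if ! try_flags "${active[@]}"; then
      echo "agent exited after adding: $flag"
      return 1
    fi
  done
  echo "agent stayed up with all flags"
}
```

After sourcing the script, call `bisect_flags` with your remaining agent options as arguments, in the order you want them added back. The agent output from every attempt is appended to `bisect.out`, which plays the same role as nohup.out in the manual procedure.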
@justinclayton I have a test binary that I would like you to take a look at. I believe this should fix the issue. I think the issue stems from one of two scenarios:
I believe that changes to the Linux filesystem isolator, together with the order in which the isolators are called, are causing the working directory not to have the necessary checkpoint data.
@dvonthenen Your test binary worked perfectly. Thanks!
And just to be clear: when I said earlier that 0.28.1 did work, I meant dvdi 0.4.2-0.28.1 running on Mesos 0.28.1. I never tried to mix and match them.
@justinclayton Thanks again for your help. I will be spinning up another release containing this fix and will close this issue out when the release is available.
Release 0.4.4 has been published with this fix:
https://github.com/emccode/mesos-module-dvdi/releases/tag/v0.4.4
Command used: