mesosphere-backup / deimos

Mesos containerizer hooks for Docker
Apache License 2.0

Container flaps between 'Staging' and 'Running' => can't download a package #47

Open tnolet opened 10 years ago

tnolet commented 10 years ago

Hi,

Starting a simple container using Marathon/Deimos fails because, for some reason, it cannot fetch a .jar file hosted on S3. The Docker image is downloaded correctly by a slave from the public Docker repo and can be run manually with no problems. The app inside the container is a simple 'hello-world' type Java app.

Details:

mesos: 0.19.1
deimos: 0.4.0
marathon: 0.6.0-1.0
ubuntu: 14.04 trusty

docker image: tnolet/hello1

Dockerfile:

FROM ubuntu:latest

MAINTAINER Tim Nolet

RUN apt-get update -y

RUN apt-get install -y --no-install-recommends openjdk-7-jre

ENV JAVA_HOME /usr/lib/jvm/java-7-openjdk-amd64

RUN apt-get install -y curl

RUN curl -sf -O https://s3-eu-west-1.amazonaws.com/deploy.magnetic.io/snapshots/dropwizard-0.0.1-SNAPSHOT.jar

RUN curl -sf -O https://s3-eu-west-1.amazonaws.com/deploy.magnetic.io/snapshots/hello-world.yml

EXPOSE 8080

EXPOSE 8081

ENV SERVICE hello:0.0.1:8080:8081

CMD java -jar dropwizard-0.0.1-SNAPSHOT.jar server hello-world.yml

task file:

{
    "container": {
    "image": "docker:///tnolet/hello1",
    "options" : []
  },
  "id": "hello1",
  "instances": "1",
  "cpus": ".5",
  "mem": "512",
  "uris": [],
  "cmd": ""
}
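
(For reference, the task file is submitted to Marathon over its REST API, roughly like this; the host and file name are placeholders, and the exact endpoint may differ per Marathon version:)

curl -X POST -H "Content-Type: application/json" \
     http://<marathon-host>:8080/v2/apps \
     -d @hello1.json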

Error in stderr in the Mesos GUI: Error: Unable to access jarfile dropwizard-0.0.1-SNAPSHOT.jar

Output from mesos-slave.INFO on the slave:

I0731 13:09:21.673143  8814 slave.cpp:1664] Got registration for executor 'hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab' of framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:21.673703  8814 slave.cpp:1783] Flushing queued task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab for executor 'hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab' of framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:21.695307  8814 slave.cpp:2018] Handling status update TASK_RUNNING (UUID: 4e704272-eecd-4205-819c-a2eb63048c18) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000 from executor(1)@172.31.31.38:49678
I0731 13:09:21.695582  8814 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: 4e704272-eecd-4205-819c-a2eb63048c18) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:21.695897  8814 status_update_manager.cpp:373] Forwarding status update TASK_RUNNING (UUID: 4e704272-eecd-4205-819c-a2eb63048c18) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000 to master@172.31.31.36:5050
I0731 13:09:21.696854  8815 slave.cpp:2145] Sending acknowledgement for status update TASK_RUNNING (UUID: 4e704272-eecd-4205-819c-a2eb63048c18) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000 to executor(1)@172.31.31.38:49678
I0731 13:09:21.702631  8812 status_update_manager.cpp:398] Received status update acknowledgement (UUID: 4e704272-eecd-4205-819c-a2eb63048c18) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:21.859962  8816 slave.cpp:2355] Monitoring executor 'hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab' of framework '20140731-110416-606019500-5050-1090-0000' in container 'e4c7ca90-0dff-4492-b3c4-e6c7569f1eeb'
I0731 13:09:22.687067  8813 slave.cpp:2018] Handling status update TASK_FAILED (UUID: 2d405edf-32a0-493e-aea0-d2d8d4cc1f9c) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000 from executor(1)@172.31.31.38:49678
I0731 13:09:22.698246  8811 status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: 2d405edf-32a0-493e-aea0-d2d8d4cc1f9c) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:22.699434  8811 status_update_manager.cpp:373] Forwarding status update TASK_FAILED (UUID: 2d405edf-32a0-493e-aea0-d2d8d4cc1f9c) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000 to master@172.31.31.36:5050
I0731 13:09:22.700186  8811 slave.cpp:2145] Sending acknowledgement for status update TASK_FAILED (UUID: 2d405edf-32a0-493e-aea0-d2d8d4cc1f9c) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000 to executor(1)@172.31.31.38:49678
I0731 13:09:22.709666  8815 status_update_manager.cpp:398] Received status update acknowledgement (UUID: 2d405edf-32a0-493e-aea0-d2d8d4cc1f9c) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:23.060930  8814 slave.cpp:933] Got assigned task hello1.e3b9e259-18b3-11e4-a08d-0a4559673eab for framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:23.061293  8814 slave.cpp:1043] Launching task hello1.e3b9e259-18b3-11e4-a08d-0a4559673eab for framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:23.063863  8815 external_containerizer.cpp:433] Launching container 'cfb86a26-2821-49ce-95a0-3e4d0dfd8657'
I0731 13:09:23.080337  8814 slave.cpp:1153] Queuing task 'hello1.e3b9e259-18b3-11e4-a08d-0a4559673eab' for executor hello1.e3b9e259-18b3-11e4-a08d-0a4559673eab of framework '20140731-110416-606019500-5050-1090-0000
E0731 13:09:23.859387  8811 slave.cpp:2397] Termination of executor 'hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab' of framework '20140731-110416-606019500-5050-1090-0000' failed: External containerizer failed (status: 1)
I0731 13:09:23.859632  8811 slave.cpp:2552] Cleaning up executor 'hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab' of framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:23.860239  8811 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20140731-110416-606019500-5050-1090-2/frameworks/20140731-110416-606019500-5050-1090-0000/executors/hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab/runs/e4c7ca90-0dff-4492-b3c4-e6c7569f1eeb' for gc 6.99999004926815days in the future
I0731 13:09:23.860345  8811 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20140731-110416-606019500-5050-1090-2/frameworks/20140731-110416-606019500-5050-1090-0000/executors/hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab' for gc 6.99999004838222days in the future
I0731 13:09:23.871316  8816 external_containerizer.cpp:1040] Killed the following process tree/s:
[ 

]

Again, running the following command on the slave manually starts up the container with no problems: sudo docker run -d -P tnolet/hello1

tnolet commented 10 years ago

I found the cause of this behaviour. The flapping happens when artifacts or executables inside Docker containers are not referenced by their full paths. Because Deimos adds the -w /tmp/mesos-sandbox switch to set the working directory in Docker, all relative paths resolve against the sandbox instead of the image's own working directory...

Not sure if this is a bug or just something people should be aware of.
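
For what it's worth, one way to make an image immune to that -w override is to reference everything by absolute path. A sketch based on the Dockerfile above (/opt/hello is just a directory picked for illustration; in the original image the curl downloads land in / because no WORKDIR is set):

FROM ubuntu:latest

RUN apt-get update -y && \
    apt-get install -y --no-install-recommends openjdk-7-jre curl

ENV JAVA_HOME /usr/lib/jvm/java-7-openjdk-amd64

# Download to a fixed location instead of the build-time working directory.
RUN mkdir -p /opt/hello && \
    curl -sf -o /opt/hello/dropwizard-0.0.1-SNAPSHOT.jar https://s3-eu-west-1.amazonaws.com/deploy.magnetic.io/snapshots/dropwizard-0.0.1-SNAPSHOT.jar && \
    curl -sf -o /opt/hello/hello-world.yml https://s3-eu-west-1.amazonaws.com/deploy.magnetic.io/snapshots/hello-world.yml

EXPOSE 8080 8081

ENV SERVICE hello:0.0.1:8080:8081

# Absolute paths, so the command works regardless of the working directory Deimos sets.
CMD java -jar /opt/hello/dropwizard-0.0.1-SNAPSHOT.jar server /opt/hello/hello-world.yml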

tnolet commented 10 years ago

This is similar to https://github.com/mesosphere/deimos/issues/49

solidsnack commented 10 years ago

I'm just not sure what the right thing to do is. Deimos puts URLs from the Mesos task in a directory which it mounts at /tmp/mesos-sandbox so tasks can find the downloaded contents. It seems reasonable to set the working directory to that directory, too, so that frameworks which are unaware of Docker will still find the URLs they expect.
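
Roughly speaking (an illustration of the effect, not the exact command Deimos constructs; the IDs in the host path are placeholders), the container ends up being started along these lines:

docker run -d -P \
    -v /tmp/mesos/slaves/<slave-id>/frameworks/<framework-id>/executors/<executor-id>/runs/<container-id>:/tmp/mesos-sandbox \
    -w /tmp/mesos-sandbox \
    tnolet/hello1

so anything CMD or ENTRYPOINT references by a relative path is looked up under /tmp/mesos-sandbox rather than the image's own working directory.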

There is a patch under #49 to acknowledge the WORKDIR directive but I do wonder if there is a better policy in general.

Having Deimos dump the URLs in the "right place" could perhaps be accomplished by:

Hopefully ENTRYPOINT and CMD and all that would be preserved in the new image.

tnolet commented 10 years ago

I see the problem. I guess as long as everyone is fully aware that this is happening, it's not a big problem. Making your paths and URLs fully qualified isn't always a nice way of handling things, but there are ways around it and in the end it's not a biggie.
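
One of those ways, sketched against the Dockerfile above (untested here; with that Dockerfile the jar and yml land in / because no WORKDIR is set): change into the directory that actually holds the files as part of the command, so the -w override from Deimos no longer matters:

CMD cd / && java -jar dropwizard-0.0.1-SNAPSHOT.jar server hello-world.yml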