mesosphere / marathon

Deploy and manage containers (including Docker) on top of Apache Mesos at scale.
https://mesosphere.github.io/marathon/
Apache License 2.0

Marathon/Mesos does not pull docker images. #1161

Closed · more-free closed this issue 9 years ago

more-free commented 9 years ago

Hi,

Recently we have been using Marathon / Mesos to deploy some of our own Docker containers (from our own registry), but it fails to pull the images from the registry -- the deployment always fails (from the Mesos UI we see failed tasks) unless we manually run "docker pull" on the slave nodes to fetch the images in advance.

It looks to me like Mesos fails to pull the images. Any clue about this? Or is there any log I can check? Thanks!

The versions we use are: Marathon 0.8.0, Mesos 0.21,

and we use JSON to deploy with Marathon.

drexin commented 9 years ago

Have you increased the executor registration timeout on the mesos slaves? It seems like pulling the images takes too long. If that doesn't help, could you please post the relevant log lines from the mesos slave?
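For reference, a rough sketch of how that timeout can be raised on a slave; the 5-minute value, the ZooKeeper URL, and the /etc/mesos-slave flag-file convention (Mesosphere packages) are illustrative, not taken from this setup:

# Pass the flag directly when starting the slave:
mesos-slave --master=zk://zk1.example.com:2181/mesos \
  --containerizers=docker,mesos \
  --executor_registration_timeout=5mins

# Or, with the Mesosphere packages, write a flag file and restart the service:
echo '5mins' | sudo tee /etc/mesos-slave/executor_registration_timeout
sudo service mesos-slave restart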

more-free commented 9 years ago

edited

drexin commented 9 years ago

You shouldn't run Docker from the CMD; Mesos and Marathon come with built-in support for Docker. Please see https://mesosphere.github.io/marathon/docs/native-docker.html for how to run Docker on Marathon.
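For context, a minimal app definition in the spirit of that page looks roughly like this; the app id, image name, registry, resources, and Marathon host below are placeholders, not values from this issue:

cat > app.json <<'EOF'
{
  "id": "my-docker-app",
  "cpus": 0.5,
  "mem": 256,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "my-registry.example.com:5000/myteam/myapp:latest",
      "network": "BRIDGE"
    }
  }
}
EOF

# Submit it to Marathon's REST API:
curl -X POST -H "Content-Type: application/json" \
  http://marathon.example.com:8080/v2/apps -d @app.json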

On 09.02.2015, at 23:59, Ke Xu notifications@github.com wrote:

Hi,

Thanks very much for your response! Actually we don't have our own executor; we use the default "CMD" to run Docker. From the Mesos UI, we saw that the deployment failed immediately. Also, the log on the mesos-slave shows that it fails to pull the Docker image. But our Docker image is actually there, and there is no firewall / authentication / etc. to prevent the mesos-slave from accessing our Docker registry (and we can manually pull the Docker image on the mesos-slave with exactly the same command shown in the log).

I have pasted some of the mesos-slave log below; please help if you have any clue about this issue. Thanks very much!

I0209 20:24:18.364290 823 gc.cpp:84] Unscheduling '/var/run/mesos/slaves/20150207-012846-1912885258-5050-1164-S8/frameworks/20141216-070346-1912885258-5050-32095-0000' from gc
I0209 20:24:18.364780 823 gc.cpp:84] Unscheduling '/var/run/mesos/meta/slaves/20150207-012846-1912885258-5050-1164-S8/frameworks/20141216-070346-1912885258-5050-32095-0000' from gc
I0209 20:24:18.365309 823 slave.cpp:1193] Launching task s-app-3.9ff7a871-b099-11e4-9b55-56847afe9799 for framework 20141216-070346-1912885258-5050-32095-0000
I0209 20:24:18.368279 823 slave.cpp:3997] Launching executor s-app-3.9ff7a871-b099-11e4-9b55-56847afe9799 of framework 20141216-070346-1912885258-5050-32095-0000 in work directory '/var/run/mesos/slaves/20150207-012846-1912885258-5050-1164-S8/frameworks/20141216-070346-1912885258-5050-32095-0000/executors/s-app-3.9ff7a871-b099-11e4-9b55-56847afe9799/runs/42cc4e7f-99b9-448e-84dc-0259f0ab867b'
I0209 20:24:18.372236 823 slave.cpp:1316] Queuing task 's-app-3.9ff7a871-b099-11e4-9b55-56847afe9799' for executor s-app-3.9ff7a871-b099-11e4-9b55-56847afe9799 of framework '20141216-070346-1912885258-5050-32095-0000
I0209 20:24:18.378255 822 docker.cpp:928] Starting container '42cc4e7f-99b9-448e-84dc-0259f0ab867b' for task 's-app-3.9ff7a871-b099-11e4-9b55-56847afe9799' (and executor 's-app-3.9ff7a871-b099-11e4-9b55-56847afe9799') of framework '20141216-070346-1912885258-5050-32095-0000'
I0209 20:24:19.975740 818 docker.cpp:1502] Destroying container '42cc4e7f-99b9-448e-84dc-0259f0ab867b'
E0209 20:24:19.976316 821 slave.cpp:2787] Container '42cc4e7f-99b9-448e-84dc-0259f0ab867b' for executor 's-app-3.9ff7a871-b099-11e4-9b55-56847afe9799' of framework '20141216-070346-1912885258-5050-32095-0000' failed to start: Failed to 'docker pull docker-registry.ops.yahoo.com:4080/apurvak/yql_test_apple_weather_tenant:04d8c8615ac7': exit status = exited with status 1 stderr = 2015/02/09 20:24:19 Tag 04d8c8615ac7 not found in repository docker-registry.ops.yahoo.com:4080/apurvak/yql_test_apple_weather_tenant
I0209 20:24:19.976426 818 docker.cpp:1546] Destroying Container '42cc4e7f-99b9-448e-84dc-0259f0ab867b' in PULLING state
E0209 20:24:19.978085 818 slave.cpp:2882] Termination of executor 's-app-3.9ff7a871-b099-11e4-9b55-56847afe9799' of framework '20141216-070346-1912885258-5050-32095-0000' failed: Unknown container: 42cc4e7f-99b9-448e-84dc-0259f0ab867b
E0209 20:24:19.978721 823 slave.cpp:3134] Failed to unmonitor container for executor s-app-3.9ff7a871-b099-11e4-9b55-56847afe9799 of framework 20141216-070346-1912885258-5050-32095-0000: Not monitored
I0209 20:24:19.980541 818 slave.cpp:2215] Handling status update TASK_FAILED (UUID: 1037e83c-7e47-4905-9c7a-b936e363a851) for task s-app-3.9ff7a871-b099-11e4-9b55-56847afe9799 of framework 20141216-070346-1912885258-5050-32095-0000 from @0.0.0.0:0
W0209 20:24:19.981329 819 docker.cpp:1184] Ignoring updating unknown container: 42cc4e7f-99b9-448e-84dc-0259f0ab867b
I0209 20:24:19.982303 818 status_update_manager.cpp:317] Received status update TASK_FAILED (UUID: 1037e83c-7e47-4905-9c7a-b936e363a851) for task s-app-3.9ff7a871-b099-11e4-9b55-56847afe9799 of framework 20141216-070346-1912885258-5050-32095-0000
I0209 20:24:19.983125 818 status_update_manager.hpp:346] Checkpointing UPDATE for status update TASK_FAILED (UUID: 1037e83c-7e47-4905-9c7a-b936e363a851) for task s-app-3.9ff7a871-b099-11e4-9b55-56847afe9799 of framework 20141216-070346-1912885258-5050-32095-0000
I0209 20:24:19.987365 818 slave.cpp:2458] Forwarding the update TASK_FAILED (UUID: 1037e83c-7e47-4905-9c7a-b936e363a851) for task s-app-3.9ff7a871-b099-11e4-9b55-56847afe9799 of framework 20141216-070346-1912885258-5050-32095-0000 to master@10.80.4.114:5050
I0209 20:24:20.003286 819 status_update_manager.cpp:389] Received status update acknowledgement (UUID: 1037e83c-7e47-4905-9c7a-b936e363a851) for task s-app-3.9ff7a871-b099-11e4-9b55-56847afe9799 of framework 20141216-070346-1912885258-5050-32095-0000
I0209 20:24:20.003672 819 status_update_manager.hpp:346] Checkpointing ACK for status update TASK_FAILED (UUID: 1037e83c-7e47-4905-9c7a-b936e363a851) for task s-app-3.9ff7a871-b099-11e4-9b55-56847afe9799 of framework 20141216-070346-1912885258-5050-32095-0000
I0209 20:24:20.007709 819 slave.cpp:3007] Cleaning up executor 's-app-3.9ff7a871-b099-11e4-9b55-56847afe9799' of framework 20141216-070346-1912885258-5050-32095-0000
I0209 20:24:20.008559 820 gc.cpp:56] Scheduling '/var/run/mesos/slaves/20150207-012846-1912885258-5050-1164-S8/frameworks/20141216-070346-1912885258-5050-32095-0000/executors/s-app-3.9ff7a871-b099-11e4-9b55-56847afe9799/runs/42cc4e7f-99b9-448e-84dc-0259f0ab867b' for gc 6.99999990292741days in the future
I0209 20:24:20.008646 819 slave.cpp:3084] Cleaning up framework 20141216-070346-1912885258-5050-32095-0000
I0209 20:24:20.009016 820 gc.cpp:56] Scheduling '/var/run/mesos/slaves/20150207-012846-1912885258-5050-1164-S8/frameworks/20141216-070346-1912885258-5050-32095-0000/executors/s-app-3.9ff7a871-b099-11e4-9b55-56847afe9799' for gc 6.99999990141333days in the future
I0209 20:24:20.009558 825 status_update_manager.cpp:279] Closing status update streams for framework 20141216-070346-1912885258-5050-32095-0000
I0209 20:24:20.009974 820 gc.cpp:56] Scheduling '/var/run/mesos/meta/slaves/20150207-012846-1912885258-5050-1164-S8/frameworks/20141216-070346-1912885258-5050-32095-0000/executors/s-app-3.9ff7a871-b099-11e4-9b55-56847afe9799/runs/42cc4e7f-99b9-448e-84dc-0259f0ab867b' for gc 6.9999999007763days in the future
I0209 20:24:20.010691 820 gc.cpp:56] Scheduling '/var/run/mesos/meta/slaves/20150207-012846-1912885258-5050-1164-S8/frameworks/20141216-070346-1912885258-5050-32095-0000/executors/s-app-3.9ff7a871-b099-11e4-9b55-56847afe9799' for gc 6.99999990023111days in the future
I0209 20:24:20.011175 820 gc.cpp:56] Scheduling '/var/run/mesos/slaves/20150207-012846-1912885258-5050-1164-S8/frameworks/20141216-070346-1912885258-5050-32095-0000' for gc 6.99999988994074days in the future
I0209 20:24:20.011530 820 gc.cpp:56] Scheduling '/var/run/mesos/meta/slaves/20150207-012846-1912885258-5050-1164-S8/frameworks/20141216-070346-1912885258-5050-32095-0000' for gc 6.99999988933926days in the future
I0209 20:25:15.613370 819 slave.cpp:3321] Current usage 1.49%. Max allowed age: 6.195929403155035days
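The key line above is the pull failure: "Tag 04d8c8615ac7 not found in repository ...". A quick way to cross-check from the slave itself (the first command is copied from the log; the second assumes the registry speaks the old v1 API, as was common at the time):

docker pull docker-registry.ops.yahoo.com:4080/apurvak/yql_test_apple_weather_tenant:04d8c8615ac7
curl http://docker-registry.ops.yahoo.com:4080/v1/repositories/apurvak/yql_test_apple_weather_tenant/tags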


more-free commented 9 years ago

My bad. Yes, we did it that way. We submit a JSON file to Marathon to start a Docker container. We just didn't "customize" any executor of our own (so it uses the default command executor of Mesos).

more-free commented 9 years ago

Also, I tried increasing the "executor registration timeout", but it still doesn't work... it fails almost immediately...

more-free commented 9 years ago

It's resolved... It turned out to be an internal issue with our Docker image. Thanks anyway!

ghost commented 8 years ago

@more-free Could you give more details on what was wrong with the docker image?

I am having a similar issue: other containers launch fine on Marathon, but a particularly large container I have (i.e. >5 GB) fails to deploy; it keeps saying "No such image" in the logs.

If I manually docker pull the image on the box, which takes a while (10+ minutes), and redeploy the container on Marathon, it works fine.
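In a case like this the pull time is usually the problem: the slave's executor registration timeout has to exceed the worst-case pull, or the image has to be on the agents before the deploy. A rough sketch, assuming the Mesosphere flag-file convention, with hostnames and image name as placeholders:

# Option 1: give the slave longer than the worst-case pull time
echo '20mins' | sudo tee /etc/mesos-slave/executor_registration_timeout
sudo service mesos-slave restart

# Option 2: pre-pull the large image on every agent ahead of the deploy
for host in agent1 agent2 agent3; do
  ssh "$host" docker pull my-registry.example.com:5000/bigteam/bigapp:latest
done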