mesos / cloudfoundry-mesos

Cloud Foundry on Mesos Framework
Apache License 2.0

Failed to push app in Cloud Foundry on Mesos environment #3

Open eric-nuaa opened 8 years ago

eric-nuaa commented 8 years ago

Hi,

I followed the steps in https://github.com/mesos/cloudfoundry-mesos/blob/master/docs/getting-started.md to set up a Cloud Foundry on Mesos environment. My setup is a single physical machine with Vagrant and VirtualBox installed, hosting two VMs:

  1. The first VM was created by BOSH-Lite and has CF and Diego deployed in it
  2. The second VM has Mesos 0.25.0 deployed in it (both the Mesos master and the Mesos slave); I also installed Docker in it and pulled the image "jianhuiz/diego-cell":
# docker images
REPOSITORY            TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
jianhuiz/diego-cell   219-1434-307        05ca264e13b6        8 weeks ago         1.425 GB
jianhuiz/diego-cell   latest              05ca264e13b6        8 weeks ago         1.425 GB

I also patched the auctioneer so that it registers with Mesos as a framework; see the Mesos master log:

I1231 13:07:23.623427 25130 master.cpp:2179] Received SUBSCRIBE call for framework 'Diego Scheduler' at scheduler(1)@10.244.16.134:40726
I1231 13:07:23.623734 25130 master.cpp:2250] Subscribing framework Diego Scheduler with checkpointing disabled and capabilities [  ]
I1231 13:07:23.625143 25136 hierarchical.hpp:515] Added framework 04e03265-6fbf-4925-a484-b0010b412ffc-0006
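(For reference, the registration can also be double-checked from the master's state endpoint; a quick sketch, assuming Mesos 0.25's /master/state.json endpoint and the master address from the logs:)

curl -s http://192.168.33.10:5050/master/state.json | python -c 'import json,sys; print("\n".join(f["name"] for f in json.load(sys.stdin)["frameworks"]))'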

Everything looks good at this point, and then I pushed the hello world app (https://github.com/jianhuiz/cf-apps/tree/master/hello) from the physical machine, but it failed:

root@machine1~/workspace/cf-apps/hello> cf push 
Using manifest file /root/workspace/cf-apps/hello/manifest.yml

Creating app hello in org diego / space diego as admin...
OK

Using route hello.bosh-lite.com
Binding hello.bosh-lite.com to hello...
OK

Uploading hello...
Uploading app files from: /root/workspace/cf-apps/hello
Uploading 1020B, 2 files
Done uploading               
OK

Starting app hello in org diego / space diego as admin...

FAILED
hello failed to stage within 15.000000 minutes
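
(More detail on the staging failure can usually be pulled with the standard cf CLI; these are generic CF commands, nothing specific to the Mesos integration:)

cf logs hello --recent
CF_TRACE=true cf push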

The following is what I saw in the Mesos slave log, showing how the task was handled:

I1231 13:38:06.042736 25308 slave.cpp:1270] Got assigned task de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d for framework 04e03265-6fbf-4925-a484-b0010b412ffc-0007
I1231 13:38:06.044637 25308 gc.cpp:84] Unscheduling '/tmp/mesos/slaves/04e03265-6fbf-4925-a484-b0010b412ffc-S0/frameworks/04e03265-6fbf-4925-a484-b0010b412ffc-0007' from gc
I1231 13:38:06.045428 25308 gc.cpp:84] Unscheduling '/tmp/mesos/slaves/04e03265-6fbf-4925-a484-b0010b412ffc-S0/frameworks/04e03265-6fbf-4925-a484-b0010b412ffc-0007/executors/diego-executor' from gc
I1231 13:38:06.046202 25308 slave.cpp:1386] Launching task de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d for framework 04e03265-6fbf-4925-a484-b0010b412ffc-0007
I1231 13:38:06.055021 25308 slave.cpp:4852] Launching executor diego-executor of framework 04e03265-6fbf-4925-a484-b0010b412ffc-0007 with resources  in work directory '/tmp/mesos/slaves/04e03265-6fbf-4925-a484-b0010b412ffc-S0/frameworks/04e03265-6fbf-4925-a484-b0010b412ffc-0007/executors/diego-executor/runs/3aed6c87-bbab-4ed6-8504-3fdcf5d030f9'
I1231 13:38:06.056267 25308 slave.cpp:1604] Queuing task 'de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d' for executor diego-executor of framework '04e03265-6fbf-4925-a484-b0010b412ffc-0007
I1231 13:38:06.067430 25303 docker.cpp:766] Starting container '3aed6c87-bbab-4ed6-8504-3fdcf5d030f9' for executor 'diego-executor' and framework '04e03265-6fbf-4925-a484-b0010b412ffc-0007'
I1231 13:38:17.363279 25306 slave.cpp:2379] Got registration for executor 'diego-executor' of framework 04e03265-6fbf-4925-a484-b0010b412ffc-0007 from executor(1)@192.168.33.10:36629
I1231 13:38:17.365413 25309 docker.cpp:1016] Ignoring updating container '3aed6c87-bbab-4ed6-8504-3fdcf5d030f9' with resources passed to update is identical to existing resources
I1231 13:38:17.366082 25309 slave.cpp:1760] Sending queued task 'de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d' to executor 'diego-executor' of framework 04e03265-6fbf-4925-a484-b0010b412ffc-0007
I1231 13:38:17.370671 25309 slave.cpp:2717] Handling status update TASK_STARTING (UUID: bfa93e28-afc3-11e5-a4c5-080027ca3ef9) for task de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d of framework 04e03265-6fbf-4925-a484-b0010b412ffc-0007 from executor(1)@192.168.33.10:36629
I1231 13:38:17.371459 25309 status_update_manager.cpp:322] Received status update TASK_STARTING (UUID: bfa93e28-afc3-11e5-a4c5-080027ca3ef9) for task de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d of framework 04e03265-6fbf-4925-a484-b0010b412ffc-0007
I1231 13:38:17.372618 25310 slave.cpp:3016] Forwarding the update TASK_STARTING (UUID: bfa93e28-afc3-11e5-a4c5-080027ca3ef9) for task de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d of framework 04e03265-6fbf-4925-a484-b0010b412ffc-0007 to master@192.168.33.10:5050
I1231 13:38:17.373111 25310 slave.cpp:2946] Sending acknowledgement for status update TASK_STARTING (UUID: bfa93e28-afc3-11e5-a4c5-080027ca3ef9) for task de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d of framework 04e03265-6fbf-4925-a484-b0010b412ffc-0007 to executor(1)@192.168.33.10:36629
I1231 13:38:17.381140 25309 status_update_manager.cpp:394] Received status update acknowledgement (UUID: bfa93e28-afc3-11e5-a4c5-080027ca3ef9) for task de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d of framework 04e03265-6fbf-4925-a484-b0010b412ffc-0007
I1231 13:38:18.382237 25309 slave.cpp:2717] Handling status update TASK_RUNNING (UUID: c043858f-afc3-11e5-a4c5-080027ca3ef9) for task de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d of framework 04e03265-6fbf-4925-a484-b0010b412ffc-0007 from executor(1)@192.168.33.10:36629
I1231 13:38:18.383672 25306 status_update_manager.cpp:322] Received status update TASK_RUNNING (UUID: c043858f-afc3-11e5-a4c5-080027ca3ef9) for task de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d of framework 04e03265-6fbf-4925-a484-b0010b412ffc-0007
I1231 13:38:18.385082 25307 slave.cpp:3016] Forwarding the update TASK_RUNNING (UUID: c043858f-afc3-11e5-a4c5-080027ca3ef9) for task de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d of framework 04e03265-6fbf-4925-a484-b0010b412ffc-0007 to master@192.168.33.10:5050
I1231 13:38:18.385604 25307 slave.cpp:2946] Sending acknowledgement for status update TASK_RUNNING (UUID: c043858f-afc3-11e5-a4c5-080027ca3ef9) for task de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d of framework 04e03265-6fbf-4925-a484-b0010b412ffc-0007 to executor(1)@192.168.33.10:36629
I1231 13:38:18.393615 25308 status_update_manager.cpp:394] Received status update acknowledgement (UUID: c043858f-afc3-11e5-a4c5-080027ca3ef9) for task de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d of framework 04e03265-6fbf-4925-a484-b0010b412ffc-0007
I1231 13:39:46.461815 25309 slave.cpp:2717] Handling status update TASK_FINISHED (UUID: f4c37272-afc3-11e5-a4c5-080027ca3ef9) for task de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d of framework 04e03265-6fbf-4925-a484-b0010b412ffc-0007 from executor(1)@192.168.33.10:36629
W1231 13:39:46.463006 25309 docker.cpp:1028] Ignoring update as no supported resources are present
I1231 13:39:46.463824 25308 status_update_manager.cpp:322] Received status update TASK_FINISHED (UUID: f4c37272-afc3-11e5-a4c5-080027ca3ef9) for task de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d of framework 04e03265-6fbf-4925-a484-b0010b412ffc-0007
I1231 13:39:46.464979 25304 slave.cpp:3016] Forwarding the update TASK_FINISHED (UUID: f4c37272-afc3-11e5-a4c5-080027ca3ef9) for task de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d of framework 04e03265-6fbf-4925-a484-b0010b412ffc-0007 to master@192.168.33.10:5050
I1231 13:39:46.465512 25304 slave.cpp:2946] Sending acknowledgement for status update TASK_FINISHED (UUID: f4c37272-afc3-11e5-a4c5-080027ca3ef9) for task de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d of framework 04e03265-6fbf-4925-a484-b0010b412ffc-0007 to executor(1)@192.168.33.10:36629
I1231 13:39:46.473815 25305 status_update_manager.cpp:394] Received status update acknowledgement (UUID: f4c37272-afc3-11e5-a4c5-080027ca3ef9) for task de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d of framework 04e03265-6fbf-4925-a484-b0010b412ffc-0007
I1231 13:39:51.973965 25307 docker.cpp:1592] Executor for container '3aed6c87-bbab-4ed6-8504-3fdcf5d030f9' has exited
I1231 13:39:51.974119 25307 docker.cpp:1390] Destroying container '3aed6c87-bbab-4ed6-8504-3fdcf5d030f9'
I1231 13:39:51.974411 25307 docker.cpp:1494] Running docker stop on container '3aed6c87-bbab-4ed6-8504-3fdcf5d030f9'
I1231 13:39:51.975558 25304 slave.cpp:3433] Executor 'diego-executor' of framework 04e03265-6fbf-4925-a484-b0010b412ffc-0007 has terminated with unknown status
I1231 13:39:51.976263 25304 slave.cpp:3544] Cleaning up executor 'diego-executor' of framework 04e03265-6fbf-4925-a484-b0010b412ffc-0007
I1231 13:39:51.976714 25308 gc.cpp:56] Scheduling '/tmp/mesos/slaves/04e03265-6fbf-4925-a484-b0010b412ffc-S0/frameworks/04e03265-6fbf-4925-a484-b0010b412ffc-0007/executors/diego-executor/runs/3aed6c87-bbab-4ed6-8504-3fdcf5d030f9' for gc 6.99998869856593days in the future
I1231 13:39:51.976972 25304 slave.cpp:3633] Cleaning up framework 04e03265-6fbf-4925-a484-b0010b412ffc-0007
I1231 13:39:51.976969 25308 gc.cpp:56] Scheduling '/tmp/mesos/slaves/04e03265-6fbf-4925-a484-b0010b412ffc-S0/frameworks/04e03265-6fbf-4925-a484-b0010b412ffc-0007/executors/diego-executor' for gc 6.99998869405037days in the future
I1231 13:39:51.977311 25308 gc.cpp:56] Scheduling '/tmp/mesos/slaves/04e03265-6fbf-4925-a484-b0010b412ffc-S0/frameworks/04e03265-6fbf-4925-a484-b0010b412ffc-0007' for gc 6.99998868960889days in the future
I1231 13:39:51.977401 25304 status_update_manager.cpp:284] Closing status update streams for framework 04e03265-6fbf-4925-a484-b0010b412ffc-0007

In the log above, I can see that the task started and was running, but about a minute later it finished and the executor exited, which does not seem correct ...
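
The executor's sandbox on the slave should also have stdout/stderr files with more clues; using the work directory from the launch line above, something like:

cd /tmp/mesos/slaves/04e03265-6fbf-4925-a484-b0010b412ffc-S0/frameworks/04e03265-6fbf-4925-a484-b0010b412ffc-0007/executors/diego-executor/runs/3aed6c87-bbab-4ed6-8504-3fdcf5d030f9
cat stdout stderr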

Any help will be appreciated!

BTW, before I patched the auctioneer, the app could be pushed successfully and ran well in the original Cloud Foundry + Diego environment (without the Mesos integration).

eric-nuaa commented 8 years ago

The following is the log of the Docker container that acted as the Mesos executor and was created from the image "jianhuiz/diego-cell":

# docker logs fb3447004643
+ CONSUL_SERVER=10.244.0.54
+ ETCD_URL=http://10.244.0.42:4001
+ hostname
+ CELL_ID=vagrant-ubuntu-trusty-64
+ chown -R vcap:vcap /var/vcap/packages/rootfs_cflinuxfs2/rootfs/home/vcap
+ sed -i s|\("node_name"\s*:\s*\)"[^"]*"|\1"vagrant-ubuntu-trusty-64"|g /var/vcap/jobs/consul_agent/config/config.json
+ sed -i s|\("bind_addr"\s*:\s*\)"[^"]*"|\1"0.0.0.0"|g /var/vcap/jobs/consul_agent/config/config.json
+ sed -i s|\("retry_join"\s*:\s*\)\[[^\]*\]|\1\["10.244.0.54"\]|g /var/vcap/jobs/consul_agent/config/config.json
+ sed -i s|\("Job"\s*:\s*\)"[^"]*"|\1"vagrant-ubuntu-trusty-64"|g /var/vcap/jobs/metron_agent/config/metron_agent.json
+ sed -i s|\("EtcdUrls"\s*:\s*\)\[[^\]*\]|\1\["http://10.244.0.42:4001"\]|g /var/vcap/jobs/metron_agent/config/metron_agent.json
+ grep -q 127.0.0.1 /etc/resolv.conf
+ sed -e 1i nameserver 127.0.0.1 /etc/resolv.conf
+ echo nameserver 127.0.0.1
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.0.2.3
+ CONSUL_SERVER=10.244.0.54 CELL_ID=vagrant-ubuntu-trusty-64 /var/vcap/jobs/consul_agent/bin/agent_ctl start
+ /var/vcap/jobs/metron_agent/bin/metron_agent_ctl start
+ /var/vcap/jobs/garden/bin/garden_ctl start
+ RUN_DIR=/var/vcap/sys/run/garden
+ LOG_DIR=/var/vcap/sys/log/garden
+ PIDFILE=/var/vcap/sys/run/garden/garden.pid
+ DATA_DIR=/var/vcap/data/garden
+ case $1 in
+ setup
+ mkdir -p /var/vcap/sys/log/monit
+ exec
+ exec
+ CELL_ID=vagrant-ubuntu-trusty-64 /var/vcap/jobs/rep/bin/rep_ctl start
+ /executor -logtostderr=true
Starting Diego Executor
I1231 13:40:18.182650     344 executor.go:121] Init mesos executor driver
I1231 13:40:18.182908     344 executor.go:122] Protocol Version: 0.24.0
I1231 13:40:18.183151     344 executor.go:414] Starting the executor driver
I1231 13:40:18.183502     344 http_transporter.go:396] listening on 192.168.33.10 port 56429
Executor process has started and running.
I1231 13:40:18.183742     344 executor.go:449] Mesos executor is started with PID= executor(1)@192.168.33.10:56429
I1231 13:40:18.184115     344 executor.go:516] Waiting for the executor driver to stop
I1231 13:40:18.189545     344 executor.go:218] Executor driver registered
I1231 13:40:18.189609     344 executor.go:231] Registered on slave &SlaveID{Value:*04e03265-6fbf-4925-a484-b0010b412ffc-S0,XXX_unrecognized:[],}
Registered Executor on slave  vagrant-ubuntu-trusty-64
I1231 13:40:18.192082     344 executor.go:307] Executor driver runTask
I1231 13:40:18.192163     344 executor.go:321] Executor asked to run task '&TaskID{Value:*de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d,XXX_unrecognized:[],}'
Launching task de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d with command 
I1231 13:40:18.192585     344 executor.go:577] Executor sending status update &StatusUpdate{FrameworkId:&FrameworkID{Value:*04e03265-6fbf-4925-a484-b0010b412ffc-0007,XXX_unrecognized:[],},ExecutorId:&ExecutorID{Value:*diego-executor,XXX_unrecognized:[],},SlaveId:&SlaveID{Value:*04e03265-6fbf-4925-a484-b0010b412ffc-S0,XXX_unrecognized:[],},Status:&TaskStatus{TaskId:&TaskID{Value:*de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d,XXX_unrecognized:[],},State:*TASK_STARTING,Data:nil,Message:nil,SlaveId:&SlaveID{Value:*04e03265-6fbf-4925-a484-b0010b412ffc-S0,XXX_unrecognized:[],},Timestamp:*1.451569218e+09,ExecutorId:nil,Healthy:nil,Source:nil,Reason:nil,Uuid:nil,Labels:nil,XXX_unrecognized:[],},Timestamp:*1.451569218e+09,Uuid:*[7 173 104 207 175 196 17 229 189 182 8 0 39 202 62 249],LatestState:nil,XXX_unrecognized:[],}
I1231 13:40:18.199076     344 executor.go:342] Executor statusUpdateAcknowledgement
I1231 13:40:18.199114     344 executor.go:345] Receiving status update acknowledgement &StatusUpdateAcknowledgementMessage{SlaveId:&SlaveID{Value:*04e03265-6fbf-4925-a484-b0010b412ffc-S0,XXX_unrecognized:[],},FrameworkId:&FrameworkID{Value:*04e03265-6fbf-4925-a484-b0010b412ffc-0007,XXX_unrecognized:[],},TaskId:&TaskID{Value:*de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d,XXX_unrecognized:[],},Uuid:*[7 173 104 207 175 196 17 229 189 182 8 0 39 202 62 249],XXX_unrecognized:[],}
I1231 13:40:19.202127     344 executor.go:577] Executor sending status update &StatusUpdate{FrameworkId:&FrameworkID{Value:*04e03265-6fbf-4925-a484-b0010b412ffc-0007,XXX_unrecognized:[],},ExecutorId:&ExecutorID{Value:*diego-executor,XXX_unrecognized:[],},SlaveId:&SlaveID{Value:*04e03265-6fbf-4925-a484-b0010b412ffc-S0,XXX_unrecognized:[],},Status:&TaskStatus{TaskId:&TaskID{Value:*de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d,XXX_unrecognized:[],},State:*TASK_RUNNING,Data:nil,Message:nil,SlaveId:&SlaveID{Value:*04e03265-6fbf-4925-a484-b0010b412ffc-S0,XXX_unrecognized:[],},Timestamp:*1.451569219e+09,ExecutorId:nil,Healthy:nil,Source:nil,Reason:nil,Uuid:nil,Labels:nil,XXX_unrecognized:[],},Timestamp:*1.451569219e+09,Uuid:*[8 71 119 152 175 196 17 229 189 182 8 0 39 202 62 249],LatestState:nil,XXX_unrecognized:[],}
repContainerSet:  map[de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d:true]
taskStateMap:  map[de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d:TASK_RUNNING]
I1231 13:40:19.209304     344 executor.go:342] Executor statusUpdateAcknowledgement
I1231 13:40:19.209416     344 executor.go:345] Receiving status update acknowledgement &StatusUpdateAcknowledgementMessage{SlaveId:&SlaveID{Value:*04e03265-6fbf-4925-a484-b0010b412ffc-S0,XXX_unrecognized:[],},FrameworkId:&FrameworkID{Value:*04e03265-6fbf-4925-a484-b0010b412ffc-0007,XXX_unrecognized:[],},TaskId:&TaskID{Value:*de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d,XXX_unrecognized:[],},Uuid:*[8 71 119 152 175 196 17 229 189 182 8 0 39 202 62 249],XXX_unrecognized:[],}
repContainerSet:  map[de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d:true]
taskStateMap:  map[de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d:TASK_RUNNING]
...
I1231 13:41:47.273714     344 executor.go:577] Executor sending status update &StatusUpdate{FrameworkId:&FrameworkID{Value:*04e03265-6fbf-4925-a484-b0010b412ffc-0007,XXX_unrecognized:[],},ExecutorId:&ExecutorID{Value:*diego-executor,XXX_unrecognized:[],},SlaveId:&SlaveID{Value:*04e03265-6fbf-4925-a484-b0010b412ffc-S0,XXX_unrecognized:[],},Status:&TaskStatus{TaskId:&TaskID{Value:*de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d,XXX_unrecognized:[],},State:*TASK_FINISHED,Data:nil,Message:nil,SlaveId:&SlaveID{Value:*04e03265-6fbf-4925-a484-b0010b412ffc-S0,XXX_unrecognized:[],},Timestamp:*1.451569307e+09,ExecutorId:nil,Healthy:nil,Source:nil,Reason:nil,Uuid:nil,Labels:nil,XXX_unrecognized:[],},Timestamp:*1.451569307e+09,Uuid:*[60 198 31 159 175 196 17 229 189 182 8 0 39 202 62 249],LatestState:nil,XXX_unrecognized:[],}
repContainerSet:  map[]
taskStateMap:  map[]
I1231 13:41:47.284461     344 executor.go:342] Executor statusUpdateAcknowledgement
I1231 13:41:47.284540     344 executor.go:345] Receiving status update acknowledgement &StatusUpdateAcknowledgementMessage{SlaveId:&SlaveID{Value:*04e03265-6fbf-4925-a484-b0010b412ffc-S0,XXX_unrecognized:[],},FrameworkId:&FrameworkID{Value:*04e03265-6fbf-4925-a484-b0010b412ffc-0007,XXX_unrecognized:[],},TaskId:&TaskID{Value:*de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d,XXX_unrecognized:[],},Uuid:*[60 198 31 159 175 196 17 229 189 182 8 0 39 202 62 249],XXX_unrecognized:[],}
repContainerSet:  map[]
taskStateMap:  map[]
repContainerSet:  map[]
taskStateMap:  map[]
repContainerSet:  map[]
taskStateMap:  map[]
repContainerSet:  map[]
taskStateMap:  map[]
I1231 13:41:51.316366     344 executor.go:503] Aborting the executor driver
I1231 13:41:51.316427     344 messenger.go:269] stopping messenger..
I1231 13:41:51.316648     344 http_transporter.go:465] stopping HTTP transport
+ /var/vcap/jobs/rep/bin/rep_ctl stop
+ /var/vcap/jobs/garden/bin/garden_ctl stop
+ RUN_DIR=/var/vcap/sys/run/garden
+ LOG_DIR=/var/vcap/sys/log/garden
+ PIDFILE=/var/vcap/sys/run/garden/garden.pid
+ DATA_DIR=/var/vcap/data/garden
+ case $1 in
+ setup
+ mkdir -p /var/vcap/sys/log/monit
+ exec
+ exec
+ umount /var/vcap/data/garden/btrfs_graph/btrfs
+ umount -d /var/vcap/data/garden/btrfs_graph
+ /var/vcap/jobs/metron_agent/bin/metron_agent_ctl stop
+ /var/vcap/jobs/consul_agent/bin/agent_ctl stop
eric-nuaa commented 8 years ago

I also see the following in rep.stdout.log; I am not sure whether it is related.

...
{"timestamp":"1451567443.169189930","source":"rep","message":"rep.wait-for-garden.failed-to-ping-garden","log_level":2,"data":{"error":"Get http://api/ping: dial tcp 127.0.0.1:7777:
getsockopt: connection refused","initialTime:":"2015-12-31T13:10:43.158846973Z","session":"1"}}
...
{"timestamp":"1451567444.185091019","source":"rep","message":"rep.presence.failed-setting-presence","log_level":2,"data":{"error":"Get http://127.0.0.1:8500/v1/agent/self: dial tcp 1
27.0.0.1:8500: getsockopt: connection refused","key":"v1/locks/cell/vagrant-ubuntu-trusty-64","session":"10","value":"{\"cell_id\":\"vagrant-ubuntu-trusty-64\",\"rep_address\":\"http
://10.0.2.15:1800\",\"zone\":\"z1\",\"capacity\":{\"memory_mb\":7985,\"disk_mb\":40284,\"containers\":256},\"rootfs_providers\":{\"docker\":[],\"preloaded\":[\"cflinuxfs2\"]}}"}}
{"timestamp":"1451567445.226581573","source":"rep","message":"rep.auction-perform-work.handling","log_level":1,"data":{"session":"14"}}
...
{"timestamp":"1451567449.186274767","source":"rep","message":"rep.presence.failed-recreating-session","log_level":2,"data":{"error":"Get http://127.0.0.1:8500/v1/agent/self: dial tcp
 127.0.0.1:8500: getsockopt: connection refused","key":"v1/locks/cell/vagrant-ubuntu-trusty-64","session":"10","value":"{\"cell_id\":\"vagrant-ubuntu-trusty-64\",\"rep_address\":\"ht
tp://10.0.2.15:1800\",\"zone\":\"z1\",\"capacity\":{\"memory_mb\":7985,\"disk_mb\":40284,\"containers\":256},\"rootfs_providers\":{\"docker\":[],\"preloaded\":[\"cflinuxfs2\"]}}"}}
{"timestamp":"1451567449.186493397","source":"rep","message":"rep.container-metrics-reporter.depot-client.get-all-metrics.succeeded-listing-garden-containers","log_level":0,"data":{"
filter":{"executor:owner":"executor"},"session":"4.1.1"}}
...
{"timestamp":"1451567452.845011234","source":"rep","message":"rep.event-consumer.operation-stream.executing-container-operation.task-processor.failed-starting-task","log_level":2,"da
ta":{"container-guid":"de94ffb7-940e-4a81-9b05-d744dc6f72a5-cd164ebc23264d0da8d3bedb2acdb10d","container-state":"reserved","error":"Post https://bbs.service.cf.internal:8889/v1/tasks
/start: dial tcp: lookup bbs.service.cf.internal on 10.0.2.3:53: no such host","session":"12.1.1.1"}}
...
eric-nuaa commented 8 years ago

And this is what I see in garden.stdout.log:

{"timestamp":"1451567443.128203869","source":"garden-linux","message":"garden-linux.retain.starting","log_level":1,"data":{"session":"7"}}
{"timestamp":"1451567443.128637791","source":"garden-linux","message":"garden-linux.retain.retaining","log_level":1,"data":{"session":"7","url":"/var/vcap/packages/rootfs_cflinuxfs2/
rootfs"}}
{"timestamp":"1451567443.128825426","source":"garden-linux","message":"garden-linux.retain.retaining-complete","log_level":1,"data":{"session":"7","url":"/var/vcap/packages/rootfs_cf
linuxfs2/rootfs"}}
{"timestamp":"1451567443.128902912","source":"garden-linux","message":"garden-linux.retain.retained","log_level":1,"data":{"session":"7"}}
{"timestamp":"1451567443.385570526","source":"garden-linux","message":"garden-linux.started","log_level":1,"data":{"addr":"0.0.0.0:7777","network":"tcp"}}
{"timestamp":"1451567445.233651638","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.6"}}
{"timestamp":"1451567446.227868080","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.8"}}
{"timestamp":"1451567446.232680559","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.10"}}
{"timestamp":"1451567447.245183229","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.12"}}
{"timestamp":"1451567447.249213934","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.14"}}
{"timestamp":"1451567448.254944324","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.16"}}
{"timestamp":"1451567448.258607388","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.18"}}
{"timestamp":"1451567449.188147306","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.21"}}
{"timestamp":"1451567449.190103769","source":"garden-linux","message":"garden-linux.garden-server.bulk_metrics.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.22"}}
{"timestamp":"1451567449.265397310","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.24"}}
{"timestamp":"1451567449.269826412","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.26"}}
{"timestamp":"1451567450.276978970","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.28"}}
{"timestamp":"1451567450.283700228","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.30"}}
{"timestamp":"1451567451.290451050","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.32"}}
{"timestamp":"1451567451.295217752","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.34"}}
{"timestamp":"1451567452.301832914","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.36"}}
{"timestamp":"1451567452.306668282","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.38"}}
{"timestamp":"1451567453.312611580","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.40"}}
{"timestamp":"1451567453.316643476","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.42"}}
{"timestamp":"1451567454.186395168","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.44"}}
{"timestamp":"1451567454.188435555","source":"garden-linux","message":"garden-linux.garden-server.bulk_metrics.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.46"}}
{"timestamp":"1451567454.321904898","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.48"}}
{"timestamp":"1451567454.325867414","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.50"}}
{"timestamp":"1451567455.332529306","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.52"}}
{"timestamp":"1451567455.336611032","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.54"}}
{"timestamp":"1451567456.342327595","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.56"}}
{"timestamp":"1451567456.346458673","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.58"}}
{"timestamp":"1451567457.354220390","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.60"}}
{"timestamp":"1451567457.359736204","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.62"}}
{"timestamp":"1451567458.366505384","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.64"}}
{"timestamp":"1451567458.369956255","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.66"}}
{"timestamp":"1451567459.194800377","source":"garden-linux","message":"garden-linux.garden-server.bulk_info.got-bulkinfo","log_level":1,"data":{"handles":[""],"session":"8.69"}}
...
{"timestamp":"1451567538.513858080","source":"garden-linux","message":"garden-linux.garden-server.waiting-for-connections-to-close","log_level":1,"data":{"session":"8"}}
{"timestamp":"1451567538.514035463","source":"garden-linux","message":"garden-linux.garden-server.stopping-backend","log_level":1,"data":{"session":"8"}}
{"timestamp":"1451567538.514078379","source":"garden-linux","message":"garden-linux.garden-server.stopped","log_level":1,"data":{"session":"8"}}
jianhuiz commented 8 years ago

It appears that the consul agent wasn't started successfully; you may want to check the consul logs for details.

You can also start the diego-cell container with --entrypoint=/bin/bash and then run /entrypoint.sh /bin/bash manually to verify that all the agents are running well. Don't forget to pass the environment variables CONSUL_SERVER and ETCD_URL (the etcd of CF, not Diego) and to run the container in privileged mode with the host network.
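
For example, with the addresses from the logs above (a sketch only; adjust for your deployment):

docker run --privileged --net=host -it -e CONSUL_SERVER=10.244.0.54 -e ETCD_URL=http://10.244.0.42:4001 --entrypoint=/bin/bash jianhuiz/diego-cell

and then, inside the container, run /entrypoint.sh /bin/bash.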

Amit-PivotalLabs commented 8 years ago

The timestamps are quite different, so it's a bit hard to guess at what's happening. The slave logs end at 13:39:51.977401, the executor logs start at 13:40:18.182650, so that's weird. The rep logs you show have a timestamp of 1451567443.169189930, which is 13:10:44 (UTC).

So, for instance, the garden logs suggest it was never asked to create a container for the staging task, but those timestamps are so far behind when the Mesos slave was even asked to create the cell that I wonder if those logs are just misleading. Or maybe the NTP config on your VMs is out of whack?

Based on the rep logs, it looks like at least two important things aren't working. Inside the executor container, garden should be running, bound to port 7777 and listening on either 127.0.0.1 or 0.0.0.0, but it seems the rep is not able to talk to it (both the rep and garden should be running in the same executor container). It also looks like the rep can't look up the BBS address, which may be an issue with how consul is running inside that executor container. You could check whether the consul agent is running in the executor container in agent mode and has joined the consul membership ring by checking its logs.
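
Concretely, a few quick checks from a shell inside the executor container would narrow it down (a sketch; the consul binary path is the one the consul_agent job uses, and the endpoints are the ones named in the rep errors):

curl -v http://127.0.0.1:7777/ping                  # is garden listening and answering its ping endpoint?
curl -s http://127.0.0.1:8500/v1/agent/self         # is the local consul agent up?
/var/vcap/packages/consul/bin/consul members        # has the agent joined the membership ring?
dig @127.0.0.1 bbs.service.cf.internal              # does consul DNS resolve the BBS address?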

eric-nuaa commented 8 years ago

Thanks @jianhuiz and @Amit-PivotalLabs. I manually ran the diego-cell container with the command below:

docker run --privileged --net=host -it -e "CONSUL_SERVER=10.244.0.54" -e "ETCD_URL=http://10.244.0.42:4001" --entrypoint=/bin/bash jianhuiz/diego-cell

Then, inside the container, I ran "/entrypoint.sh /executor -logtostderr=true" (I changed entrypoint.sh by adding a "sleep 10000" so that it would stay running), and found:

root@vagrant-ubuntu-trusty-64:/# ps -ef 
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 09:17 ?        00:00:00 /bin/bash
root        14     0  0 09:21 ?        00:00:00 /bin/bash
root        34     1  0 09:22 ?        00:00:00 /bin/sh /entrypoint.sh /executor -logtostderr=true
syslog     149     1  0 09:22 ?        00:00:00 /usr/sbin/rsyslogd
vcap       157     1  0 09:22 ?        00:00:01 /var/vcap/packages/metron_agent/metron --config /var/vcap/jobs/metron_agent/config/metron_agent.json
root       204     1  0 09:22 ?        00:00:00 /var/vcap/packages/garden-linux/bin/garden-linux -depot=/var/vcap/data/garden/depot -snapshots=/var/vcap/data/garden/snapshots -graph=
vcap       241     1  0 09:22 ?        00:00:02 /var/vcap/packages/rep/bin/rep -bbsClientCert=/var/vcap/jobs/rep/config/certs/bbs/client.crt -bbsClientKey=/var/vcap/jobs/rep/config/c
root       244   241  0 09:22 ?        00:00:00 /bin/bash -e /var/vcap/jobs/rep/bin/rep_ctl start
root       245   244  0 09:22 ?        00:00:00 tee -a /var/vcap/sys/log/rep/rep.stdout.log
root       246   244  0 09:22 ?        00:00:00 logger -t vcap.rep
root       376    34  0 09:22 ?        00:00:00 sleep 10000
root       441    14  0 09:28 ?        00:00:00 ps -ef

So it seems the consul agent indeed wasn't started successfully. I then ran this manually in the container:

root@vagrant-ubuntu-trusty-64:/# CONSUL_SERVER=10.244.0.54 CELL_ID=vagrant-ubuntu-trusty-64 /var/vcap/jobs/consul_agent/bin/agent_ctl start 
root@vagrant-ubuntu-trusty-64:/# echo $? 
1

So the exit code shows something went wrong, but there is no other detailed info and no consul agent log:

root@vagrant-ubuntu-trusty-64:/# ls -l /var/vcap/sys/log/consul_agent/
total 0

So is there a way I can debug why the consul agent cannot be started? Thanks!

Amit-PivotalLabs commented 8 years ago

The /var/vcap/jobs/consul_agent/bin/agent_ctl script should log std{out,err} to /var/vcap/sys/log/monit/consul_agent.{out,err}.log. If it gets to the point of actually running (the thin wrapper around) the consul binary itself, that will log to /var/vcap/sys/log/consul_agent/consul_agent.std{out,err}.log.

Can you check the logs in those locations?
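
For example, something like:

tail -n 50 /var/vcap/sys/log/monit/consul_agent.out.log /var/vcap/sys/log/monit/consul_agent.err.log
tail -n 50 /var/vcap/sys/log/consul_agent/consul_agent.stdout.log /var/vcap/sys/log/consul_agent/consul_agent.stderr.log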

eric-nuaa commented 8 years ago

I found the root cause of why the consul agent was not started. In the script "/var/vcap/jobs/consul_agent/bin/agent_ctl" in the diego-cell container, there is this line:

setcap cap_net_bind_service=+ep $PKG/bin/consul

This command failed! I reran it manually in the diego-cell container:

root@vagrant-ubuntu-trusty-64:/# setcap cap_net_bind_service=+ep /var/vcap/packages/consul/bin/consul
Failed to set capabilities on file `/var/vcap/packages/consul/bin/consul' (Invalid argument)
The value of the capability argument is not permitted for a file. Or the file is not a regular (non-symlink) file
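
(The setcap call is there so that consul, presumably running as a non-root user, can bind the privileged DNS port 53; whether the capability actually got applied can be verified with getcap:)

getcap /var/vcap/packages/consul/bin/consul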

The reason the command failed is that the storage driver of my Docker host is aufs:

# docker info | grep "Storage Driver" 
WARNING: No swap limit support
Storage Driver: aufs

However, setcap does not work on aufs; see https://github.com/docker/docker/issues/5650 for details. So I changed the storage driver from aufs to devicemapper by following the steps in this post: http://muehe.org/posts/switching-docker-from-aufs-to-devicemapper/
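
(On Ubuntu 14.04 with the Docker packaging of that era, the switch boils down to setting the daemon's storage-driver option and restarting Docker; a sketch, assuming the /etc/default/docker convention:)

echo 'DOCKER_OPTS="--storage-driver=devicemapper"' >> /etc/default/docker
service docker restart
docker info | grep "Storage Driver"

Note that images do not carry over between storage drivers, so jianhuiz/diego-cell has to be pulled again afterwards.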

Now I can see the consul agent start, but it still fails later; here is its log:

root@vagrant-ubuntu-trusty-64:/# cat /var/vcap/sys/log/consul_agent/consul_agent.stdout.log 
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
         Node name: 'vagrant-ubuntu-trusty-64'
        Datacenter: 'dc1'
            Server: false (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 53, RPC: 8400)
      Cluster Addr: 10.0.2.15 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>

==> Log data will now stream in as it occurs:

    2016/01/01 13:25:48 [INFO] serf: EventMemberJoin: vagrant-ubuntu-trusty-64 10.0.2.15
    2016/01/01 13:25:48 [ERR] agent: failed to sync remote state: No known Consul servers
    2016/01/01 13:25:48 [INFO] agent: Joining cluster...
    2016/01/01 13:25:48 [INFO] agent: (LAN) joining: [10.244.0.54]
    2016/01/01 13:25:48 [INFO] agent: (LAN) joined: 0 Err: EOF
    2016/01/01 13:25:48 [WARN] agent: Join failed: EOF, retrying in 30s
    2016/01/01 13:25:49 [INFO] agent.rpc: Accepted client: 127.0.0.1:42908
    2016/01/01 13:25:50 [INFO] agent.rpc: Accepted client: 127.0.0.1:42909
    2016/01/01 13:25:51 [INFO] agent.rpc: Accepted client: 127.0.0.1:42910
    2016/01/01 13:25:52 [INFO] agent.rpc: Accepted client: 127.0.0.1:42911
    2016/01/01 13:25:53 [INFO] agent.rpc: Accepted client: 127.0.0.1:42912
    2016/01/01 13:25:54 [INFO] agent.rpc: Accepted client: 127.0.0.1:42913
    2016/01/01 13:25:55 [INFO] agent.rpc: Accepted client: 127.0.0.1:42914
    2016/01/01 13:25:56 [INFO] agent.rpc: Accepted client: 127.0.0.1:42915
    2016/01/01 13:25:57 [INFO] agent.rpc: Accepted client: 127.0.0.1:42916
    2016/01/01 13:25:58 [INFO] agent.rpc: Accepted client: 127.0.0.1:42917
    2016/01/01 13:25:59 [INFO] agent.rpc: Accepted client: 127.0.0.1:42918
    2016/01/01 13:26:00 [INFO] agent.rpc: Accepted client: 127.0.0.1:42919
    2016/01/01 13:26:01 [INFO] agent.rpc: Accepted client: 127.0.0.1:42920
    2016/01/01 13:26:03 [INFO] agent.rpc: Accepted client: 127.0.0.1:42921
    2016/01/01 13:26:04 [INFO] agent.rpc: Accepted client: 127.0.0.1:42922
    2016/01/01 13:26:05 [INFO] agent.rpc: Accepted client: 127.0.0.1:42923
    2016/01/01 13:26:06 [INFO] agent.rpc: Accepted client: 127.0.0.1:42924
    2016/01/01 13:26:07 [INFO] agent.rpc: Accepted client: 127.0.0.1:42925
    2016/01/01 13:26:08 [INFO] agent.rpc: Accepted client: 127.0.0.1:42926
    2016/01/01 13:26:09 [INFO] agent.rpc: Accepted client: 127.0.0.1:42927
    2016/01/01 13:26:09 [ERR] agent: failed to sync remote state: No known Consul servers
    2016/01/01 13:26:10 [INFO] agent.rpc: Accepted client: 127.0.0.1:42928
    2016/01/01 13:26:11 [INFO] agent.rpc: Accepted client: 127.0.0.1:42929
    2016/01/01 13:26:12 [INFO] agent.rpc: Accepted client: 127.0.0.1:42930
    2016/01/01 13:26:13 [INFO] agent.rpc: Accepted client: 127.0.0.1:42931
    2016/01/01 13:26:14 [INFO] agent.rpc: Accepted client: 127.0.0.1:42932
    2016/01/01 13:26:15 [INFO] agent.rpc: Accepted client: 127.0.0.1:42933
    2016/01/01 13:26:16 [INFO] agent.rpc: Accepted client: 127.0.0.1:42934
    2016/01/01 13:26:17 [INFO] agent.rpc: Accepted client: 127.0.0.1:42935
    2016/01/01 13:26:18 [INFO] agent: (LAN) joining: [10.244.0.54]
    2016/01/01 13:26:18 [INFO] agent: (LAN) joined: 0 Err: EOF
    2016/01/01 13:26:18 [WARN] agent: Join failed: EOF, retrying in 30s
    2016/01/01 13:26:18 [INFO] agent.rpc: Accepted client: 127.0.0.1:42937
    2016/01/01 13:26:19 [INFO] agent.rpc: Accepted client: 127.0.0.1:42938
    2016/01/01 13:26:20 [INFO] agent.rpc: Accepted client: 127.0.0.1:42939
    2016/01/01 13:26:21 [INFO] agent.rpc: Accepted client: 127.0.0.1:42940
    2016/01/01 13:26:22 [INFO] agent.rpc: Accepted client: 127.0.0.1:42941
    2016/01/01 13:26:23 [INFO] agent.rpc: Accepted client: 127.0.0.1:42942
    2016/01/01 13:26:24 [INFO] agent.rpc: Accepted client: 127.0.0.1:42943
    2016/01/01 13:26:25 [INFO] agent.rpc: Accepted client: 127.0.0.1:42944
    2016/01/01 13:26:26 [ERR] agent: failed to sync remote state: No known Consul servers
    2016/01/01 13:26:26 [INFO] agent.rpc: Accepted client: 127.0.0.1:42945
    2016/01/01 13:26:28 [INFO] agent.rpc: Accepted client: 127.0.0.1:42946
    2016/01/01 13:26:29 [INFO] agent.rpc: Accepted client: 127.0.0.1:42947
    2016/01/01 13:26:30 [INFO] agent.rpc: Accepted client: 127.0.0.1:42948
    2016/01/01 13:26:31 [INFO] agent.rpc: Accepted client: 127.0.0.1:42949
    2016/01/01 13:26:32 [INFO] agent.rpc: Accepted client: 127.0.0.1:42950
    2016/01/01 13:26:33 [INFO] agent.rpc: Accepted client: 127.0.0.1:42951
    2016/01/01 13:26:34 [INFO] agent.rpc: Accepted client: 127.0.0.1:42952
    2016/01/01 13:26:35 [INFO] agent.rpc: Accepted client: 127.0.0.1:42953
    2016/01/01 13:26:36 [INFO] agent.rpc: Accepted client: 127.0.0.1:42954
    2016/01/01 13:26:37 [INFO] agent.rpc: Accepted client: 127.0.0.1:42955
    2016/01/01 13:26:38 [INFO] agent.rpc: Accepted client: 127.0.0.1:42956
    2016/01/01 13:26:39 [INFO] agent.rpc: Accepted client: 127.0.0.1:42957
    2016/01/01 13:26:40 [INFO] agent.rpc: Accepted client: 127.0.0.1:42958
    2016/01/01 13:26:41 [INFO] agent.rpc: Accepted client: 127.0.0.1:42959
    2016/01/01 13:26:41 [ERR] agent: failed to sync remote state: No known Consul servers
    2016/01/01 13:26:42 [INFO] agent.rpc: Accepted client: 127.0.0.1:42960
    2016/01/01 13:26:43 [INFO] agent.rpc: Accepted client: 127.0.0.1:42961
    2016/01/01 13:26:44 [INFO] agent.rpc: Accepted client: 127.0.0.1:42962
    2016/01/01 13:26:45 [INFO] agent.rpc: Accepted client: 127.0.0.1:42963
    2016/01/01 13:26:46 [INFO] agent.rpc: Accepted client: 127.0.0.1:42964
    2016/01/01 13:26:47 [INFO] agent.rpc: Accepted client: 127.0.0.1:42965
    2016/01/01 13:26:48 [INFO] agent: (LAN) joining: [10.244.0.54]
    2016/01/01 13:26:48 [INFO] agent: (LAN) joined: 0 Err: EOF
    2016/01/01 13:26:48 [WARN] agent: Join failed: EOF, retrying in 30s
    2016/01/01 13:26:48 [INFO] agent.rpc: Accepted client: 127.0.0.1:42967
    2016/01/01 13:26:49 [INFO] agent.rpc: Accepted client: 127.0.0.1:42968
    2016/01/01 13:26:50 [INFO] agent.rpc: Accepted client: 127.0.0.1:42969
    2016/01/01 13:26:50 [INFO] agent.rpc: Graceful leave triggered
    2016/01/01 13:26:50 [INFO] consul: client starting leave
    2016/01/01 13:26:50 [INFO] serf: EventMemberLeave: vagrant-ubuntu-trusty-64 10.0.2.15
    2016/01/01 13:26:50 [INFO] agent: requesting shutdown
    2016/01/01 13:26:50 [INFO] consul: shutting down client
    2016/01/01 13:26:50 [INFO] agent: shutdown complete

It seems the consul agent always fails to join the cluster.

eric-nuaa commented 8 years ago

When I manually start the consul agent in the diego-cell container, I see the following in the consul server's log:

2016/01/01 14:28:33 [ERR] memberlist: failed to receive: Encryption is configured but remote state is not encrypted

So it appears that the consul server has encryption configured, but the consul agent in the diego-cell container does not.

How can I disable encryption for consul when deploying it with Cloud Foundry?

Amit-PivotalLabs commented 8 years ago

Assuming you generated the BOSH-Lite manifest for Cloud Foundry using the scripts/generate-bosh-lite-dev-manifest script in the cf-release repo, you can just call that script with a second argument: the path to a file with the following YAML contents:

properties:
  consul:
    require_ssl: false
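
For example, something along these lines should work (the stub path here is just an illustration; after regenerating the manifest, redeploy with bosh deploy):

cat > /tmp/disable-consul-ssl.yml <<'EOF'
properties:
  consul:
    require_ssl: false
EOF

cd ~/workspace/cf-release
./scripts/generate-bosh-lite-dev-manifest /tmp/disable-consul-ssl.yml
bosh deploy
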
eric-nuaa commented 8 years ago

Thanks @Amit-PivotalLabs, I have successfully deployed Cloud Foundry on Mesos :-)

eric-nuaa commented 8 years ago

It seems my CF on Mesos environment is not stable: sometimes I can push an app successfully, and sometimes I cannot:

# cf push         
Using manifest file /root/workspace/cf-apps/hello/manifest.yml
Creating app hello in org diego / space diego as admin...
OK
Using route hello.bosh-lite.com
Binding hello.bosh-lite.com to hello...
OK
Uploading hello...
Uploading app files from: /root/workspace/cf-apps/hello
Uploading 1020B, 2 files
Done uploading               
OK
Starting app hello in org diego / space diego as admin...
FAILED
StagingError
TIP: use 'cf logs hello --recent' for more information

# cf logs hello --recent
Connected, dumping recent logs for app hello in org diego / space diego as admin...

2016-01-03T17:10:40.49+0800 [API/0]      OUT Created app with guid 40d6f9e2-ab6b-4dbf-946b-fee9a0d07c5d
2016-01-03T17:10:42.24+0800 [API/0]      OUT Updated app with guid 40d6f9e2-ab6b-4dbf-946b-fee9a0d07c5d ({"route"=>"6b371227-7af5-47a0-86c9-076d0ad23b13"})
2016-01-03T17:10:50.54+0800 [API/0]      OUT Updated app with guid 40d6f9e2-ab6b-4dbf-946b-fee9a0d07c5d ({"state"=>"STARTED"})
2016-01-03T17:12:52.45+0800 [API/0]      ERR Failed to stage application: staging failed

Any idea what happened?

Amit-PivotalLabs commented 8 years ago

A couple of questions:

  1. When you have a successful push, do you see things like "Creating container..." in the app staging logs? I'm wondering whether logging is broken in the Mesos setup, or whether nothing is happening that would log to the user in the first place.
  2. If nothing is happening in the first place, you'll have to gather logs from the different components to see what's wrong: is cc-bridge getting the request from cc, is bbs getting the request from cc-bridge, is the brain (auctioneer) scheduling the staging task, is the executor container being created by the mesos slave, are all the executor processes starting up OK, is the rep getting the work from the scheduler, is garden getting the create request from the rep, etc.? See the sketch below for a place to start.
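
As a rough starting point for gathering those logs (the job names below assume a standard BOSH-Lite Diego deployment; use whatever bosh vms actually reports):

# List the deployed jobs and their IPs
bosh vms

# SSH into a job and look at its logs, e.g. the auctioneer on the brain
bosh ssh brain_z1 0
sudo ls /var/vcap/sys/log
sudo tail -n 100 /var/vcap/sys/log/auctioneer/auctioneer.stdout.log

# Repeat for cc_bridge_z1 (stager/nsync), database_z1 (bbs), and so on
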
eric-nuaa commented 8 years ago

Can you please let me know where I can get the app staging logs? Here is what I see for a successful push:

cf push     
Using manifest file /root/workspace/cf-apps/hello/manifest.yml

Creating app hello in org diego / space diego as admin...
OK

Using route hello.bosh-lite.com
Binding hello.bosh-lite.com to hello...
OK

Uploading hello...
Uploading app files from: /root/workspace/cf-apps/hello
Uploading 1020B, 2 files
Done uploading               
OK

Starting app hello in org diego / space diego as admin...

2 of 2 instances running

App started

OK

App hello was started using this command `node server.js`

Showing health and status for app hello in org diego / space diego as admin...
OK

requested state: started
instances: 2/2
usage: 512M x 2 instances
urls: hello.bosh-lite.com
last uploaded: Mon Jan 4 14:02:10 UTC 2016
stack: cflinuxfs2
buildpack: Node.js

     state     since                    cpu    memory      disk      details   
#0   running   2016-01-04 10:04:57 PM   0.0%   0 of 512M   0 of 1G      
#1   running   2016-01-04 10:04:57 PM   0.0%   0 of 512M   0 of 1G 

And in the Mesos slave log, I see "Starting container ...":

I0104 08:25:17.275492  9731 docker.cpp:766] Starting container 'a633194e-1631-4380-a8b0-b175c44b693f' for executor 'diego-executor' and framework '41dafaa1-27d1-4c44-a284-97ff71897520-0000'

I will check more logs for the failed push.

Amit-PivotalLabs commented 8 years ago

Hmm, I'm not sure what `buildpack: Node.js` is. Can you share the contents of your manifest.yml? I have a CF+Diego deployment with a built-in node buildpack:

$ cf buildpacks
Getting buildpacks...

buildpack              position   enabled   locked   filename
staticfile_buildpack   1          true      false    staticfile_buildpack-cached-v1.2.3.zip
java_buildpack         2          true      false    java-buildpack-v3.3.1.zip
ruby_buildpack         3          true      false    ruby_buildpack-cached-v1.6.11.zip
nodejs_buildpack       4          true      false    nodejs_buildpack-cached-v1.5.4.zip
go_buildpack           5          true      false    go_buildpack-cached-v1.7.1.zip
python_buildpack       6          true      false    python_buildpack-cached-v1.5.3.zip
php_buildpack          7          true      false    php_buildpack-cached-v4.3.1.zip
binary_buildpack       8          true      false    binary_buildpack-cached-v1.0.1.zip

The following push shows much more output (the name of my app is n):

$ cf push n -b nodejs_buildpack -p ~/workspace/cf-release/src/github.com/cloudfoundry/cf-acceptance-tests/assets/node/
Creating app n in org o / space s as admin...
OK

Creating route n.bosh-lite.com...
OK

Binding n.bosh-lite.com to n...
OK

Uploading n...
Uploading app files from: /Users/agupta/workspace/cf-release/src/github.com/cloudfoundry/cf-acceptance-tests/assets/node/
Uploading 741B, 2 files
Done uploading
OK

Starting app n in org o / space s as admin...
-----> Downloaded app package (4.0K)
-------> Buildpack version 1.5.4
-----> Creating runtime environment
       NPM_CONFIG_LOGLEVEL=error
       NPM_CONFIG_PRODUCTION=true
       NODE_ENV=production
       NODE_MODULES_CACHE=true
-----> Installing binaries
       engines.node (package.json):  unspecified
       engines.npm (package.json):   unspecified (use default)
       Resolving node version (latest stable) via semver.io...
       Downloading and installing node 4.2.3...
       Downloaded [file:///var/vcap/data/dea_next/admin_buildpacks/5415303c-2fcd-48c2-89f4-979c9700e78d_4c3d273d4bdf966b701b46deca622dd9925969f3/dependencies/https___pivotal-buildpacks.s3.amazonaws.com_concourse-binaries_node_node-4.2.3-linux-x64.tgz]
       Using default npm version: 2.14.7
-----> Restoring cache
       Skipping cache restore (new runtime signature)
-----> Building dependencies
       Pruning any extraneous modules
       Installing node modules (package.json)
-----> Caching build
       Clearing previous node cache
       Saving 2 cacheDirectories (default):
       - node_modules (nothing to cache)
       - bower_components (nothing to cache)
-----> Build succeeded!
       └── (empty)

-----> Uploading droplet (9.0M)

1 of 1 instances running

App started

OK

App n was started using this command `npm start`

Showing health and status for app n in org o / space s as admin...
OK

requested state: started
instances: 1/1
usage: 256M x 1 instances
urls: n.bosh-lite.com
last uploaded: Mon Jan 4 19:47:54 UTC 2016
stack: cflinuxfs2
buildpack: nodejs_buildpack

I'm not sure yet why those logs aren't showing up for you, but perhaps for simplicity you would like to open that as a separate issue.

As for why it's actually not working: the mesos slave is starting the executor container, so the next step is to see what's happening inside that container. There should be several processes in there (metron_agent, consul_agent, rep, garden), and they should be logging to subdirectories of /var/vcap/sys/log. You can check there for any erroneous behaviour.
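
Roughly something like this, run on the Mesos slave (the container ID is whatever docker ps reports, and the exact log file names may vary slightly):

# Find the diego-cell executor container and get a shell in it
docker ps
docker exec -it <container-id> bash

# Check which Diego processes are actually running
ps aux | egrep 'rep|garden|consul|metron'

# Then look through their logs
ls /var/vcap/sys/log
tail -n 100 /var/vcap/sys/log/rep/rep.stdout.log   # and similarly for the garden logs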

codenrhoden commented 8 years ago

Sorry to comment on an old thread, but since it's still open I figured it might be okay.

First off -- a big thanks to all the commenters in here. I hit essentially all of the same problems, and was also able to get through them by walking through this thread. I can successfully push apps and scale them through cf. Great!

The problem I am having, though, is that when I launch the test application from @jianhuiz (https://github.com/jianhuiz/cf-apps), I get a 502 error code when trying to visit the URL. When I test the app with just the Diego backend, everything works, but once I switch over to Mesos, I get 502s.

I've traced through haproxy and the gorouter, and that all seems fine. I dumped the routes in the gorouter, and found that when I try to hit those addresses the connection fails. For example, when I dump the gorouter routes, I see this for my app (with two instances):

"hello-diego.bosh-lite.com": [
        {
            "address": "10.0.2.15:60002",
            "ttl": 0
        },
        {
            "address": "10.0.2.15:60000",
            "ttl": 0
        }
    ],
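
For reference, this route table can be dumped from gorouter's /routes status endpoint; <router-ip> and the credentials are placeholders for the router.status values in the CF manifest (8080 is the default status port):

curl -u <status-user>:<status-pass> http://<router-ip>:8080/routes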

I can ping that IP just fine, but no luck with curl:

root@8dc44335-f3f6-4059-a34a-f9b8da119fc9:~# curl 10.0.2.15:60002
curl: (7) Failed to connect to 10.0.2.15 port 60002: Connection refused
root@8dc44335-f3f6-4059-a34a-f9b8da119fc9:~# curl 10.0.2.15:60000
curl: (7) Failed to connect to 10.0.2.15 port 60000: Connection refused

This was done from the gorouter VM.

That IP address is the IP of the mesos-slave. Where I get really confused is that when I go onto the mesos slave, nobody is listening on those TCP ports:

root@mesos-slave1:/# netstat -atln | grep 6000
root@mesos-slave1:/#

If I run `docker inspect` on the Docker container running diego-cell, there are no ports mapped to it. So how is any traffic going to get into the Docker container?
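
For anyone checking the same thing, the container's network mode and published ports can be inspected like this (the container ID is whatever docker ps shows on the slave):

# "host" here would mean the container shares the slave's network namespace,
# in which case no explicit port mappings would ever show up
docker inspect --format '{{.HostConfig.NetworkMode}}' <container-id>

# Lists any published port mappings (empty output means nothing is mapped)
docker port <container-id>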

From within the Docker container, nobody is listening there either. When I look at how the Garden container was launched, the config seems to be right, with flags like CF_INSTANCE_INDEX=0 --env CF_INSTANCE_IP=10.0.2.15 --env CF_INSTANCE_PORT=60000 --env CF_INSTANCE_PORTS=[{"external":60000,"internal":8080},{"external":60001,"internal":2222}]

but again, using netstat, no one is listening on port 60000. Am I missing something? When I watched @jianhuiz's YouTube video demo (https://youtu.be/2XZK3Mu32-s), I noticed that he never visits the app in a browser. Can you comment on whether this part of things actually should be working? Otherwise, I have been able to recreate everything I've seen in the demos. Cool stuff!