mesos / mr-redis

Redis on Apache Mesos
Apache License 2.0
127 stars 32 forks source link

Wait until framework registers successfully before launching tasks. #37

Closed gdhgdhgdh closed 8 years ago

gdhgdhgdh commented 8 years ago

Hi,

On DC/OS 1.8.1-open I tried to create

curl -X POST http://mrredis.mesos:5656/v1/CREATE/redistyk/256/1/1

I0825 21:25:27.388743  4763 exec.cpp:161] Version: 1.0.0
I0825 21:25:27.403304  4769 exec.cpp:236] Executor registered on agent f17f0016-1054-4e11-8541-6088f5fc435c-S20
I0825 21:25:27.406091  4769 docker.cpp:815] Running docker -H unix:///var/run/docker.sock run --cpu-shares 1024 --memory 134217728 -e MARATHON_APP_LABEL_DCOS_PACKAGE_SOURCE=https://universe.mesosphere.com/repo -e MARATHON_APP_VERSION=2016-08-25T17:24:57.779Z -e HOST=10.102.4.151 -e MARATHON_APP_RESOURCE_CPUS=1.0 -e MARATHON_APP_LABEL_DCOS_PACKAGE_REGISTRY_VERSION=2.0 -e CLUSTER_SIZE=1 -e MARATHON_APP_RESOURCE_GPUS=0 -e MARATHON_APP_LABEL_DCOS_PACKAGE_RELEASE=0 -e PORT_5454=28188 -e MARATHON_APP_DOCKER_IMAGE=paashuawei/mr-redis:latest -e MARATHON_APP_LABEL_DCOS_PACKAGE_NAME=mr-redis -e MARATHON_APP_LABEL_DCOS_PACKAGE_VERSION=0.0.1 -e PORT_5656=28189 -e MESOS_TASK_ID=mr-redis.d9c7ddb2-6ae8-11e6-9cf9-70b3d5800003 -e PORT=28188 -e MARATHON_APP_RESOURCE_MEM=128.0 -e PORTS=28188,28189,28190 -e PORT1=28189 -e MARATHON_APP_LABEL_DCOS_PACKAGE_IS_FRAMEWORK=true -e FRAMEWORK_NAME=mr-redis -e MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_LABELS=DCOS_PACKAGE_RELEASE DCOS_PACKAGE_SOURCE DCOS_PACKAGE_REGISTRY_VERSION DCOS_PACKAGE_FRAMEWORK_NAME DCOS_PACKAGE_VERSION DCOS_PACKAGE_NAME DCOS_PACKAGE_IS_FRAMEWORK -e MARATHON_APP_LABEL_DCOS_PACKAGE_FRAMEWORK_NAME=mr-redis -e MESOS_MASTER=zk://master.mesos:2181/mesos -e MARATHON_APP_ID=/mr-redis -e PORT0=28188 -e PORT_2379=28190 -e PORT2=28190 -e LIBPROCESS_IP=10.102.4.151 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-f17f0016-1054-4e11-8541-6088f5fc435c-S20.ea40a023-1f13-4b1b-beeb-0fab5e23d623 -v /var/lib/mesos/slave/slaves/f17f0016-1054-4e11-8541-6088f5fc435c-S20/frameworks/e510b87e-cd91-4c50-82e1-c3dc19ad8a43-0001/executors/mr-redis.d9c7ddb2-6ae8-11e6-9cf9-70b3d5800003/runs/ea40a023-1f13-4b1b-beeb-0fab5e23d623:/mnt/mesos/sandbox --net host --name mesos-f17f0016-1054-4e11-8541-6088f5fc435c-S20.ea40a023-1f13-4b1b-beeb-0fab5e23d623 paashuawei/mr-redis:latest
2016/08/25 17:25:28 Configuration file is = {zk://master.mesos:2181/mesos /mrredis/bin/MrRedisExecutor /mrredis/bin/redis-server etcd http://mr-redis.marathon.mesos:2379 stderr mr-redis.marathon.mesos 5454 5656}
2016/08/25 17:25:28 *****************************************************************
2016/08/25 17:25:28 *********************Starting MrRedis-Scheduler******************
2016/08/25 17:25:28 *****************************************************************
2016/08/25 17:25:28 Starting the HTTP server at port 5656
2016/08/25 17:25:28 Hosting artifact '/mrredis/bin/MrRedisExecutor' at 'http://mr-redis.marathon.mesos:5454/MrRedisExecutor'
2016/08/25 17:25:28 Hosting artifact '/mrredis/bin/redis-server' at 'http://mr-redis.marathon.mesos:5454/redis-server'
2016/08/25 17:25:28 host_ip = 10.102.4.151 going to listen and serve
2016/08/25 17:25:28 Scheduler Maintainer is starting
2016/08/25 17:25:28 Not registered previously
2016/08/25 17:25:28 The Framework ID is &FrameworkID{Value:*,XXX_unrecognized:[],} and &FrameworkID{Value:*,XXX_unrecognized:[],}
2016/08/25 17:25:28 Connected to 10.102.4.11:2181
2016/08/25 17:25:28 Authenticated: id=240574411288608991, timeout=40000
2016/08/25 17:25:28 MrRedis Registered &FrameworkID{Value:*f17f0016-1054-4e11-8541-6088f5fc435c-0002,XXX_unrecognized:[],}
2016/08/30 15:04:34 slave lost: &SlaveID{Value:*f17f0016-1054-4e11-8541-6088f5fc435c-S14,XXX_unrecognized:[],}
2016/08/30 15:24:34 slave lost: &SlaveID{Value:*f17f0016-1054-4e11-8541-6088f5fc435c-S13,XXX_unrecognized:[],}
2016/08/31 15:39:35 slave lost: &SlaveID{Value:*f17f0016-1054-4e11-8541-6088f5fc435c-S21,XXX_unrecognized:[],}
2016/08/31 15:59:35 slave lost: &SlaveID{Value:*f17f0016-1054-4e11-8541-6088f5fc435c-S22,XXX_unrecognized:[],}
2016/09/01 12:10:59 slave lost: &SlaveID{Value:*f17f0016-1054-4e11-8541-6088f5fc435c-S24,XXX_unrecognized:[],}
2016/09/01 12:30:59 slave lost: &SlaveID{Value:*f17f0016-1054-4e11-8541-6088f5fc435c-S23,XXX_unrecognized:[],}
2016/09/01 13:33:52 slave lost: &SlaveID{Value:*f17f0016-1054-4e11-8541-6088f5fc435c-S26,XXX_unrecognized:[],}
2016/09/01 13:53:52 slave lost: &SlaveID{Value:*f17f0016-1054-4e11-8541-6088f5fc435c-S25,XXX_unrecognized:[],}
2016/09/07 10:38:03 slave lost: &SlaveID{Value:*f17f0016-1054-4e11-8541-6088f5fc435c-S28,XXX_unrecognized:[],}
2016/09/07 10:58:03 slave lost: &SlaveID{Value:*f17f0016-1054-4e11-8541-6088f5fc435c-S29,XXX_unrecognized:[],}
2016/09/08 10:55:56 Instance Name=redistyk, Capacity=256, masters=1, slaves=1
2016/09/08 10:55:56 Received offer {true 0xc8206ce900 1}
2016/09/08 10:55:56 Created 1 master offers for Instance redistyk
2016/09/08 10:55:56 Received Offer with CPU=0.7 MEM=10323 OfferID=f17f0016-1054-4e11-8541-6088f5fc435c-O13360822
2016/09/08 10:55:56 Launched 0 tasks from this offer
2016/09/08 10:55:56 Received Offer with CPU=0.6 MEM=8737 OfferID=f17f0016-1054-4e11-8541-6088f5fc435c-O13360823
2016/09/08 10:55:56 Launched 0 tasks from this offer
2016/09/08 10:55:56 MrRedis Receives offer
2016/09/08 10:55:57 Received Offer with CPU=0.2 MEM=6177 OfferID=f17f0016-1054-4e11-8541-6088f5fc435c-O13360832
2016/09/08 10:55:57 Launched 0 tasks from this offer
2016/09/08 10:55:57 Received Offer with CPU=0.2 MEM=6177 OfferID=f17f0016-1054-4e11-8541-6088f5fc435c-O13360833
2016/09/08 10:55:57 Launched 0 tasks from this offer
2016/09/08 10:55:57 MrRedis Receives offer
2016/09/08 10:55:58 Received Offer with CPU=0.7 MEM=10529 OfferID=f17f0016-1054-4e11-8541-6088f5fc435c-O13360835
2016/09/08 10:55:58 Launched 0 tasks from this offer
2016/09/08 10:55:58 MrRedis Receives offer
2016/09/08 10:55:59 Received Offer with CPU=1 MEM=10785 OfferID=f17f0016-1054-4e11-8541-6088f5fc435c-O13360847
2016/09/08 10:55:59 Launched 1 tasks from this offer
2016/09/08 10:55:59 Received Offer with CPU=0 MEM=6433 OfferID=f17f0016-1054-4e11-8541-6088f5fc435c-O13360848
2016/09/08 10:55:59 Launched 0 tasks from this offer
2016/09/08 10:55:59 MrRedis Receives offer
2016/09/08 10:56:01 MrRedis Task Update received
2016/09/08 10:56:01 Status={redistyk::cdce9c26-b830-4258-6c4d-89d1789ef761 TASK_RUNNING []}
2016/09/08 10:56:01 Received a Task update from the channel &{redistyk::cdce9c26-b830-4258-6c4d-89d1789ef761 TASK_RUNNING []}
2016/09/08 10:56:01 Error occured 100: Key not found (/MrRedis/Instances/redistyk/Procs/SlaveOf) [33]
2016/09/08 10:56:01 Task redistyk::cdce9c26-b830-4258-6c4d-89d1789ef761 is Running
2016/09/08 10:56:01 Received offer {false 0xc8206ce900 1}
2016/09/08 10:56:01 Created 1 slave offers for Instance redistyk
2016/09/08 10:56:02 Received Offer with CPU=0.2 MEM=6177 OfferID=f17f0016-1054-4e11-8541-6088f5fc435c-O13360863
2016/09/08 10:56:02 Launched 0 tasks from this offer
2016/09/08 10:56:02 Received Offer with CPU=0.6 MEM=8737 OfferID=f17f0016-1054-4e11-8541-6088f5fc435c-O13360864
2016/09/08 10:56:02 Launched 0 tasks from this offer
2016/09/08 10:56:02 MrRedis Receives offer
dhilipkumars commented 8 years ago

Hi @gdhgdhgdh

Pretty strange

From the logs it appears that both the redis procs (Master and Slave) have been launched.

2016/09/08 10:55:56 Instance Name=redistyk, Capacity=256, masters=1, slaves=1
2016/09/08 10:55:56 Received offer {true 0xc8206ce900 1}
2016/09/08 10:55:56 Created 1 master offers for Instance redistyk

for slave

2016/09/08 10:55:59 Launched 1 tasks from this offer
0:56:01 Created 1 slave offers for Instance redistyk

But the task update is recived for only one. However i suspect some log messeges are missing.

Could you please upload both stderr and stdout for the entier run please?

gdhgdhgdh commented 8 years ago

Oh wow I thought I had uploaded this days ago. Clearly not.

The entire stderr:

I0914 17:48:41.614203  5743 exec.cpp:161] Version: 1.0.1
I0914 17:48:41.620517  5754 exec.cpp:236] Executor registered on agent 7efb2926-f2cf-4ce5-be67-94ce5575da3d-S9
I0914 17:48:41.621717  5753 docker.cpp:815] Running docker -H unix:///var/run/docker.sock run --cpu-shares 1024 --memory 134217728 -e MARATHON_APP_LABEL_DCOS_PACKAGE_SOURCE=https://universe.mesosphere.com/repo -e MARATHON_APP_VERSION=2016-09-14T13:48:16.739Z -e HOST=10.101.4.56 -e MARATHON_APP_RESOURCE_CPUS=1.0 -e MARATHON_APP_LABEL_DCOS_PACKAGE_REGISTRY_VERSION=2.0 -e CLUSTER_SIZE=1 -e MARATHON_APP_RESOURCE_GPUS=0 -e MARATHON_APP_LABEL_DCOS_PACKAGE_RELEASE=0 -e PORT_5454=16130 -e MARATHON_APP_DOCKER_IMAGE=paashuawei/mr-redis:latest -e MARATHON_APP_LABEL_DCOS_PACKAGE_NAME=mr-redis -e MARATHON_APP_LABEL_DCOS_PACKAGE_VERSION=0.0.1 -e PORT_5656=16131 -e MESOS_TASK_ID=mr-redis.e5648a2f-7a81-11e6-b9e0-70b3d5800009 -e PORT=16130 -e MARATHON_APP_RESOURCE_MEM=128.0 -e PORTS=16130,16131,16132 -e PORT1=16131 -e MARATHON_APP_LABEL_DCOS_PACKAGE_IS_FRAMEWORK=true -e FRAMEWORK_NAME=mr-redis -e MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_LABELS=DCOS_PACKAGE_RELEASE DCOS_PACKAGE_SOURCE DCOS_PACKAGE_REGISTRY_VERSION DCOS_PACKAGE_FRAMEWORK_NAME DCOS_PACKAGE_VERSION DCOS_PACKAGE_NAME DCOS_PACKAGE_IS_FRAMEWORK -e MARATHON_APP_LABEL_DCOS_PACKAGE_FRAMEWORK_NAME=mr-redis -e MESOS_MASTER=zk://master.mesos:2181/mesos -e MARATHON_APP_ID=/mr-redis -e PORT0=16130 -e PORT_2379=16132 -e PORT2=16132 -e LIBPROCESS_IP=10.101.4.56 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-7efb2926-f2cf-4ce5-be67-94ce5575da3d-S9.b14f22b5-f8c3-479a-b70b-7ef671f71821 -v /var/lib/mesos/slave/slaves/7efb2926-f2cf-4ce5-be67-94ce5575da3d-S9/frameworks/7efb2926-f2cf-4ce5-be67-94ce5575da3d-0000/executors/mr-redis.e5648a2f-7a81-11e6-b9e0-70b3d5800009/runs/b14f22b5-f8c3-479a-b70b-7ef671f71821:/mnt/mesos/sandbox --net host --name mesos-7efb2926-f2cf-4ce5-be67-94ce5575da3d-S9.b14f22b5-f8c3-479a-b70b-7ef671f71821 paashuawei/mr-redis:latest
2016/09/14 13:48:42 Configuration file is = {zk://master.mesos:2181/mesos /mrredis/bin/MrRedisExecutor /mrredis/bin/redis-server etcd http://mr-redis.marathon.mesos:2379 stderr mr-redis.marathon.mesos 5454 5656}
2016/09/14 13:48:42 *****************************************************************
2016/09/14 13:48:42 *********************Starting MrRedis-Scheduler******************
2016/09/14 13:48:42 *****************************************************************
2016/09/14 13:48:42 Starting the HTTP server at port 5656
2016/09/14 13:48:42 Hosting artifact '/mrredis/bin/MrRedisExecutor' at 'http://mr-redis.marathon.mesos:5454/MrRedisExecutor'
2016/09/14 13:48:42 Hosting artifact '/mrredis/bin/redis-server' at 'http://mr-redis.marathon.mesos:5454/redis-server'
2016/09/14 13:48:42 host_ip = 10.101.4.56 going to listen and serve
2016/09/14 13:48:42 Scheduler Maintainer is starting
2016/09/14 13:48:42 Not registered previously
2016/09/14 13:48:42 The Framework ID is &FrameworkID{Value:*,XXX_unrecognized:[],} and &FrameworkID{Value:*,XXX_unrecognized:[],}
2016/09/14 13:48:42 Connected to 10.101.4.11:2181
2016/09/14 13:48:42 Authenticated: id=240698803047761221, timeout=40000
2016/09/14 13:48:50 Instance Name=redistyk, Capacity=256, masters=1, slaves=3
2016/09/14 13:48:50 Received offer {true 0xc82011a2d0 1}
2016/09/14 13:48:50 Created 1 master offers for Instance redistyk
2016/09/14 13:48:52 MrRedis Registered &FrameworkID{Value:*7efb2926-f2cf-4ce5-be67-94ce5575da3d-0005,XXX_unrecognized:[],}
2016/09/14 13:48:52 Received Offer with CPU=2.2 MEM=6926 OfferID=7efb2926-f2cf-4ce5-be67-94ce5575da3d-O594923
2016/09/14 13:48:52 Launched 1 tasks from this offer
2016/09/14 13:48:52 Received Offer with CPU=3.1 MEM=9486 OfferID=7efb2926-f2cf-4ce5-be67-94ce5575da3d-O594924
2016/09/14 13:48:52 Launched 0 tasks from this offer
2016/09/14 13:48:52 Received Offer with CPU=2.7 MEM=9230 OfferID=7efb2926-f2cf-4ce5-be67-94ce5575da3d-O594925
2016/09/14 13:48:52 Launched 0 tasks from this offer
2016/09/14 13:48:52 Received Offer with CPU=0.4 MEM=5262 OfferID=7efb2926-f2cf-4ce5-be67-94ce5575da3d-O594926
2016/09/14 13:48:52 Launched 0 tasks from this offer
2016/09/14 13:48:52 Received Offer with CPU=0.1 MEM=4416 OfferID=7efb2926-f2cf-4ce5-be67-94ce5575da3d-O594927
2016/09/14 13:48:52 Launched 0 tasks from this offer
2016/09/14 13:48:52 Received Offer with CPU=3.7 MEM=12302 OfferID=7efb2926-f2cf-4ce5-be67-94ce5575da3d-O594928
2016/09/14 13:48:52 Launched 0 tasks from this offer
2016/09/14 13:48:52 MrRedis Receives offer
2016/09/14 13:48:53 executor "&ExecutorID{Value:*default,XXX_unrecognized:[],}" lost on slave "&SlaveID{Value:*7efb2926-f2cf-4ce5-be67-94ce5575da3d-S11,XXX_unrecognized:[],}" code 256
2016/09/14 13:48:53 MrRedis Task Update received
2016/09/14 13:48:53 Status={redistyk::4fc70193-e36c-46e0-42ae-407103c86915 TASK_FAILED []}
2016/09/14 13:48:53 Received a Task update from the channel &{redistyk::4fc70193-e36c-46e0-42ae-407103c86915 TASK_FAILED []}
2016/09/14 13:48:53 Invalid Key /MrRedis/Instances/redistyk/Procs/4fc70193-e36c-46e0-42ae-407103c86915, Cannot load
2016/09/14 13:48:53 Error occured 100: Key not found (/MrRedis/Instances/redistyk/Procs/SlaveOf) [18]
2016/09/14 13:48:53 Task redistyk::4fc70193-e36c-46e0-42ae-407103c86915 is Failed

The entire stdout:

--container="mesos-7efb2926-f2cf-4ce5-be67-94ce5575da3d-S9.b14f22b5-f8c3-479a-b70b-7ef671f71821" --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" --initialize_driver_logging="true" --launcher_dir="/opt/mesosphere/packages/mesos--706ed7de184c662ca909c7f55479427b9e0a338e/libexec/mesos" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/var/lib/mesos/slave/slaves/7efb2926-f2cf-4ce5-be67-94ce5575da3d-S9/frameworks/7efb2926-f2cf-4ce5-be67-94ce5575da3d-0000/executors/mr-redis.e5648a2f-7a81-11e6-b9e0-70b3d5800009/runs/b14f22b5-f8c3-479a-b70b-7ef671f71821" --stop_timeout="20secs" --task_environment="{}"
--container="mesos-7efb2926-f2cf-4ce5-be67-94ce5575da3d-S9.b14f22b5-f8c3-479a-b70b-7ef671f71821" --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" --initialize_driver_logging="true" --launcher_dir="/opt/mesosphere/packages/mesos--706ed7de184c662ca909c7f55479427b9e0a338e/libexec/mesos" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/var/lib/mesos/slave/slaves/7efb2926-f2cf-4ce5-be67-94ce5575da3d-S9/frameworks/7efb2926-f2cf-4ce5-be67-94ce5575da3d-0000/executors/mr-redis.e5648a2f-7a81-11e6-b9e0-70b3d5800009/runs/b14f22b5-f8c3-479a-b70b-7ef671f71821" --stop_timeout="20secs" --task_environment="{}"
Registered docker executor on 10.101.4.56
Starting task mr-redis.e5648a2f-7a81-11e6-b9e0-70b3d5800009
***************************************************************************************
BASH=/bin/bash
BASHOPTS=cmdhist:complete_fullquote:extquote:force_fignore:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
BASH_ALIASES=()
BASH_ARGC=()
BASH_ARGV=()
BASH_CMDS=()
BASH_LINENO=([0]="0")
BASH_SOURCE=([0]="/etcd/bin/runetcd.sh")
BASH_VERSINFO=([0]="4" [1]="3" [2]="11" [3]="1" [4]="release" [5]="x86_64-pc-linux-gnu")
BASH_VERSION='4.3.11(1)-release'
CLUSTER_SIZE=1
DIRSTACK=()
EUID=0
FRAMEWORK_NAME=mr-redis
GROUPS=()
HOME=/root
HOST=10.101.4.56
HOSTNAME=ip-10-101-4-56
HOSTTYPE=x86_64
IFS=$' \t\n'
LIBPROCESS_IP=10.101.4.56
MACHTYPE=x86_64-pc-linux-gnu
MARATHON_APP_DOCKER_IMAGE=paashuawei/mr-redis:latest
MARATHON_APP_ID=/mr-redis
MARATHON_APP_LABELS='DCOS_PACKAGE_RELEASE DCOS_PACKAGE_SOURCE DCOS_PACKAGE_REGISTRY_VERSION DCOS_PACKAGE_FRAMEWORK_NAME DCOS_PACKAGE_VERSION DCOS_PACKAGE_NAME DCOS_PACKAGE_IS_FRAMEWORK'
MARATHON_APP_LABEL_DCOS_PACKAGE_FRAMEWORK_NAME=mr-redis
MARATHON_APP_LABEL_DCOS_PACKAGE_IS_FRAMEWORK=true
MARATHON_APP_LABEL_DCOS_PACKAGE_NAME=mr-redis
MARATHON_APP_LABEL_DCOS_PACKAGE_REGISTRY_VERSION=2.0
MARATHON_APP_LABEL_DCOS_PACKAGE_RELEASE=0
MARATHON_APP_LABEL_DCOS_PACKAGE_SOURCE=https://universe.mesosphere.com/repo
MARATHON_APP_LABEL_DCOS_PACKAGE_VERSION=0.0.1
MARATHON_APP_RESOURCE_CPUS=1.0
MARATHON_APP_RESOURCE_DISK=0.0
MARATHON_APP_RESOURCE_GPUS=0
MARATHON_APP_RESOURCE_MEM=128.0
MARATHON_APP_VERSION=2016-09-14T13:48:16.739Z
MESOS_CONTAINER_NAME=mesos-7efb2926-f2cf-4ce5-be67-94ce5575da3d-S9.b14f22b5-f8c3-479a-b70b-7ef671f71821
MESOS_MASTER=zk://master.mesos:2181/mesos
MESOS_SANDBOX=/mnt/mesos/sandbox
MESOS_TASK_ID=mr-redis.e5648a2f-7a81-11e6-b9e0-70b3d5800009
OPTERR=1
OPTIND=1
OSTYPE=linux-gnu
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PIPESTATUS=([0]="0")
PORT=16130
PORT0=16130
PORT1=16131
PORT2=16132
PORTS=16130,16131,16132
PORT_2379=16132
PORT_5454=16130
PORT_5656=16131
PPID=1
PS4='+ '
PWD=/
SHELL=/bin/bash
SHELLOPTS=braceexpand:hashall:interactive-comments
SHLVL=2
TERM=dumb
UID=0
_='***************************************************************************************'
***************************************************************************************
2016-09-14 13:48:41.703078 I | etcdmain: etcd Version: 2.2.2
2016-09-14 13:48:41.703137 I | etcdmain: Git SHA: b4bddf6
2016-09-14 13:48:41.703144 I | etcdmain: Go Version: go1.5.1
2016-09-14 13:48:41.703148 I | etcdmain: Go OS/Arch: linux/amd64
2016-09-14 13:48:41.703154 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2016-09-14 13:48:41.703159 W | etcdmain: no data-dir provided, using default data-dir ./default.etcd
2016-09-14 13:48:41.703591 I | etcdmain: listening for peers on http://localhost:2380
2016-09-14 13:48:41.703780 I | etcdmain: listening for peers on http://localhost:7001
2016-09-14 13:48:41.703829 I | etcdmain: listening for client requests on http://10.101.4.56:2379
2016-09-14 13:48:41.703912 I | etcdmain: listening for client requests on http://10.101.4.56:4001
*** etcd started ***
Starting the MrRedis scheduler
2016/09/14 13:48:42 [config.go:314] [W] open /conf/app.conf: no such file or directory 
2016/09/14 13:48:42 [asm_amd64.s:1721] [I] http server Running on :5656
2016/09/14 13:48:46 [router.go:854] [D] | GET        | /v1/STATUS                               | 135.764µs        | match      | /v1/STATUS/                              | 
2016/09/14 13:48:50 [router.go:854] [D] | POST       | /v1/CREATE/redistyk/256/1/3              | 14.632778ms      | match      | /v1/CREATE/:INSTANCENAME/:CAPACITY/:MASTERS/:SLAVES | 
2016/09/14 13:49:03 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 193.661µs        | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:49:06 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 127.008µs        | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:49:08 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 141.084µs        | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:49:09 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 115.661µs        | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:49:10 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 112.231µs        | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:49:12 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 114.39µs         | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:49:14 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 117.521µs        | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:49:16 [router.go:854] [D] | GET        | /v1/STATUS                               | 90.286µs         | match      | /v1/STATUS/                              | 
2016/09/14 13:49:28 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 122.196µs        | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:49:38 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 162.556µs        | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:49:39 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 223.718µs        | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:49:39 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 100.863µs        | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:49:40 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 107.98µs         | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:49:42 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 156.305µs        | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:49:42 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 219.395µs        | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:49:43 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 150.573µs        | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:49:45 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 143.511µs        | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:49:46 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 173.448µs        | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:49:46 [router.go:854] [D] | GET        | /v1/STATUS                               | 90.393µs         | match      | /v1/STATUS/                              | 
2016/09/14 13:49:47 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 111.41µs         | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:49:50 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 143.052µs        | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:49:51 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 117.69µs         | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:49:52 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 109.352µs        | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:49:53 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 106.348µs        | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:49:53 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 169.686µs        | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:49:54 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 123.056µs        | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:49:55 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 97.356µs         | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:50:08 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 106.388µs        | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:50:16 [router.go:854] [D] | GET        | /v1/STATUS                               | 84.314µs         | match      | /v1/STATUS/                              | 
2016/09/14 13:50:25 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 138.81µs         | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:50:36 [router.go:854] [D] | GET        | /v1/STATUS/redistyk                      | 115.086µs        | match      | /v1/STATUS/:INSTANCENAME                 | 
2016/09/14 13:50:46 [router.go:854] [D] | GET        | /v1/STATUS                               | 101.276µs        | match      | /v1/STATUS/                              | 
2016/09/14 13:51:16 [router.go:854] [D] | GET        | /v1/STATUS                               | 139.91µs         | match      | /v1/STATUS/                              | 
2016/09/14 13:51:47 [router.go:854] [D] | GET        | /v1/STATUS                               | 141.954µs        | match      | /v1/STATUS/                              | 
2016/09/14 13:52:17 [router.go:854] [D] | GET        | /v1/STATUS                               | 124.65µs         | match      | /v1/STATUS/                              | 
gdhgdhgdh commented 8 years ago

I looked at the above situation a little more closely, and the reason the first redistyk task failed was because it could not resolve mrredis.mesos.

My deployment script does dcos package install mr-redis and then loops until curl --fail -X POST http://mrredis.mesos:5656/CREATE/...... completes.

It strikes me that the mrredis.mesos may take longer to replicate to the agents, and hence is not available when the first task is launched. Does that sound like a reasonable assertion?

In this case, perhaps the framework just needs to retry (with exponential backoff of course...) until the task stays running ?

dhilipkumars commented 8 years ago

Probably the bug here is a race condition, we should never start listening for offers until we register with the master successfully.

2016/09/14 13:48:50 Created 1 master offers for Instance redistyk
2016/09/14 13:48:52 MrRedis Registered &FrameworkID{Value:*7efb2926-f2cf-4ce5-be67-94ce5575da3d-0005,XXX_unrecognized:[],}

Thanks for reporting. Will add this to the milestone too.

sunhongtao commented 8 years ago

@gdhgdhgdh hi, Can you run normally with mrredis on mesos under the zookeeper?

gdhgdhgdh commented 8 years ago

I have only been using DC/OS - I have never used Mesos 'standalone' so I have no experience here. What would running with 'pure' Mesos / ZK prove?

dhilipkumars commented 8 years ago

Addressed by https://github.com/mesos/mr-redis/commit/be30bef6cce31e8f12104b1b869458c4fbd4968a