Closed gdhgdhgdh closed 8 years ago
Hi @gdhgdhgdh
Pretty strange
From the logs it appears that both the redis procs (Master and Slave) have been launched.
2016/09/08 10:55:56 Instance Name=redistyk, Capacity=256, masters=1, slaves=1
2016/09/08 10:55:56 Received offer {true 0xc8206ce900 1}
2016/09/08 10:55:56 Created 1 master offers for Instance redistyk
for slave
2016/09/08 10:55:59 Launched 1 tasks from this offer
0:56:01 Created 1 slave offers for Instance redistyk
But the task update is recived for only one. However i suspect some log messeges are missing.
Could you please upload both stderr and stdout for the entier run please?
Oh wow I thought I had uploaded this days ago. Clearly not.
The entire stderr:
I0914 17:48:41.614203 5743 exec.cpp:161] Version: 1.0.1
I0914 17:48:41.620517 5754 exec.cpp:236] Executor registered on agent 7efb2926-f2cf-4ce5-be67-94ce5575da3d-S9
I0914 17:48:41.621717 5753 docker.cpp:815] Running docker -H unix:///var/run/docker.sock run --cpu-shares 1024 --memory 134217728 -e MARATHON_APP_LABEL_DCOS_PACKAGE_SOURCE=https://universe.mesosphere.com/repo -e MARATHON_APP_VERSION=2016-09-14T13:48:16.739Z -e HOST=10.101.4.56 -e MARATHON_APP_RESOURCE_CPUS=1.0 -e MARATHON_APP_LABEL_DCOS_PACKAGE_REGISTRY_VERSION=2.0 -e CLUSTER_SIZE=1 -e MARATHON_APP_RESOURCE_GPUS=0 -e MARATHON_APP_LABEL_DCOS_PACKAGE_RELEASE=0 -e PORT_5454=16130 -e MARATHON_APP_DOCKER_IMAGE=paashuawei/mr-redis:latest -e MARATHON_APP_LABEL_DCOS_PACKAGE_NAME=mr-redis -e MARATHON_APP_LABEL_DCOS_PACKAGE_VERSION=0.0.1 -e PORT_5656=16131 -e MESOS_TASK_ID=mr-redis.e5648a2f-7a81-11e6-b9e0-70b3d5800009 -e PORT=16130 -e MARATHON_APP_RESOURCE_MEM=128.0 -e PORTS=16130,16131,16132 -e PORT1=16131 -e MARATHON_APP_LABEL_DCOS_PACKAGE_IS_FRAMEWORK=true -e FRAMEWORK_NAME=mr-redis -e MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_LABELS=DCOS_PACKAGE_RELEASE DCOS_PACKAGE_SOURCE DCOS_PACKAGE_REGISTRY_VERSION DCOS_PACKAGE_FRAMEWORK_NAME DCOS_PACKAGE_VERSION DCOS_PACKAGE_NAME DCOS_PACKAGE_IS_FRAMEWORK -e MARATHON_APP_LABEL_DCOS_PACKAGE_FRAMEWORK_NAME=mr-redis -e MESOS_MASTER=zk://master.mesos:2181/mesos -e MARATHON_APP_ID=/mr-redis -e PORT0=16130 -e PORT_2379=16132 -e PORT2=16132 -e LIBPROCESS_IP=10.101.4.56 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-7efb2926-f2cf-4ce5-be67-94ce5575da3d-S9.b14f22b5-f8c3-479a-b70b-7ef671f71821 -v /var/lib/mesos/slave/slaves/7efb2926-f2cf-4ce5-be67-94ce5575da3d-S9/frameworks/7efb2926-f2cf-4ce5-be67-94ce5575da3d-0000/executors/mr-redis.e5648a2f-7a81-11e6-b9e0-70b3d5800009/runs/b14f22b5-f8c3-479a-b70b-7ef671f71821:/mnt/mesos/sandbox --net host --name mesos-7efb2926-f2cf-4ce5-be67-94ce5575da3d-S9.b14f22b5-f8c3-479a-b70b-7ef671f71821 paashuawei/mr-redis:latest
2016/09/14 13:48:42 Configuration file is = {zk://master.mesos:2181/mesos /mrredis/bin/MrRedisExecutor /mrredis/bin/redis-server etcd http://mr-redis.marathon.mesos:2379 stderr mr-redis.marathon.mesos 5454 5656}
2016/09/14 13:48:42 *****************************************************************
2016/09/14 13:48:42 *********************Starting MrRedis-Scheduler******************
2016/09/14 13:48:42 *****************************************************************
2016/09/14 13:48:42 Starting the HTTP server at port 5656
2016/09/14 13:48:42 Hosting artifact '/mrredis/bin/MrRedisExecutor' at 'http://mr-redis.marathon.mesos:5454/MrRedisExecutor'
2016/09/14 13:48:42 Hosting artifact '/mrredis/bin/redis-server' at 'http://mr-redis.marathon.mesos:5454/redis-server'
2016/09/14 13:48:42 host_ip = 10.101.4.56 going to listen and serve
2016/09/14 13:48:42 Scheduler Maintainer is starting
2016/09/14 13:48:42 Not registered previously
2016/09/14 13:48:42 The Framework ID is &FrameworkID{Value:*,XXX_unrecognized:[],} and &FrameworkID{Value:*,XXX_unrecognized:[],}
2016/09/14 13:48:42 Connected to 10.101.4.11:2181
2016/09/14 13:48:42 Authenticated: id=240698803047761221, timeout=40000
2016/09/14 13:48:50 Instance Name=redistyk, Capacity=256, masters=1, slaves=3
2016/09/14 13:48:50 Received offer {true 0xc82011a2d0 1}
2016/09/14 13:48:50 Created 1 master offers for Instance redistyk
2016/09/14 13:48:52 MrRedis Registered &FrameworkID{Value:*7efb2926-f2cf-4ce5-be67-94ce5575da3d-0005,XXX_unrecognized:[],}
2016/09/14 13:48:52 Received Offer with CPU=2.2 MEM=6926 OfferID=7efb2926-f2cf-4ce5-be67-94ce5575da3d-O594923
2016/09/14 13:48:52 Launched 1 tasks from this offer
2016/09/14 13:48:52 Received Offer with CPU=3.1 MEM=9486 OfferID=7efb2926-f2cf-4ce5-be67-94ce5575da3d-O594924
2016/09/14 13:48:52 Launched 0 tasks from this offer
2016/09/14 13:48:52 Received Offer with CPU=2.7 MEM=9230 OfferID=7efb2926-f2cf-4ce5-be67-94ce5575da3d-O594925
2016/09/14 13:48:52 Launched 0 tasks from this offer
2016/09/14 13:48:52 Received Offer with CPU=0.4 MEM=5262 OfferID=7efb2926-f2cf-4ce5-be67-94ce5575da3d-O594926
2016/09/14 13:48:52 Launched 0 tasks from this offer
2016/09/14 13:48:52 Received Offer with CPU=0.1 MEM=4416 OfferID=7efb2926-f2cf-4ce5-be67-94ce5575da3d-O594927
2016/09/14 13:48:52 Launched 0 tasks from this offer
2016/09/14 13:48:52 Received Offer with CPU=3.7 MEM=12302 OfferID=7efb2926-f2cf-4ce5-be67-94ce5575da3d-O594928
2016/09/14 13:48:52 Launched 0 tasks from this offer
2016/09/14 13:48:52 MrRedis Receives offer
2016/09/14 13:48:53 executor "&ExecutorID{Value:*default,XXX_unrecognized:[],}" lost on slave "&SlaveID{Value:*7efb2926-f2cf-4ce5-be67-94ce5575da3d-S11,XXX_unrecognized:[],}" code 256
2016/09/14 13:48:53 MrRedis Task Update received
2016/09/14 13:48:53 Status={redistyk::4fc70193-e36c-46e0-42ae-407103c86915 TASK_FAILED []}
2016/09/14 13:48:53 Received a Task update from the channel &{redistyk::4fc70193-e36c-46e0-42ae-407103c86915 TASK_FAILED []}
2016/09/14 13:48:53 Invalid Key /MrRedis/Instances/redistyk/Procs/4fc70193-e36c-46e0-42ae-407103c86915, Cannot load
2016/09/14 13:48:53 Error occured 100: Key not found (/MrRedis/Instances/redistyk/Procs/SlaveOf) [18]
2016/09/14 13:48:53 Task redistyk::4fc70193-e36c-46e0-42ae-407103c86915 is Failed
The entire stdout:
--container="mesos-7efb2926-f2cf-4ce5-be67-94ce5575da3d-S9.b14f22b5-f8c3-479a-b70b-7ef671f71821" --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" --initialize_driver_logging="true" --launcher_dir="/opt/mesosphere/packages/mesos--706ed7de184c662ca909c7f55479427b9e0a338e/libexec/mesos" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/var/lib/mesos/slave/slaves/7efb2926-f2cf-4ce5-be67-94ce5575da3d-S9/frameworks/7efb2926-f2cf-4ce5-be67-94ce5575da3d-0000/executors/mr-redis.e5648a2f-7a81-11e6-b9e0-70b3d5800009/runs/b14f22b5-f8c3-479a-b70b-7ef671f71821" --stop_timeout="20secs" --task_environment="{}"
--container="mesos-7efb2926-f2cf-4ce5-be67-94ce5575da3d-S9.b14f22b5-f8c3-479a-b70b-7ef671f71821" --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" --initialize_driver_logging="true" --launcher_dir="/opt/mesosphere/packages/mesos--706ed7de184c662ca909c7f55479427b9e0a338e/libexec/mesos" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/var/lib/mesos/slave/slaves/7efb2926-f2cf-4ce5-be67-94ce5575da3d-S9/frameworks/7efb2926-f2cf-4ce5-be67-94ce5575da3d-0000/executors/mr-redis.e5648a2f-7a81-11e6-b9e0-70b3d5800009/runs/b14f22b5-f8c3-479a-b70b-7ef671f71821" --stop_timeout="20secs" --task_environment="{}"
Registered docker executor on 10.101.4.56
Starting task mr-redis.e5648a2f-7a81-11e6-b9e0-70b3d5800009
***************************************************************************************
BASH=/bin/bash
BASHOPTS=cmdhist:complete_fullquote:extquote:force_fignore:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
BASH_ALIASES=()
BASH_ARGC=()
BASH_ARGV=()
BASH_CMDS=()
BASH_LINENO=([0]="0")
BASH_SOURCE=([0]="/etcd/bin/runetcd.sh")
BASH_VERSINFO=([0]="4" [1]="3" [2]="11" [3]="1" [4]="release" [5]="x86_64-pc-linux-gnu")
BASH_VERSION='4.3.11(1)-release'
CLUSTER_SIZE=1
DIRSTACK=()
EUID=0
FRAMEWORK_NAME=mr-redis
GROUPS=()
HOME=/root
HOST=10.101.4.56
HOSTNAME=ip-10-101-4-56
HOSTTYPE=x86_64
IFS=$' \t\n'
LIBPROCESS_IP=10.101.4.56
MACHTYPE=x86_64-pc-linux-gnu
MARATHON_APP_DOCKER_IMAGE=paashuawei/mr-redis:latest
MARATHON_APP_ID=/mr-redis
MARATHON_APP_LABELS='DCOS_PACKAGE_RELEASE DCOS_PACKAGE_SOURCE DCOS_PACKAGE_REGISTRY_VERSION DCOS_PACKAGE_FRAMEWORK_NAME DCOS_PACKAGE_VERSION DCOS_PACKAGE_NAME DCOS_PACKAGE_IS_FRAMEWORK'
MARATHON_APP_LABEL_DCOS_PACKAGE_FRAMEWORK_NAME=mr-redis
MARATHON_APP_LABEL_DCOS_PACKAGE_IS_FRAMEWORK=true
MARATHON_APP_LABEL_DCOS_PACKAGE_NAME=mr-redis
MARATHON_APP_LABEL_DCOS_PACKAGE_REGISTRY_VERSION=2.0
MARATHON_APP_LABEL_DCOS_PACKAGE_RELEASE=0
MARATHON_APP_LABEL_DCOS_PACKAGE_SOURCE=https://universe.mesosphere.com/repo
MARATHON_APP_LABEL_DCOS_PACKAGE_VERSION=0.0.1
MARATHON_APP_RESOURCE_CPUS=1.0
MARATHON_APP_RESOURCE_DISK=0.0
MARATHON_APP_RESOURCE_GPUS=0
MARATHON_APP_RESOURCE_MEM=128.0
MARATHON_APP_VERSION=2016-09-14T13:48:16.739Z
MESOS_CONTAINER_NAME=mesos-7efb2926-f2cf-4ce5-be67-94ce5575da3d-S9.b14f22b5-f8c3-479a-b70b-7ef671f71821
MESOS_MASTER=zk://master.mesos:2181/mesos
MESOS_SANDBOX=/mnt/mesos/sandbox
MESOS_TASK_ID=mr-redis.e5648a2f-7a81-11e6-b9e0-70b3d5800009
OPTERR=1
OPTIND=1
OSTYPE=linux-gnu
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PIPESTATUS=([0]="0")
PORT=16130
PORT0=16130
PORT1=16131
PORT2=16132
PORTS=16130,16131,16132
PORT_2379=16132
PORT_5454=16130
PORT_5656=16131
PPID=1
PS4='+ '
PWD=/
SHELL=/bin/bash
SHELLOPTS=braceexpand:hashall:interactive-comments
SHLVL=2
TERM=dumb
UID=0
_='***************************************************************************************'
***************************************************************************************
2016-09-14 13:48:41.703078 I | etcdmain: etcd Version: 2.2.2
2016-09-14 13:48:41.703137 I | etcdmain: Git SHA: b4bddf6
2016-09-14 13:48:41.703144 I | etcdmain: Go Version: go1.5.1
2016-09-14 13:48:41.703148 I | etcdmain: Go OS/Arch: linux/amd64
2016-09-14 13:48:41.703154 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2016-09-14 13:48:41.703159 W | etcdmain: no data-dir provided, using default data-dir ./default.etcd
2016-09-14 13:48:41.703591 I | etcdmain: listening for peers on http://localhost:2380
2016-09-14 13:48:41.703780 I | etcdmain: listening for peers on http://localhost:7001
2016-09-14 13:48:41.703829 I | etcdmain: listening for client requests on http://10.101.4.56:2379
2016-09-14 13:48:41.703912 I | etcdmain: listening for client requests on http://10.101.4.56:4001
*** etcd started ***
Starting the MrRedis scheduler
2016/09/14 13:48:42 [1;33m[config.go:314] [W] open /conf/app.conf: no such file or directory [0m
2016/09/14 13:48:42 [1;34m[asm_amd64.s:1721] [I] http server Running on :5656[0m
2016/09/14 13:48:46 [1;34m[router.go:854] [D] | GET | /v1/STATUS | 135.764µs | match | /v1/STATUS/ | [0m
2016/09/14 13:48:50 [1;34m[router.go:854] [D] | POST | /v1/CREATE/redistyk/256/1/3 | 14.632778ms | match | /v1/CREATE/:INSTANCENAME/:CAPACITY/:MASTERS/:SLAVES | [0m
2016/09/14 13:49:03 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 193.661µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:49:06 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 127.008µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:49:08 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 141.084µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:49:09 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 115.661µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:49:10 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 112.231µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:49:12 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 114.39µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:49:14 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 117.521µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:49:16 [1;34m[router.go:854] [D] | GET | /v1/STATUS | 90.286µs | match | /v1/STATUS/ | [0m
2016/09/14 13:49:28 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 122.196µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:49:38 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 162.556µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:49:39 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 223.718µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:49:39 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 100.863µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:49:40 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 107.98µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:49:42 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 156.305µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:49:42 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 219.395µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:49:43 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 150.573µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:49:45 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 143.511µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:49:46 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 173.448µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:49:46 [1;34m[router.go:854] [D] | GET | /v1/STATUS | 90.393µs | match | /v1/STATUS/ | [0m
2016/09/14 13:49:47 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 111.41µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:49:50 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 143.052µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:49:51 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 117.69µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:49:52 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 109.352µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:49:53 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 106.348µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:49:53 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 169.686µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:49:54 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 123.056µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:49:55 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 97.356µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:50:08 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 106.388µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:50:16 [1;34m[router.go:854] [D] | GET | /v1/STATUS | 84.314µs | match | /v1/STATUS/ | [0m
2016/09/14 13:50:25 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 138.81µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:50:36 [1;34m[router.go:854] [D] | GET | /v1/STATUS/redistyk | 115.086µs | match | /v1/STATUS/:INSTANCENAME | [0m
2016/09/14 13:50:46 [1;34m[router.go:854] [D] | GET | /v1/STATUS | 101.276µs | match | /v1/STATUS/ | [0m
2016/09/14 13:51:16 [1;34m[router.go:854] [D] | GET | /v1/STATUS | 139.91µs | match | /v1/STATUS/ | [0m
2016/09/14 13:51:47 [1;34m[router.go:854] [D] | GET | /v1/STATUS | 141.954µs | match | /v1/STATUS/ | [0m
2016/09/14 13:52:17 [1;34m[router.go:854] [D] | GET | /v1/STATUS | 124.65µs | match | /v1/STATUS/ | [0m
I looked at the above situation a little more closely, and the reason the first redistyk
task failed was because it could not resolve mrredis.mesos
.
My deployment script does dcos package install mr-redis
and then loops until curl --fail -X POST http://mrredis.mesos:5656/CREATE/......
completes.
It strikes me that the mrredis.mesos
may take longer to replicate to the agents, and hence is not available when the first task is launched. Does that sound like a reasonable assertion?
In this case, perhaps the framework just needs to retry (with exponential backoff of course...) until the task stays running ?
Probably the bug here is a race condition, we should never start listening for offers until we register with the master successfully.
2016/09/14 13:48:50 Created 1 master offers for Instance redistyk
2016/09/14 13:48:52 MrRedis Registered &FrameworkID{Value:*7efb2926-f2cf-4ce5-be67-94ce5575da3d-0005,XXX_unrecognized:[],}
Thanks for reporting. Will add this to the milestone too.
@gdhgdhgdh hi, Can you run normally with mrredis on mesos under the zookeeper?
I have only been using DC/OS - I have never used Mesos 'standalone' so I have no experience here. What would running with 'pure' Mesos / ZK prove?
Hi,
On DC/OS 1.8.1-open I tried to create
curl -X POST http://mrredis.mesos:5656/v1/CREATE/redistyk/256/1/1