It depends what you're doing. The idea is that the framework ID is stored in ZooKeeper. At the next startup, the framework checks ZooKeeper to see if there is an ID for the given cluster name and, if there is, reuses it. But there is code in there to shut down the whole cluster if you tell it to shut down. So yes, if you tell it to shut down, then it will shut down and then register a new framework ID (this is required by Mesos).
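If you want to check what the scheduler has persisted, you can poke at ZooKeeper directly with the standard CLI. This is only a sketch, and the znode path below is illustrative, since the actual path depends on the framework/cluster name you configured:
# connect to any ZooKeeper node in the ensemble
zkCli.sh -server 10.138.192.237:2181
# inside the CLI, look for the framework's persisted state (example path)
ls /
get /elasticsearch/frameworkId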
With that said, you would only get data loss on a cluster if ALL nodes were killed at the same time. The data is set to be fully replicated by default, so the cluster can lose all nodes except one and still have all the data. So there is something wrong there too.
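For example, you can verify the replica count against any ES node over its HTTP API (9200 is the stock Elasticsearch default; substitute whatever host and port the framework actually assigned):
# with 3 nodes, number_of_replicas: 2 means every shard has a copy on every node
curl -s 'http://localhost:9200/_settings?pretty' | grep number_of_replicas
# see where each shard (primary and replica) is allocated
curl -s 'http://localhost:9200/_cat/shards?v'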
Also, data loss can always occur if you're not using some underlying long-term storage mechanism. There is an option for external volumes. Alternatively, you can protect yourself against the unlikely event that all ES nodes go down at the same time by taking snapshots.
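For the snapshot route, the standard ES snapshot API is enough. A rough sketch (the repository name, location and node address are just examples, and the location needs to be a path that is registered in path.repo and reachable from every node):
# register a shared-filesystem snapshot repository
curl -XPUT 'http://localhost:9200/_snapshot/es_backup' -d '{
  "type": "fs",
  "settings": { "location": "/data/elasticsearch/backups" }
}'
# snapshot all indices and wait for completion
curl -XPUT 'http://localhost:9200/_snapshot/es_backup/snapshot_1?wait_for_completion=true'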
I'm confident that this works, so I can only suggest that something is wrong with your setup. Check the logs: is there a valid connection to ZooKeeper? Do the nodes die (they shouldn't)? Are you using some funky config, e.g. are the cluster names changing each time?
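A quick way to check basic ZooKeeper reachability from the scheduler's host is ZooKeeper's four-letter-word commands, for example:
# should print "imok" if the ZooKeeper node is serving requests
echo ruok | nc 10.138.192.237 2181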
Hi @philwinder - many thanks for the reply.
I went through and looked at the logs and couldn't find anything that stuck out. To give you some background, I'm running 3 masters and 3 slaves. Here's my deployment config. You can see that I'm using --dataDir in hopes of persisting the data outside of the container...
{
  "id": "elasticsearch",
  "container": {
    "docker": {
      "image": "mesos/elasticsearch-scheduler",
      "network": "HOST"
    },
    "volumes": [
      {
        "containerPath": "/data/elasticsearch",
        "hostPath": "/data/elasticsearch",
        "mode": "RW"
      }
    ]
  },
  "args": ["--zookeeperMesosUrl", "zk://10.138.192.237:2181,10.138.160.181:2181,10.138.160.182:2181/mesos", "--useIpAddress", "true", "--elasticsearchDockerImage", "elasticsearch-analysis-icu:latest", "--elasticsearchCpu", "2.0", "--elasticsearchRam", "2048", "--dataDir", "/data/elasticsearch", "--elasticsearchDisk", "20480" ],
  "cpus": 0.2,
  "mem": 512,
  "env": {
    "JAVA_OPTS": "-Xms128m -Xmx256m"
  },
  "instances": 1
}
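For reference, an app definition like this can be POSTed to Marathon's REST API (assuming the JSON is saved as elasticsearch.json and Marathon is reachable on one of the masters at port 8080, as in the logs below):
# create the app in Marathon; Marathon then schedules the scheduler container
curl -X POST http://10.138.160.182:8080/v2/apps \
  -H 'Content-Type: application/json' \
  -d @elasticsearch.json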
Within 20-30 seconds of starting the Marathon task, I can see the framework get registered with an ID of 84a2076e-1bf8-47ba-9b78-d96d06782851-0002, running on SLAVE 1 (shouldn't frameworks be running on the Master?).
Additionally, I can see the following directories get created on my slaves:
Slave 1: /data/elasticsearch/mesos-ha/84a2076e-1bf8-47ba-9b78-d96d06782851-S1
Slave 2: /data/elasticsearch/mesos-ha/84a2076e-1bf8-47ba-9b78-d96d06782851-S2
Slave 3: /data/elasticsearch/mesos-ha/84a2076e-1bf8-47ba-9b78-d96d06782851-S0
I also checked the logs and it appears everything started correctly. MASTER 3 LOGS:
marathon stdout | [2016-09-21 15:02:20,503] INFO 10.138.64.115 - - [21/Sep/2016:15:02:20 +0000] "GET //10.138.160.182:8080/v2/apps//elasticsearch?embed=app.taskStats&embed=app.readiness HTTP/1.1" 404 49 "http://10.138.160.182:8080/ui/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12) AppleWebKit/602.1.50 (KHTML, like Gecko) Version/10.0 Safari/602.1.50" (mesosphere.chaos.http.ChaosRequestLog$$EnhancerByGuice$$be0d4a40:qtp747152360-36)
marathon stdout | [2016-09-21 15:02:39,766] INFO [/elasticsearch]: new app detected (mesosphere.marathon.upgrade.GroupVersioningUtil$:ForkJoinPool-2-worker-11)
* Start(App(/elasticsearch, image="mesos/elasticsearch-scheduler", args="--zookeeperMesosUrl zk://10.138.192.237:2181,10.138.160.181:2181,10.138.160.182:2181/mesos --useIpAddress true --elasticsearchDockerImage elasticsearch-analysis-icu:latest --elasticsearchCpu 2.0 --elasticsearchRam 2048 --dataDir /data/elasticsearch --elasticsearchDisk 20480")), instances=0)
* Scale(App(/elasticsearch, image="mesos/elasticsearch-scheduler", args="--zookeeperMesosUrl zk://10.138.192.237:2181,10.138.160.181:2181,10.138.160.182:2181/mesos --useIpAddress true --elasticsearchDockerImage elasticsearch-analysis-icu:latest --elasticsearchCpu 2.0 --elasticsearchRam 2048 --dataDir /data/elasticsearch --elasticsearchDisk 20480")), instances=1)
* Start(App(/elasticsearch, image="mesos/elasticsearch-scheduler", args="--zookeeperMesosUrl zk://10.138.192.237:2181,10.138.160.181:2181,10.138.160.182:2181/mesos --useIpAddress true --elasticsearchDockerImage elasticsearch-analysis-icu:latest --elasticsearchCpu 2.0 --elasticsearchRam 2048 --dataDir /data/elasticsearch --elasticsearchDisk 20480")), instances=0)
* Scale(App(/elasticsearch, image="mesos/elasticsearch-scheduler", args="--zookeeperMesosUrl zk://10.138.192.237:2181,10.138.160.181:2181,10.138.160.182:2181/mesos --useIpAddress true --elasticsearchDockerImage elasticsearch-analysis-icu:latest --elasticsearchCpu 2.0 --elasticsearchRam 2048 --dataDir /data/elasticsearch --elasticsearchDisk 20480")), instances=1)
marathon stdout | [2016-09-21 15:02:39,834] INFO [/elasticsearch] storing new app version 2016-09-21T15:02:39.739Z (mesosphere.marathon.state.GroupManager:ForkJoinPool-2-worker-11)
* Start(App(/elasticsearch, image="mesos/elasticsearch-scheduler", args="--zookeeperMesosUrl zk://10.138.192.237:2181,10.138.160.181:2181,10.138.160.182:2181/mesos --useIpAddress true --elasticsearchDockerImage elasticsearch-analysis-icu:latest --elasticsearchCpu 2.0 --elasticsearchRam 2048 --dataDir /data/elasticsearch --elasticsearchDisk 20480")), instances=0)
* Scale(App(/elasticsearch, image="mesos/elasticsearch-scheduler", args="--zookeeperMesosUrl zk://10.138.192.237:2181,10.138.160.181:2181,10.138.160.182:2181/mesos --useIpAddress true --elasticsearchDockerImage elasticsearch-analysis-icu:latest --elasticsearchCpu 2.0 --elasticsearchRam 2048 --dataDir /data/elasticsearch --elasticsearchDisk 20480")), instances=1)
marathon stdout | [2016-09-21 15:02:39,978] INFO Starting app /elasticsearch (mesosphere.marathon.SchedulerActions:marathon-akka.actor.default-dispatcher-7)
marathon stdout | [2016-09-21 15:02:39,982] INFO Already running 0 instances of /elasticsearch. Not scaling. (mesosphere.marathon.SchedulerActions:marathon-akka.actor.default-dispatcher-7)
marathon stdout | [2016-09-21 15:02:39,984] INFO Successfully started 0 instances of /elasticsearch (mesosphere.marathon.upgrade.AppStartActor:marathon-akka.actor.default-dispatcher-10)
marathon stdout | [2016-09-21 15:02:40,030] INFO Started appTaskLaunchActor for /elasticsearch version 2016-09-21T15:02:39.739Z with initial count 1 (mesosphere.marathon.core.launchqueue.impl.AppTaskLauncherActor:marathon-akka.actor.default-dispatcher-7)
marathon stdout | [2016-09-21 15:02:40,044] INFO activating matcher ActorOfferMatcher(Actor[akka://marathon/user/launchQueue/1/0-elasticsearch#57209220]). (mesosphere.marathon.core.matcher.manager.impl.OfferMatcherManagerActor:marathon-akka.actor.default-dispatcher-12)
marathon stdout | [2016-09-21 15:02:40,188] INFO Request Launch for task 'elasticsearch.70c377d8-800c-11e6-bc49-0242c0bc0d2d', version '2016-09-21T15:02:39.739Z'. 1 tasksToLaunch, 0 in flight, 0 confirmed. not backing off (mesosphere.marathon.core.launchqueue.impl.AppTaskLauncherActor:marathon-akka.actor.default-dispatcher-10)
marathon stdout | [2016-09-21 15:02:40,191] INFO No tasks left to launch. Stop receiving offers for /elasticsearch, 2016-09-21T15:02:39.739Z (mesosphere.marathon.core.launchqueue.impl.AppTaskLauncherActor:marathon-akka.actor.default-dispatcher-10)
marathon stdout | [2016-09-21 15:02:40,192] INFO removing matcher ActorOfferMatcher(Actor[akka://marathon/user/launchQueue/1/0-elasticsearch#57209220]) (mesosphere.marathon.core.matcher.manager.impl.OfferMatcherManagerActor:marathon-akka.actor.default-dispatcher-7)
marathon stdout | [2016-09-21 15:02:40,221] INFO Processing LaunchEphemeral(LaunchedEphemeral(task [elasticsearch.70c377d8-800c-11e6-bc49-0242c0bc0d2d],AgentInfo(10.138.160.10,Some(84a2076e-1bf8-47ba-9b78-d96d06782851-S1),Buffer()),2016-09-21T15:02:39.739Z,Status(2016-09-21T15:02:40.172Z,None,None),Vector(31317))) for task [elasticsearch.70c377d8-800c-11e6-bc49-0242c0bc0d2d] (mesosphere.marathon.core.launcher.impl.OfferProcessorImpl:ForkJoinPool-2-worker-15)
marathon stdout | [2016-09-21 15:02:40,277] INFO receiveTaskUpdate: updating status of task [elasticsearch.70c377d8-800c-11e6-bc49-0242c0bc0d2d] (mesosphere.marathon.core.launchqueue.impl.AppTaskLauncherActor:marathon-akka.actor.default-dispatcher-5)
marathon stdout | [2016-09-21 15:02:40,281] INFO Task launch for 'task [elasticsearch.70c377d8-800c-11e6-bc49-0242c0bc0d2d]' was accepted. 0 tasksToLaunch, 0 in flight, 1 confirmed. not backing off (mesosphere.marathon.core.launchqueue.impl.AppTaskLauncherActor:marathon-akka.actor.default-dispatcher-5)
marathon stdout | [2016-09-21 15:02:41,173] INFO Received status update for task elasticsearch.70c377d8-800c-11e6-bc49-0242c0bc0d2d: TASK_RUNNING () (mesosphere.marathon.MarathonScheduler$$EnhancerByGuice$$a504fd7e:Thread-24)
marathon stdout | [2016-09-21 15:02:41,221] INFO receiveTaskUpdate: updating status of task [elasticsearch.70c377d8-800c-11e6-bc49-0242c0bc0d2d] (mesosphere.marathon.core.launchqueue.impl.AppTaskLauncherActor:marathon-akka.actor.default-dispatcher-5)
marathon stdout | [2016-09-21 15:02:41,224] INFO Sending event notification for task [elasticsearch.70c377d8-800c-11e6-bc49-0242c0bc0d2d] of app [/elasticsearch]: TASK_RUNNING (mesosphere.marathon.core.task.update.impl.steps.PostToEventStreamStepImpl$$EnhancerByGuice$$e2136a36:marathon-akka.actor.default-dispatcher-3)
[2016-09-21 15:02:41,227] INFO New task task [elasticsearch.70c377d8-800c-11e6-bc49-0242c0bc0d2d] changed during app /elasticsearch scaling, 1 ready 1 healthy need 1 (mesosphere.marathon.upgrade.TaskStartActor:marathon-akka.actor.default-dispatcher-6)
[2016-09-21 15:02:41,228] INFO Successfully started 1 instances of /elasticsearch (mesosphere.marathon.upgrade.TaskStartActor:marathon-akka.actor.default-dispatcher-6)
However, every 5 seconds or so I see offers in my Mesos master log on MASTER 2. Not sure if it matters:
mesos-master stderr | I0921 15:04:07.230304 53 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O129 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S2 at slave(1)@10.138.160.210:5051 (10.138.160.210) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0002 (elasticsearch) at scheduler-a1ce0b42-12c8-4b35-a63f-4be4bcda9cf5@10.138.160.10:38634
I0921 15:04:07.230468 53 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O130 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S0 at slave(1)@10.138.96.56:5051 (10.138.96.56) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0002 (elasticsearch) at scheduler-a1ce0b42-12c8-4b35-a63f-4be4bcda9cf5@10.138.160.10:38634
I0921 15:04:07.230556 53 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O131 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1 at slave(1)@10.138.160.10:5051 (10.138.160.10) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0002 (elasticsearch) at scheduler-a1ce0b42-12c8-4b35-a63f-4be4bcda9cf5@10.138.160.10:38634
mesos-master stderr | I0921 15:04:13.223423 47 master.cpp:5324] Sending 3 offers to framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0002 (elasticsearch) at scheduler-a1ce0b42-12c8-4b35-a63f-4be4bcda9cf5@10.138.160.10:38634
mesos-master stderr | I0921 15:04:13.245260 48 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O135 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1 at slave(1)@10.138.160.10:5051 (10.138.160.10) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0002 (elasticsearch) at scheduler-a1ce0b42-12c8-4b35-a63f-4be4bcda9cf5@10.138.160.10:38634
mesos-master stderr | I0921 15:04:13.245410 48 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O136 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S0 at slave(1)@10.138.96.56:5051 (10.138.96.56) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0002 (elasticsearch) at scheduler-a1ce0b42-12c8-4b35-a63f-4be4bcda9cf5@10.138.160.10:38634
But at this point I have everything up and running as expected. I can load data and everything is good.
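My sanity check is nothing fancier than hitting the ES HTTP API on one of the nodes, something like the following, where the host is one of my slaves and the HTTP port is whatever the framework assigned (31000-31001 in the launch logs further down):
# cluster health and node membership
curl -s 'http://10.138.160.10:31000/_cluster/health?pretty'
curl -s 'http://10.138.160.10:31000/_cat/nodes?v'
# index a throwaway document
curl -XPUT 'http://10.138.160.10:31000/test/doc/1' -d '{"hello": "world"}'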
However, as soon as the scheduler gets restarted (for example: Marathon -> elasticsearch -> Restart), I notice strange behavior.
In the rare case that the scheduler gets started on the same slave, a new framework ID gets registered and the nodes join OK. But you can see the old framework still shows 3 active tasks:
Additionally, the /data/elasticsearch directories remain the same on each of the slaves.
And sometimes when I restart the scheduler, it will re-register the framework (if it gets started on the same slave). I BELIEVE THIS IS HOW IT IS SUPPOSED TO WORK:
However, if the scheduler ends up on another node, a new framework is registered and sometimes new data directories are created on the slaves:
Looking through the Mesos master logs, it looks like a normal framework offering:
mesos-master stderr | I0921 15:20:52.285161 49 master.cpp:5324] Sending 1 offers to framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (elasticsearch) at scheduler-6a77d8d4-6127-487c-84ba-bdb816c1cbc0@10.138.160.10:39201
mesos-master stderr | I0921 15:20:52.290634 49 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1191 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S2 at slave(1)@10.138.160.210:5051 (10.138.160.210) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (elasticsearch) at scheduler-6a77d8d4-6127-487c-84ba-bdb816c1cbc0@10.138.160.10:39201
mesos-master stderr | I0921 15:20:53.285879 49 master.cpp:5324] Sending 1 offers to framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (elasticsearch) at scheduler-6a77d8d4-6127-487c-84ba-bdb816c1cbc0@10.138.160.10:39201
mesos-master stderr | I0921 15:20:53.293149 49 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1193 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1 at slave(1)@10.138.160.10:5051 (10.138.160.10) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (elasticsearch) at scheduler-6a77d8d4-6127-487c-84ba-bdb816c1cbc0@10.138.160.10:39201
mesos-master stderr | I0921 15:20:54.287298 52 master.cpp:5324] Sending 1 offers to framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (elasticsearch) at scheduler-6a77d8d4-6127-487c-84ba-bdb816c1cbc0@10.138.160.10:39201
mesos-master stderr | I0921 15:20:54.292469 52 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1195 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S0 at slave(1)@10.138.96.56:5051 (10.138.96.56) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (elasticsearch) at scheduler-6a77d8d4-6127-487c-84ba-bdb816c1cbc0@10.138.160.10:39201
mesos-master stderr | I0921 15:20:59.292706 47 master.cpp:5324] Sending 2 offers to framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (elasticsearch) at scheduler-6a77d8d4-6127-487c-84ba-bdb816c1cbc0@10.138.160.10:39201
mesos-master stderr | I0921 15:20:59.301736 47 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1199 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S2 at slave(1)@10.138.160.210:5051 (10.138.160.210) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (elasticsearch) at scheduler-6a77d8d4-6127-487c-84ba-bdb816c1cbc0@10.138.160.10:39201
mesos-master stderr | I0921 15:20:59.302198 47 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1200 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1 at slave(1)@10.138.160.10:5051 (10.138.160.10) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (elasticsearch) at scheduler-6a77d8d4-6127-487c-84ba-bdb816c1cbc0@10.138.160.10:39201
mesos-master stderr | I0921 15:21:01.272701 47 master.hpp:177] Adding task elasticsearch.0117cafb-800f-11e6-bc49-0242c0bc0d2d with resources cpus(*):0.2; mem(*):512; ports(*):[31887-31887] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S2 (10.138.160.210)
I0921 15:21:01.272755 47 master.cpp:3589] Launching task elasticsearch.0117cafb-800f-11e6-bc49-0242c0bc0d2d of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0001 (marathon) at scheduler-428e1982-1cde-4c79-87af-afae660beb2c@10.138.160.182:35008 with resources cpus(*):0.2; mem(*):512; ports(*):[31887-31887] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S2 at slave(1)@10.138.160.210:5051 (10.138.160.210)
mesos-master stderr | I0921 15:21:01.294978 47 master.cpp:5324] Sending 1 offers to framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (elasticsearch) at scheduler-6a77d8d4-6127-487c-84ba-bdb816c1cbc0@10.138.160.10:39201
mesos-master stderr | I0921 15:21:01.300483 48 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1206 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S0 at slave(1)@10.138.96.56:5051 (10.138.96.56) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (elasticsearch) at scheduler-6a77d8d4-6127-487c-84ba-bdb816c1cbc0@10.138.160.10:39201
mesos-master stderr | I0921 15:21:02.144634 48 master.cpp:4763] Status update TASK_RUNNING (UUID: cc47a7f0-c4f5-434a-8f4d-1abcb07b4ba8) for task elasticsearch.0117cafb-800f-11e6-bc49-0242c0bc0d2d of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0001 from slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S2 at slave(1)@10.138.160.210:5051 (10.138.160.210)
I0921 15:21:02.144690 48 master.cpp:4811] Forwarding status update TASK_RUNNING (UUID: cc47a7f0-c4f5-434a-8f4d-1abcb07b4ba8) for task elasticsearch.0117cafb-800f-11e6-bc49-0242c0bc0d2d of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0001
mesos-master stderr | I0921 15:21:02.144806 48 master.cpp:6421] Updating the state of task elasticsearch.0117cafb-800f-11e6-bc49-0242c0bc0d2d of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0001 (latest state: TASK_RUNNING, status update state: TASK_RUNNING)
mesos-master stderr | I0921 15:21:02.157840 48 master.cpp:3826] Telling slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1 at slave(1)@10.138.160.10:5051 (10.138.160.10) to kill task elasticsearch.7e23786a-800e-11e6-bc49-0242c0bc0d2d of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0001 (marathon) at scheduler-428e1982-1cde-4c79-87af-afae660beb2c@10.138.160.182:35008
mesos-master stderr | I0921 15:21:02.159821 48 master.cpp:3918] Processing ACKNOWLEDGE call cc47a7f0-c4f5-434a-8f4d-1abcb07b4ba8 for task elasticsearch.0117cafb-800f-11e6-bc49-0242c0bc0d2d of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0001 (marathon) at scheduler-428e1982-1cde-4c79-87af-afae660beb2c@10.138.160.182:35008 on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S2
mesos-master stderr | I0921 15:21:02.375720 48 master.cpp:3826] Telling slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S2 at slave(1)@10.138.160.210:5051 (10.138.160.210) to kill task elasticsearch_10.138.160.210_20160921T151727.026Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (elasticsearch) at scheduler-6a77d8d4-6127-487c-84ba-bdb816c1cbc0@10.138.160.10:39201
I0921 15:21:02.375823 48 master.cpp:3826] Telling slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S0 at slave(1)@10.138.96.56:5051 (10.138.96.56) to kill task elasticsearch_10.138.96.56_20160921T151739.176Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (elasticsearch) at scheduler-6a77d8d4-6127-487c-84ba-bdb816c1cbc0@10.138.160.10:39201
I0921 15:21:02.375855 48 master.cpp:3826] Telling slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1 at slave(1)@10.138.160.10:5051 (10.138.160.10) to kill task elasticsearch_10.138.160.10_20160921T151740.127Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (elasticsearch) at scheduler-6a77d8d4-6127-487c-84ba-bdb816c1cbc0@10.138.160.10:39201
mesos-master stderr | I0921 15:21:02.742641 48 master.cpp:4763] Status update TASK_KILLED (UUID: 83fafcae-47a6-4a30-8b7f-67b4949fe0f5) for task elasticsearch_10.138.96.56_20160921T151739.176Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 from slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S0 at slave(1)@10.138.96.56:5051 (10.138.96.56)
I0921 15:21:02.742699 48 master.cpp:4811] Forwarding status update TASK_KILLED (UUID: 83fafcae-47a6-4a30-8b7f-67b4949fe0f5) for task elasticsearch_10.138.96.56_20160921T151739.176Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003
mesos-master stderr | I0921 15:21:02.742805 48 master.cpp:6421] Updating the state of task elasticsearch_10.138.96.56_20160921T151739.176Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (latest state: TASK_KILLED, status update state: TASK_KILLED)
mesos-master stderr | I0921 15:21:02.816073 48 master.cpp:4763] Status update TASK_KILLED (UUID: 407875d0-dc13-4063-a37d-98421724fc43) for task elasticsearch_10.138.160.210_20160921T151727.026Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 from slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S2 at slave(1)@10.138.160.210:5051 (10.138.160.210)
I0921 15:21:02.816128 48 master.cpp:4811] Forwarding status update TASK_KILLED (UUID: 407875d0-dc13-4063-a37d-98421724fc43) for task elasticsearch_10.138.160.210_20160921T151727.026Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003
mesos-master stderr | I0921 15:21:02.816222 48 master.cpp:6421] Updating the state of task elasticsearch_10.138.160.210_20160921T151727.026Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (latest state: TASK_KILLED, status update state: TASK_KILLED)
mesos-master stderr | I0921 15:21:02.824085 48 master.cpp:1212] Framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (elasticsearch) at scheduler-6a77d8d4-6127-487c-84ba-bdb816c1cbc0@10.138.160.10:39201 disconnected
I0921 15:21:02.824112 48 master.cpp:2527] Disconnecting framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (elasticsearch) at scheduler-6a77d8d4-6127-487c-84ba-bdb816c1cbc0@10.138.160.10:39201
I0921 15:21:02.824118 48 master.cpp:2551] Deactivating framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (elasticsearch) at scheduler-6a77d8d4-6127-487c-84ba-bdb816c1cbc0@10.138.160.10:39201
I0921 15:21:02.824136 48 master.cpp:1236] Giving framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (elasticsearch) at scheduler-6a77d8d4-6127-487c-84ba-bdb816c1cbc0@10.138.160.10:39201 30days to failover
mesos-master stderr | I0921 15:21:02.839717 48 master.cpp:4763] Status update TASK_KILLED (UUID: 599a22ce-497c-4c1a-a3ed-855a02809c95) for task elasticsearch_10.138.160.10_20160921T151740.127Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 from slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1 at slave(1)@10.138.160.10:5051 (10.138.160.10)
I0921 15:21:02.839787 48 master.cpp:4811] Forwarding status update TASK_KILLED (UUID: 599a22ce-497c-4c1a-a3ed-855a02809c95) for task elasticsearch_10.138.160.10_20160921T151740.127Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003
W0921 15:21:02.839803 48 master.hpp:1764] Master attempted to send message to disconnected framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (elasticsearch) at scheduler-6a77d8d4-6127-487c-84ba-bdb816c1cbc0@10.138.160.10:39201
mesos-master stderr | I0921 15:21:02.839990 48 master.cpp:6421] Updating the state of task elasticsearch_10.138.160.10_20160921T151740.127Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (latest state: TASK_KILLED, status update state: TASK_KILLED)
mesos-master stderr | I0921 15:21:03.042268 48 master.cpp:4763] Status update TASK_KILLED (UUID: 18e934a2-f12b-4d29-b517-347c6290aa65) for task elasticsearch.7e23786a-800e-11e6-bc49-0242c0bc0d2d of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0001 from slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1 at slave(1)@10.138.160.10:5051 (10.138.160.10)
I0921 15:21:03.042325 48 master.cpp:4811] Forwarding status update TASK_KILLED (UUID: 18e934a2-f12b-4d29-b517-347c6290aa65) for task elasticsearch.7e23786a-800e-11e6-bc49-0242c0bc0d2d of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0001
mesos-master stderr | I0921 15:21:03.042402 48 master.cpp:6421] Updating the state of task elasticsearch.7e23786a-800e-11e6-bc49-0242c0bc0d2d of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0001 (latest state: TASK_KILLED, status update state: TASK_KILLED)
mesos-master stderr | I0921 15:21:03.060314 48 master.cpp:3918] Processing ACKNOWLEDGE call 18e934a2-f12b-4d29-b517-347c6290aa65 for task elasticsearch.7e23786a-800e-11e6-bc49-0242c0bc0d2d of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0001 (marathon) at scheduler-428e1982-1cde-4c79-87af-afae660beb2c@10.138.160.182:35008 on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1
I0921 15:21:03.060361 48 master.cpp:6487] Removing task elasticsearch.7e23786a-800e-11e6-bc49-0242c0bc0d2d with resources cpus(*):0.2; mem(*):512; ports(*):[31206-31206] of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0001 on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1 at slave(1)@10.138.160.10:5051 (10.138.160.10)
mesos-master stderr | I0921 15:21:06.394582 46 master.cpp:2231] Received SUBSCRIBE call for framework 'elasticsearch' at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
I0921 15:21:06.394665 46 master.cpp:2302] Subscribing framework elasticsearch with checkpointing enabled and capabilities [ ]
I0921 15:21:06.395484 46 master.cpp:5324] Sending 3 offers to framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:21:06.904709 46 master.cpp:2670] Processing REQUEST call for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
I0921 15:21:06.904896 46 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1217 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S2 at slave(1)@10.138.160.210:5051 (10.138.160.210) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
I0921 15:21:06.905118 46 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1218 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S0 at slave(1)@10.138.96.56:5051 (10.138.96.56) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
I0921 15:21:06.905206 46 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1219 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1 at slave(1)@10.138.160.10:5051 (10.138.160.10) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
I0921 15:21:06.905766 46 master.hpp:177] Adding task elasticsearch_10.138.160.210_20160921T152106.607Z with resources cpus(*):2; mem(*):2048; disk(*):20480; ports(*):[31000-31001] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S2 (10.138.160.210)
I0921 15:21:06.905809 46 master.cpp:3589] Launching task elasticsearch_10.138.160.210_20160921T152106.607Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394 with resources cpus(*):2; mem(*):2048; disk(*):20480; ports(*):[31000-31001] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S2 at slave(1)@10.138.160.210:5051 (10.138.160.210)
mesos-master stderr | I0921 15:21:07.779960 46 master.cpp:4763] Status update TASK_RUNNING (UUID: 4094ca68-e786-4d54-9e0c-513286aee08e) for task elasticsearch_10.138.160.210_20160921T152106.607Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 from slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S2 at slave(1)@10.138.160.210:5051 (10.138.160.210)
I0921 15:21:07.780006 46 master.cpp:4811] Forwarding status update TASK_RUNNING (UUID: 4094ca68-e786-4d54-9e0c-513286aee08e) for task elasticsearch_10.138.160.210_20160921T152106.607Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004
I0921 15:21:07.780122 46 master.cpp:6421] Updating the state of task elasticsearch_10.138.160.210_20160921T152106.607Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (latest state: TASK_RUNNING, status update state: TASK_RUNNING)
mesos-master stderr | I0921 15:21:08.011409 46 master.cpp:3918] Processing ACKNOWLEDGE call 4094ca68-e786-4d54-9e0c-513286aee08e for task elasticsearch_10.138.160.210_20160921T152106.607Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394 on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S2
mesos-master stderr | I0921 15:21:12.305713 53 master.cpp:5324] Sending 3 offers to framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:21:12.345974 53 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1226 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1 at slave(1)@10.138.160.10:5051 (10.138.160.10) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:21:12.346107 53 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1227 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S0 at slave(1)@10.138.96.56:5051 (10.138.96.56) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
I0921 15:21:12.346192 53 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1228 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S2 at slave(1)@10.138.160.210:5051 (10.138.160.210) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:21:12.744340 50 master.cpp:4763] Status update TASK_KILLED (UUID: 83fafcae-47a6-4a30-8b7f-67b4949fe0f5) for task elasticsearch_10.138.96.56_20160921T151739.176Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 from slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S0 at slave(1)@10.138.96.56:5051 (10.138.96.56)
I0921 15:21:12.744400 50 master.cpp:4811] Forwarding status update TASK_KILLED (UUID: 83fafcae-47a6-4a30-8b7f-67b4949fe0f5) for task elasticsearch_10.138.96.56_20160921T151739.176Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003
W0921 15:21:12.744421 50 master.hpp:1764] Master attempted to send message to disconnected framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (elasticsearch) at scheduler-6a77d8d4-6127-487c-84ba-bdb816c1cbc0@10.138.160.10:39201
I0921 15:21:12.744652 50 master.cpp:6421] Updating the state of task elasticsearch_10.138.96.56_20160921T151739.176Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (latest state: TASK_KILLED, status update state: TASK_KILLED)
mesos-master stderr | I0921 15:21:12.817641 51 master.cpp:4763] Status update TASK_KILLED (UUID: 407875d0-dc13-4063-a37d-98421724fc43) for task elasticsearch_10.138.160.210_20160921T151727.026Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 from slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S2 at slave(1)@10.138.160.210:5051 (10.138.160.210)
I0921 15:21:12.817737 51 master.cpp:4811] Forwarding status update TASK_KILLED (UUID: 407875d0-dc13-4063-a37d-98421724fc43) for task elasticsearch_10.138.160.210_20160921T151727.026Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003
W0921 15:21:12.817759 51 master.hpp:1764] Master attempted to send message to disconnected framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (elasticsearch) at scheduler-6a77d8d4-6127-487c-84ba-bdb816c1cbc0@10.138.160.10:39201
I0921 15:21:12.817925 51 master.cpp:6421] Updating the state of task elasticsearch_10.138.160.210_20160921T151727.026Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (latest state: TASK_KILLED, status update state: TASK_KILLED)
mesos-master stderr | I0921 15:21:12.840368 52 master.cpp:4763] Status update TASK_KILLED (UUID: 599a22ce-497c-4c1a-a3ed-855a02809c95) for task elasticsearch_10.138.160.10_20160921T151740.127Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 from slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1 at slave(1)@10.138.160.10:5051 (10.138.160.10)
I0921 15:21:12.840422 52 master.cpp:4811] Forwarding status update TASK_KILLED (UUID: 599a22ce-497c-4c1a-a3ed-855a02809c95) for task elasticsearch_10.138.160.10_20160921T151740.127Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003
W0921 15:21:12.840437 52 master.hpp:1764] Master attempted to send message to disconnected framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (elasticsearch) at scheduler-6a77d8d4-6127-487c-84ba-bdb816c1cbc0@10.138.160.10:39201
mesos-master stderr | I0921 15:21:12.840659 52 master.cpp:6421] Updating the state of task elasticsearch_10.138.160.10_20160921T151740.127Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (latest state: TASK_KILLED, status update state: TASK_KILLED)
I0921 15:21:18.313294 53 master.cpp:5324] Sending 2 offers to framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:21:18.468693 53 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1236 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S2 at slave(1)@10.138.160.210:5051 (10.138.160.210) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
I0921 15:21:18.468876 53 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1237 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1 at slave(1)@10.138.160.10:5051 (10.138.160.10) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
I0921 15:21:18.469595 53 master.hpp:177] Adding task elasticsearch_10.138.160.10_20160921T152118.410Z with resources cpus(*):2; mem(*):2048; disk(*):20480; ports(*):[31000-31001] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1 (10.138.160.10)
I0921 15:21:18.469645 53 master.cpp:3589] Launching task elasticsearch_10.138.160.10_20160921T152118.410Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394 with resources cpus(*):2; mem(*):2048; disk(*):20480; ports(*):[31000-31001] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1 at slave(1)@10.138.160.10:5051 (10.138.160.10)
mesos-master stderr | I0921 15:21:19.315526 53 master.cpp:5324] Sending 1 offers to framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:21:19.363168 53 master.cpp:4763] Status update TASK_RUNNING (UUID: 85689f2c-b0c4-4c73-87dd-2fbbf985b6ca) for task elasticsearch_10.138.160.10_20160921T152118.410Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 from slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1 at slave(1)@10.138.160.10:5051 (10.138.160.10)
I0921 15:21:19.363225 53 master.cpp:4811] Forwarding status update TASK_RUNNING (UUID: 85689f2c-b0c4-4c73-87dd-2fbbf985b6ca) for task elasticsearch_10.138.160.10_20160921T152118.410Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004
mesos-master stderr | I0921 15:21:19.363407 53 master.cpp:6421] Updating the state of task elasticsearch_10.138.160.10_20160921T152118.410Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (latest state: TASK_RUNNING, status update state: TASK_RUNNING)
mesos-master stderr | I0921 15:21:19.505738 53 master.cpp:3918] Processing ACKNOWLEDGE call 85689f2c-b0c4-4c73-87dd-2fbbf985b6ca for task elasticsearch_10.138.160.10_20160921T152118.410Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394 on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1
mesos-master stderr | I0921 15:21:19.506001 53 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1238 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S0 at slave(1)@10.138.96.56:5051 (10.138.96.56) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:21:19.506567 53 master.hpp:177] Adding task elasticsearch_10.138.96.56_20160921T152119.358Z with resources cpus(*):2; mem(*):2048; disk(*):20480; ports(*):[31000-31001] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S0 (10.138.96.56)
I0921 15:21:19.506613 53 master.cpp:3589] Launching task elasticsearch_10.138.96.56_20160921T152119.358Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394 with resources cpus(*):2; mem(*):2048; disk(*):20480; ports(*):[31000-31001] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S0 at slave(1)@10.138.96.56:5051 (10.138.96.56)
mesos-master stderr | I0921 15:21:20.363615 51 master.cpp:4763] Status update TASK_RUNNING (UUID: 5b46402e-4eaf-44e4-af42-5745aaf1b8ee) for task elasticsearch_10.138.96.56_20160921T152119.358Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 from slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S0 at slave(1)@10.138.96.56:5051 (10.138.96.56)
I0921 15:21:20.363661 51 master.cpp:4811] Forwarding status update TASK_RUNNING (UUID: 5b46402e-4eaf-44e4-af42-5745aaf1b8ee) for task elasticsearch_10.138.96.56_20160921T152119.358Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004
I0921 15:21:20.363783 51 master.cpp:6421] Updating the state of task elasticsearch_10.138.96.56_20160921T152119.358Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (latest state: TASK_RUNNING, status update state: TASK_RUNNING)
mesos-master stderr | I0921 15:21:20.442539 51 master.cpp:3918] Processing ACKNOWLEDGE call 5b46402e-4eaf-44e4-af42-5745aaf1b8ee for task elasticsearch_10.138.96.56_20160921T152119.358Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394 on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S0
mesos-master stderr | I0921 15:21:24.321563 51 master.cpp:5324] Sending 2 offers to framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:21:24.330883 51 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1242 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1 at slave(1)@10.138.160.10:5051 (10.138.160.10) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:21:24.331029 51 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1243 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S2 at slave(1)@10.138.160.210:5051 (10.138.160.210) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:21:25.322654 51 master.cpp:5324] Sending 1 offers to framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:21:25.333015 51 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1244 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S0 at slave(1)@10.138.96.56:5051 (10.138.96.56) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:21:31.328562 49 master.cpp:5324] Sending 3 offers to framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:21:31.342964 49 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1250 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S2 at slave(1)@10.138.160.210:5051 (10.138.160.210) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
I0921 15:21:31.343096 49 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1251 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S0 at slave(1)@10.138.96.56:5051 (10.138.96.56) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
I0921 15:21:31.343181 49 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1252 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1 at slave(1)@10.138.160.10:5051 (10.138.160.10) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:21:32.744709 49 master.cpp:4763] Status update TASK_KILLED (UUID: 83fafcae-47a6-4a30-8b7f-67b4949fe0f5) for task elasticsearch_10.138.96.56_20160921T151739.176Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 from slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S0 at slave(1)@10.138.96.56:5051 (10.138.96.56)
I0921 15:21:32.744767 49 master.cpp:4811] Forwarding status update TASK_KILLED (UUID: 83fafcae-47a6-4a30-8b7f-67b4949fe0f5) for task elasticsearch_10.138.96.56_20160921T151739.176Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003
W0921 15:21:32.744782 49 master.hpp:1764] Master attempted to send message to disconnected framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (elasticsearch) at scheduler-6a77d8d4-6127-487c-84ba-bdb816c1cbc0@10.138.160.10:39201
mesos-master stderr | I0921 15:21:32.744968 49 master.cpp:6421] Updating the state of task elasticsearch_10.138.96.56_20160921T151739.176Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (latest state: TASK_KILLED, status update state: TASK_KILLED)
mesos-master stderr | I0921 15:21:32.818581 49 master.cpp:4763] Status update TASK_KILLED (UUID: 407875d0-dc13-4063-a37d-98421724fc43) for task elasticsearch_10.138.160.210_20160921T151727.026Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 from slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S2 at slave(1)@10.138.160.210:5051 (10.138.160.210)
I0921 15:21:32.818646 49 master.cpp:4811] Forwarding status update TASK_KILLED (UUID: 407875d0-dc13-4063-a37d-98421724fc43) for task elasticsearch_10.138.160.210_20160921T151727.026Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003
W0921 15:21:32.818670 49 master.hpp:1764] Master attempted to send message to disconnected framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (elasticsearch) at scheduler-6a77d8d4-6127-487c-84ba-bdb816c1cbc0@10.138.160.10:39201
mesos-master stderr | I0921 15:21:32.818889 49 master.cpp:6421] Updating the state of task elasticsearch_10.138.160.210_20160921T151727.026Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (latest state: TASK_KILLED, status update state: TASK_KILLED)
mesos-master stderr | I0921 15:21:32.841390 49 master.cpp:4763] Status update TASK_KILLED (UUID: 599a22ce-497c-4c1a-a3ed-855a02809c95) for task elasticsearch_10.138.160.10_20160921T151740.127Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 from slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1 at slave(1)@10.138.160.10:5051 (10.138.160.10)
I0921 15:21:32.841459 49 master.cpp:4811] Forwarding status update TASK_KILLED (UUID: 599a22ce-497c-4c1a-a3ed-855a02809c95) for task elasticsearch_10.138.160.10_20160921T151740.127Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003
W0921 15:21:32.841483 49 master.hpp:1764] Master attempted to send message to disconnected framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (elasticsearch) at scheduler-6a77d8d4-6127-487c-84ba-bdb816c1cbc0@10.138.160.10:39201
I0921 15:21:32.841718 49 master.cpp:6421] Updating the state of task elasticsearch_10.138.160.10_20160921T151740.127Z of framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0003 (latest state: TASK_KILLED, status update state: TASK_KILLED)
mesos-master stderr | I0921 15:21:37.336771 46 master.cpp:5324] Sending 3 offers to framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:21:37.353024 46 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1256 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S2 at slave(1)@10.138.160.210:5051 (10.138.160.210) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:21:37.354007 46 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1257 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1 at slave(1)@10.138.160.10:5051 (10.138.160.10) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
I0921 15:21:37.354105 46 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1258 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S0 at slave(1)@10.138.96.56:5051 (10.138.96.56) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:21:43.343149 48 master.cpp:5324] Sending 3 offers to framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:21:43.367455 48 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1262 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1 at slave(1)@10.138.160.10:5051 (10.138.160.10) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:21:43.367637 48 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1263 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S2 at slave(1)@10.138.160.210:5051 (10.138.160.210) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
I0921 15:21:43.367727 48 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1264 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S0 at slave(1)@10.138.96.56:5051 (10.138.96.56) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
I0921 15:21:49.347899 52 master.cpp:5324] Sending 2 offers to framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:21:49.356815 53 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1269 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1 at slave(1)@10.138.160.10:5051 (10.138.160.10) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
I0921 15:21:49.357127 53 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1270 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S2 at slave(1)@10.138.160.210:5051 (10.138.160.210) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:21:50.348757 53 master.cpp:5324] Sending 1 offers to framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:21:50.355077 53 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1271 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S0 at slave(1)@10.138.96.56:5051 (10.138.96.56) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:21:56.354876 53 master.cpp:5324] Sending 3 offers to framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:21:56.368376 53 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1277 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1 at slave(1)@10.138.160.10:5051 (10.138.160.10) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
I0921 15:21:56.368585 53 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1278 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S2 at slave(1)@10.138.160.210:5051 (10.138.160.210) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:21:56.368727 53 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1279 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S0 at slave(1)@10.138.96.56:5051 (10.138.96.56) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:22:02.361115 50 master.cpp:5324] Sending 3 offers to framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:22:02.373376 50 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1283 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1 at slave(1)@10.138.160.10:5051 (10.138.160.10) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:22:02.373585 50 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1284 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S0 at slave(1)@10.138.96.56:5051 (10.138.96.56) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
I0921 15:22:02.373690 50 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1285 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S2 at slave(1)@10.138.160.210:5051 (10.138.160.210) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:22:08.368897 52 master.cpp:5324] Sending 3 offers to framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
mesos-master stderr | I0921 15:22:08.381808 52 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1289 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S2 at slave(1)@10.138.160.210:5051 (10.138.160.210) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
I0921 15:22:08.381964 52 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1290 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S1 at slave(1)@10.138.160.10:5051 (10.138.160.10) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
I0921 15:22:08.382071 52 master.cpp:3104] Processing ACCEPT call for offers: [ 84a2076e-1bf8-47ba-9b78-d96d06782851-O1291 ] on slave 84a2076e-1bf8-47ba-9b78-d96d06782851-S0 at slave(1)@10.138.96.56:5051 (10.138.96.56) for framework 84a2076e-1bf8-47ba-9b78-d96d06782851-0004 (elasticsearch) at scheduler-d40db645-1242-4941-8c6f-a64193dcf2e8@10.138.160.210:45394
I'll keep digging around, but do you have any ideas off the top of your head?
Ah ok. I don't think there is anything wrong here.
Firstly, all Mesos applications run on the slaves. Nothing runs on the master. The master's only job is to orchestrate the cluster. Framework schedulers are applications, just like anything else.
Here is the key:
However, as soon as the scheduler gets restarted (for example: Marathon -> elasticsearch -> Restart), I notice strange behavior.
You're manually shutting down the scheduler. You're telling Mesos/Marathon to shut down that framework. So yes, when it starts up again it will start a new framework.
However, if it died accidentally (e.g. you go in and docker kill the scheduler on the slave that it was running on), then it would restart and re-obtain the same framework ID.
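If you want to test that scenario, something along these lines on the slave that's running the scheduler container should do it (the grep pattern and the placeholder are just examples):
# find the scheduler container and kill it without a graceful shutdown
docker ps | grep elasticsearch-scheduler
docker kill <container-id>
# Marathon restarts the task and the scheduler should reuse the stored framework ID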
The offers are OK; they're just offers that haven't been accepted.
So, long story short, what you are seeing is expected behaviour!
Close if you're happy. If not, get back in touch.
Thanks for using the ES framework! 👍
Hey, thanks for the detailed reply! 💯
I was originally testing this scenario by rebooting the machine (with reboot), and noticed that the framework got re-registered with a new ID when the machine came back up.
But after reading your reply I got to thinking that a reboot probably notifies the scheduler that it's gracefully shutting down. This appears to be the case. If I reboot the machine with reboot -f, the scheduler seems to keep track of the framework and I no longer have the duplicate framework IDs!
However, I noticed that when the machine comes back up, the ES container never gets started again. I've verified that the Mesos slave is available and has resources, but the ES scheduler seems to think the ES container is already running on one slave, and reports that the first ES node is not responding on the others.
Reason: Host already running task
Reason: First ES node is not responding
Reason: First ES node is not responding
Here's a log from a single instance of ES running, having already rebooted the other two:
[DEBUG] 2016-09-22 17:12:14,156 class org.apache.mesos.elasticsearch.scheduler.ElasticsearchScheduler isHostnameResolveable - Attempting to resolve hostname: 10.138.160.10
[DEBUG] 2016-09-22 17:12:14,161 org.apache.mesos.Protos$TaskStatus <init> - Task status for elasticsearch_10.138.160.10_20160922T165447.254Z exists, using old state: TASK_RUNNING
[DEBUG] 2016-09-22 17:12:14,169 org.apache.mesos.Protos$TaskStatus <init> - Task status for elasticsearch_10.138.160.210_20160922T165459.190Z exists, using old state: TASK_RUNNING
[DEBUG] 2016-09-22 17:12:14,178 org.apache.mesos.Protos$TaskStatus <init> - Task status for elasticsearch_10.138.96.56_20160922T165459.304Z exists, using old state: TASK_RUNNING
[DEBUG] 2016-09-22 17:12:14,182 class org.apache.mesos.elasticsearch.scheduler.ElasticsearchScheduler resourceOffers - Declined offer: id { value: "4a61df55-1ada-4a7e-9282-e581bf78dd4a-O1092" }, framework_id { value: "4a61df55-1ada-4a7e-9282-e581bf78dd4a-0002" }, slave_id { value: "4a61df55-1ada-4a7e-9282-e581bf78dd4a-S4" }, hostname: "10.138.160.10", resources { name: "cpus", type: SCALAR, scalar { value: 4.0, }, role: "*" }, resources { name: "mem", type: SCALAR, scalar { value: 6959.0, }, role: "*" }, resources { name: "disk", type: SCALAR, scalar { value: 75389.0, }, role: "*" }, resources { name: "ports", type: RANGES, ranges { range { begin: 31000, end: 32000, }, }, role: "*" }, url { scheme: "http", address { hostname: "10.138.160.10", ip: "10.138.160.10", port: 5051, }, path: "/slave(1)" }, Reason: First ES node is not responding
[DEBUG] 2016-09-22 17:12:15,157 class org.apache.mesos.elasticsearch.scheduler.ElasticsearchScheduler isHostnameResolveable - Attempting to resolve hostname: 10.138.160.210
[DEBUG] 2016-09-22 17:12:15,163 org.apache.mesos.Protos$TaskStatus <init> - Task status for elasticsearch_10.138.160.10_20160922T165447.254Z exists, using old state: TASK_RUNNING
[DEBUG] 2016-09-22 17:12:15,168 org.apache.mesos.Protos$TaskStatus <init> - Task status for elasticsearch_10.138.160.210_20160922T165459.190Z exists, using old state: TASK_RUNNING
[DEBUG] 2016-09-22 17:12:15,173 org.apache.mesos.Protos$TaskStatus <init> - Task status for elasticsearch_10.138.96.56_20160922T165459.304Z exists, using old state: TASK_RUNNING
[DEBUG] 2016-09-22 17:12:15,177 class org.apache.mesos.elasticsearch.scheduler.ElasticsearchScheduler resourceOffers - Declined offer: id { value: "4a61df55-1ada-4a7e-9282-e581bf78dd4a-O1094" }, framework_id { value: "4a61df55-1ada-4a7e-9282-e581bf78dd4a-0002" }, slave_id { value: "4a61df55-1ada-4a7e-9282-e581bf78dd4a-S3" }, hostname: "10.138.160.210", resources { name: "cpus", type: SCALAR, scalar { value: 3.8, }, role: "*" }, resources { name: "mem", type: SCALAR, scalar { value: 6447.0, }, role: "*" }, resources { name: "disk", type: SCALAR, scalar { value: 75389.0, }, role: "*" }, resources { name: "ports", type: RANGES, ranges { range { begin: 31000, end: 31522, }, range { begin: 31524, end: 32000, }, }, role: "*" }, url { scheme: "http", address { hostname: "10.138.160.210", ip: "10.138.160.210", port: 5051, }, path: "/slave(1)" }, Reason: First ES node is not responding
[DEBUG] 2016-09-22 17:12:15,179 class org.apache.mesos.elasticsearch.scheduler.ElasticsearchScheduler resourceOffers - Declined offer: id { value: "4a61df55-1ada-4a7e-9282-e581bf78dd4a-O1095" }, framework_id { value: "4a61df55-1ada-4a7e-9282-e581bf78dd4a-0002" }, slave_id { value: "4a61df55-1ada-4a7e-9282-e581bf78dd4a-S0" }, hostname: "10.138.96.56", resources { name: "cpus", type: SCALAR, scalar { value: 2.0, }, role: "*" }, resources { name: "mem", type: SCALAR, scalar { value: 4911.0, }, role: "*" }, resources { name: "disk", type: SCALAR, scalar { value: 54909.0, }, role: "*" }, resources { name: "ports", type: RANGES, ranges { range { begin: 31002, end: 32000, }, }, role: "*" }, url { scheme: "http", address { hostname: "10.138.96.56", ip: "10.138.96.56", port: 5051, }, path: "/slave(1)" }, Reason: Host already running task
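If I'm reading those log lines right, the decline decision looks roughly like this (just my sketch of the behaviour, not the project's actual code; clusterState and firstEsNodeResponding are made-up names):

// Inside resourceOffers(SchedulerDriver driver, List<Protos.Offer> offers):
// the scheduler rebuilds its task list from ZooKeeper and trusts the stored state,
// so tasks killed by a hard reboot still look like TASK_RUNNING.
List<Protos.TaskInfo> knownTasks = clusterState.getTaskList();        // hypothetical accessor
for (Protos.Offer offer : offers) {
    boolean hostAlreadyUsed = knownTasks.stream()
        .anyMatch(t -> t.getSlaveId().equals(offer.getSlaveId()));
    if (hostAlreadyUsed) {
        driver.declineOffer(offer.getId());          // "Host already running task"
    } else if (!firstEsNodeResponding(knownTasks)) { // hypothetical HTTP health probe
        driver.declineOffer(offer.getId());          // "First ES node is not responding"
    }
}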
In my 3 node slave pool, I observe the following:
Slave 1 - all 3 ES instances start
Slave 1 is rebooted, scheduler moves to Slave 2 - only 2 ES instances are running (Slave 2 and Slave 3)
Slave 1 comes back online, joins Mesos cluster - only 2 ES instances are running (Slave 2 and Slave 3)
Slave 2 is rebooted, scheduler moves to Slave 3 - only 1 ES instance is running (Slave 3)
Slave 2 comes back online, joins Mesos cluster - only 1 ES instance is running (Slave 3)
Slave 3 is rebooted, and the same pattern continues
No matter how many times the scheduler moves, the 3 instances of ES never get started. I'm guessing the old state is in Zookeeper (using old state: TASK_RUNNING), but shouldn't a health check ping remove them?
Also, I noticed a feature request for adding constraints to the scheduler. Is that a work in progress? It would be nice to be able to dedicate a few slaves out of the pool to ES jobs only.
Thanks again for the help!
Hi again @cookandy, yes, you are correct. This is a duplicate of #550.
The problem is that if Mesos doesn't tell the framework something went wrong, the framework doesn't know about it. IMO this is a shortcoming in Mesos: it should be responsible for announcing issues/restarts/status changes to the framework, but that isn't implemented.
So the only alternative is to implement a "ping" mechanism, whereby the scheduler periodically pings the executors to make sure they really are alive. However, this is more difficult than it first seems due to network/service discovery issues.
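To make that concrete, the kind of thing I have in mind is roughly this (a sketch only, nothing like this exists in the framework yet; clusterState, EsTask and its accessors are assumed names, and the imports from java.net and java.util.concurrent are omitted for brevity):

// Periodically probe each executor's Elasticsearch HTTP port; if a node stops
// answering, ask Mesos to reconcile that task so its true state comes back.
ScheduledExecutorService pinger = Executors.newSingleThreadScheduledExecutor();
pinger.scheduleAtFixedRate(() -> {
    for (EsTask task : clusterState.getTasks()) {            // hypothetical task model
        try {
            HttpURLConnection conn = (HttpURLConnection)
                new URL("http://" + task.getHostname() + ":" + task.getHttpPort() + "/").openConnection();
            conn.setConnectTimeout(2000);
            conn.setReadTimeout(2000);
            if (conn.getResponseCode() != 200) {
                driver.reconcileTasks(Collections.singletonList(task.getStatus()));
            }
        } catch (IOException e) {
            // Unreachable from the scheduler: this is exactly the network/service
            // discovery problem - we can't always tell "dead" from "unroutable".
            driver.reconcileTasks(Collections.singletonList(task.getStatus()));
        }
    }
}, 30, 30, TimeUnit.SECONDS);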
Either way, it's an outstanding bug. As a workaround, yes, you can clean the ZooKeeper state.
Hi @philwinder, thanks for the reply.
It's a shame that Mesos doesn't handle those events correctly. I agree, a ping mechanism sounds like it would be a good fix. I realize there are challenges with different network configurations, but have you begun any of the work needed to implement this check?
Thanks again for the detailed reply. Cheers.
No, I haven't. We don't have sponsorship any more to develop this full time. We're looking for contributions, or a bit of free time to improve it. Although the design has lasted quite a long time already, there are a few things that really ought to be done, including this.
Thanks for the help, @philwinder. I hope you find the resources to maintain this project.
One last question. Is it possible to use mesos attributes (tags) in the ES scheduler to run ES on specific slaves in the cluster?
If you want to run the scheduler on a specific node, then yes, you could add this to the marathon json. If you want to run the executors (ES instances) on specific nodes then no, because it is scheduled by the scheduler, not marathon, so there's no way of adding tags without editing the code to allow them.
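For what it's worth, the edit to the scheduler would be something along these lines inside the resourceOffers loop (a sketch only; the es_node attribute name and the idea of tagging agents with --attributes=es_node:true are assumptions, not existing options):

// Only accept offers from agents that were started with an es_node:true attribute.
boolean dedicated = offer.getAttributesList().stream()
    .anyMatch(a -> "es_node".equals(a.getName())
        && a.getType() == Protos.Value.Type.TEXT
        && "true".equals(a.getText().getValue()));
if (!dedicated) {
    driver.declineOffer(offer.getId());   // skip agents not tagged for ES
    continue;                             // move on to the next offer
}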
That's what I was afraid of. I'll go ahead and close this issue since it's already being tracked under #550. Thanks a ton for the help!
@philwinder one last issue. I deleted the zk state with:
delete /elasticsearch/mesos-ha/0d1aebc2-9200-4e16-8604-9529525d3bf5-0002/state/elasticsearch_10.138.160.210_20160930T193158.369Z
and now the scheduler says Cluster size already fulfilled. I've tried to scale down to 2 instances, but it doesn't work. Any ideas?
[DEBUG] 2016-09-30 20:36:39,635 org.apache.mesos.Protos$TaskStatus <init> - Task status for elasticsearch_10.138.96.56_20160930T193146.112Z exists, using old state: TASK_RUNNING
[DEBUG] 2016-09-30 20:36:39,639 org.apache.mesos.Protos$TaskStatus <init> - Task status for elasticsearch_10.138.160.210_20160930T193158.369Z exists, using old state: TASK_STAGING
[DEBUG] 2016-09-30 20:36:39,643 org.apache.mesos.Protos$TaskStatus <init> - Task status for elasticsearch_10.138.160.10_20160930T193159.481Z exists, using old state: TASK_RUNNING
[DEBUG] 2016-09-30 20:36:39,651 class org.apache.mesos.elasticsearch.scheduler.ElasticsearchScheduler resourceOffers - Declined offer: id { value: "0d1aebc2-9200-4e16-8604-9529525d3bf5-O4238" }, framework_id { value: "0d1aebc2-9200-4e16-8604-9529525d3bf5-0002" }, slave_id { value: "0d1aebc2-9200-4e16-8604-9529525d3bf5-S3" }, hostname: "10.138.160.210", resources { name: "cpus", type: SCALAR, scalar { value: 4.0, }, role: "*" }, resources { name: "mem", type: SCALAR, scalar { value: 6959.0, }, role: "*" }, resources { name: "disk", type: SCALAR, scalar { value: 75389.0, }, role: "*" }, resources { name: "ports", type: RANGES, ranges { range { begin: 31000, end: 32000, }, }, role: "*" }, url { scheme: "http", address { hostname: "10.138.160.210", ip: "10.138.160.210", port: 5051, }, path: "/slave(1)" }, Reason: Cluster size already fulfilled
Did you delete the state whilst the scheduler was running? If so, the state held in the scheduler would still be the same. You have to stop the scheduler first, delete the state, then start it again.
Is that all the state, or the right state? Maybe that's old. Try deleting the entire /elasticsearch/mesos-ha folder.
If not that, try using a different cluster name to get a fresh state.
Finally, if that doesn't work I don't know. There's definitely some state left over somewhere.
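If zkCli is a pain, you can also wipe it programmatically. A rough sketch with the plain ZooKeeper Java client (stop the scheduler in Marathon first; the connection string below is a placeholder for your own ensemble):

import org.apache.zookeeper.ZKUtil;
import org.apache.zookeeper.ZooKeeper;

public class WipeEsFrameworkState {
    public static void main(String[] args) throws Exception {
        // Placeholder hosts: point this at your own ZooKeeper ensemble.
        ZooKeeper zk = new ZooKeeper("zk-host-1:2181,zk-host-2:2181,zk-host-3:2181",
                                     10000, event -> { });
        // Recursively delete the whole cluster-name node, not just one task entry.
        ZKUtil.deleteRecursive(zk, "/elasticsearch/mesos-ha");
        zk.close();
    }
}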
Thanks, Phil
Hi,
I noticed that when I restart the elasticsearch scheduler in Marathon, it registers a new instance of the elasticsearch framework.
Is this by design? If so, is the assumption that in production you'd always run at least 2 instances of the scheduler? How can I prevent multiple instances of the framework getting registered?
Thanks