mesos / kibana

Kibana on Mesos
Apache License 2.0

The framework does not persist its cluster state #9

Closed philwinder closed 8 years ago

philwinder commented 8 years ago

The Kibana framework does not persist its cluster state. If the scheduler is killed, any remaining executors are orphaned. This is because the new scheduler has no knowledge of any previous cluster.

Test

For the test, we need a working ES cluster to connect to. The setup is Mesos 0.25 with 3 slaves, and the MASTER and SLAVE{X} environment variables are exported.

Install Kibana

Create a Marathon JSON file (kibana.json):

{
  "id": "kibana",
  "uris": [
    "https://github.com/mesos/kibana/releases/download/0.3.0/kibana-0.3.0.jar"
  ],
  "args": ["java", "-jar", "kibana-0.3.0.jar",
    "-zookeeper", "zk://$MASTER:2181/mesos",
    "-version", "4.3.1",
    "-elasticsearch", "http://$SLAVE0:31000/",
    "-mem", "1024"
  ],
  "cpus": 0.2,
  "mem": 384.0,
  "env": {
    "JAVA_OPTS": "-Xms128m -Xmx256m"
  },
  "instances": 1
}

Submit the Marathon JSON file. The command below replaces the $MASTER and $SLAVE0 placeholders in the file with the values of the corresponding environment variables, then posts the result to Marathon.

$ sed -e 's/\$MASTER/'"$MASTER"'/' -e 's/\$SLAVE0/'"$SLAVE0"'/' kibana.json | curl -XPOST -H 'Content-Type:application/json' -d @- http://$MASTER:8080/v2/apps

This should start the Kibana scheduler and a single Kibana instance.
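The placeholder substitution in the command above can be checked in isolation before anything is posted to Marathon. The following is a minimal sketch: the addresses are made-up values, and `render_template` is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Hypothetical addresses, for illustration only.
MASTER=10.0.0.1
SLAVE0=10.0.0.2

# Replace the literal $MASTER and $SLAVE0 placeholders read from stdin
# with the values of the shell variables of the same name.
# The backslash makes sed treat "$" as a literal character.
render_template() {
  sed -e 's/\$MASTER/'"$MASTER"'/' -e 's/\$SLAVE0/'"$SLAVE0"'/'
}

# Two sample lines shaped like the args in the Marathon file.
printf '%s\n' \
  '"-zookeeper", "zk://$MASTER:2181/mesos",' \
  '"-elasticsearch", "http://$SLAVE0:31000/",' | render_template
```

With the sample values, the two lines come out with `zk://10.0.0.1:2181/mesos` and `http://10.0.0.2:31000/`. Running `render_template < kibana.json` on the real file shows exactly what would be sent to Marathon.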

Testing scheduler resiliency

Because the scheduler runs as a plain Java process (the jar), we have to kill that process directly on the slave that hosts it:

$ ssh -i $KEY ubuntu@$SLAVE2 'sudo ps -eo pid,command | grep kibana-0.3.0.jar | grep -v grep | awk '\''{print $1}'\'' | xargs sudo kill -9'
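The nested quoting makes the one-liner hard to read. As a rough sketch, the pipeline it runs on the slave can be tried against sample `ps` output first, to see which PIDs it would select. The helper name `scheduler_pids` and the sample process list are made up for illustration.

```shell
#!/bin/sh
# Filter "pid command" lines (the format of `ps -eo pid,command`) on stdin
# and print the PID of every process whose command mentions the jar,
# skipping the grep process itself.
scheduler_pids() {
  grep 'kibana-0.3.0.jar' | grep -v grep | awk '{print $1}'
}

# Sample ps output; the PIDs and command lines are invented.
scheduler_pids <<'EOF'
 1234 java -jar kibana-0.3.0.jar -zookeeper zk://10.0.0.1:2181/mesos
 1300 grep kibana-0.3.0.jar
 2001 /usr/sbin/sshd -D
EOF
```

For the sample input this prints only `1234`, the PID that would then be passed to `kill -9`.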

Now check that the scheduler has restarted:

$ curl -s http://$MASTER:5050/tasks.json | jq '.tasks[0:3] | .[] | "\(.slave_id) \(.id) \(.state)"'
"d5f9f892-1078-4955-8936-5a784603f76b-S1 kibana-0 TASK_RUNNING"
"d5f9f892-1078-4955-8936-5a784603f76b-S2 kibana.91917340-c5bc-11e5-961c-024283befc82 TASK_RUNNING"
"d5f9f892-1078-4955-8936-5a784603f76b-S2 kibana-1 TASK_RUNNING"

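Note how a new scheduler has started, but the old task has become orphaned: two Kibana tasks (kibana-0 and kibana-1) are now running, although only a single instance was started originally.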
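To make this check repeatable, the running Kibana tasks in the formatted output can be counted. This is a rough sketch, not part of the framework: `count_kibana_tasks` is a hypothetical helper, and it assumes that framework tasks are named `kibana-<n>` as in the output above, while the scheduler itself appears as the Marathon task `kibana.<uuid>`.

```shell
#!/bin/sh
# Count lines on stdin whose task id looks like kibana-<n> and whose
# state is TASK_RUNNING. The optional trailing quote covers jq output
# both with and without the -r flag.
count_kibana_tasks() {
  grep -E -c ' kibana-[0-9]+ TASK_RUNNING"?$'
}

# The three lines reported in the output above.
count=$(count_kibana_tasks <<'EOF'
"d5f9f892-1078-4955-8936-5a784603f76b-S1 kibana-0 TASK_RUNNING"
"d5f9f892-1078-4955-8936-5a784603f76b-S2 kibana.91917340-c5bc-11e5-961c-024283befc82 TASK_RUNNING"
"d5f9f892-1078-4955-8936-5a784603f76b-S2 kibana-1 TASK_RUNNING"
EOF
)

# One instance was requested, so more than one running task suggests an orphan.
if [ "$count" -gt 1 ]; then
  echo "possible orphan: $count Kibana tasks running, expected 1"
fi
```

On a live cluster the same function could be fed from the `curl ... | jq ...` command above instead of the here-document.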