Radek44 opened this issue 8 years ago
Yep, we need better uninstall instructions for etcd on DCOS.
Go to <dcos-hostname>/exhibitor
and view the node tree. You should see etcd
as a child of the root node. Delete it, then try to re-install.
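For reference, the same cleanup can be done from a shell instead of the Exhibitor UI. This is only a sketch, assuming the ZooKeeper ensemble is reachable at master.mesos:2181 and that the framework persisted its state under /etcd (matching the ZK_PERSIST setting in the example spec further down); the location of zkCli.sh on a DCOS master varies, so adjust the path to wherever the bundled ZooKeeper lives:

# Open a ZooKeeper shell against the ensemble backing the cluster
# (zkCli.sh ships with the ZooKeeper distribution; its install path is assumed here).
zkCli.sh -server master.mesos:2181

# Inside the ZK shell: confirm the stale framework state is still there...
ls /etcd

# ...and recursively delete it before re-installing
# ("rmr" on ZooKeeper 3.4.x; "deleteall" on 3.5+).
rmr /etcd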
Brilliant. thank you @jdef this worked.
Quick addition - it looks like as soon as I try to scale out etcd using marathon (going from default 1 instance to 3 as recommended) the deployment of the 3 instances fails for the same reason.
@spacejam is this supported? I was under the impression that cluster size should be determined at framework startup time, and only then.
That's correct, @jdef. Marathon starts the etcd-mesos scheduler, not the etcd instances themselves; the instances are managed by the scheduler that Marathon (or another higher-order supervisor framework) launches. Marathon will show 1 instance running because there is only 1 etcd-mesos framework running with a particular configuration. The number of etcd instances should be set at initialization time, when the app definition is submitted to Marathon, for instance with the CLUSTER_SIZE
env var in the provided example Marathon spec:
{
  "id": "etcd",
  "container": {
    "docker": {
      "forcePullImage": true,
      "image": "mesosphere/etcd-mesos:0.1.0-alpha-target-23-24-25"
    },
    "type": "DOCKER"
  },
  "cpus": 0.2,
  "env": {
    "FRAMEWORK_NAME": "etcd",
    "WEBURI": "http://etcd.marathon.mesos:$PORT0/stats",
    "MESOS_MASTER": "zk://master.mesos:2181/mesos",
    "ZK_PERSIST": "zk://master.mesos:2181/etcd",
    "AUTO_RESEED": "true",
    "RESEED_TIMEOUT": "240",
    "CLUSTER_SIZE": "3",
    "CPU_LIMIT": "1",
    "DISK_LIMIT": "4096",
    "MEM_LIMIT": "2048",
    "VERBOSITY": "1"
  },
  "healthChecks": [
    {
      "gracePeriodSeconds": 60,
      "intervalSeconds": 30,
      "maxConsecutiveFailures": 0,
      "path": "/healthz",
      "portIndex": 0,
      "protocol": "HTTP"
    }
  ],
  "instances": 1,
  "mem": 128.0,
  "ports": [
    0,
    1,
    2
  ]
}
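For completeness, here is one way to submit that app definition, as a sketch assuming the spec is saved as etcd.json and that the Marathon API is reachable at marathon.mesos:8080 (the hostname and port depend on your setup):

# Post the app definition to Marathon's REST API
curl -X POST http://marathon.mesos:8080/v2/apps \
  -H "Content-Type: application/json" \
  -d @etcd.json

# or, equivalently, with the DCOS CLI
dcos marathon app add etcd.json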
Actually, since you're using DCOS, you can set the "cluster-size"
configuration option for etcd to something other than 3. 3 is the default and the recommendation, unless you are willing to trade slower writes for faster reads and additional availability with a cluster of 5.
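As a rough sketch of what that looks like with the DCOS CLI: the exact key layout of the options file depends on the package's configuration schema, so the "etcd"/"cluster-size" nesting below is an assumption based on the option name mentioned above; confirm it against the package's schema before using it.

# Write an options file overriding the cluster size
# (key layout assumed; check the etcd package's configuration schema for the real structure).
cat > etcd-options.json <<'EOF'
{
  "etcd": {
    "cluster-size": 5
  }
}
EOF

# Install the package with the overridden options
dcos package install etcd --options=etcd-options.json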
I set up etcd on my cluster using the DCOS CLI a first time and it worked. I then uninstalled it. A couple of days later I decided to reinstall, but since then every installation has failed. It seems the reason is that the framework is found in ZooKeeper but fails to restore from it. Here is the failure trace I got from the stderr file in Mesos (I only replaced the IPs with x.x.x.x for the agent and y.y.y.y for the Mesos master):
Any suggestions on how to fix the deployment?