mesosphere-backup / etcd-mesos

self-healing etcd on mesos!
Apache License 2.0
etcd Framework fails to deploy all instances rejecting resource offers #123

Open juwi opened 7 years ago

juwi commented 7 years ago

Hi,

I am trying to deploy a quorum of three etcd instances using the etcd-mesos framework. However, it fails to deploy one of them, with the following message:

E0529 15:49:13.683296       7 scheduler.go:347] 
message: Total resources cpus(*)(allocated: *):0.3; 
                         mem(*)(allocated: *):288; 
                         disk(*)(allocated: *):4096; 
                         ports(*)(allocated: *):[1026-1029] required by task and its executor is more than available ports(*)(allocated: *):[1026-1028, 1038-2180, 2182-3887, 3889-5049, 5052-6788, 6790-8079, 8082-8180, 8182-32000]; 
                         disk(*)(allocated: *):6648; 
                         cpus(*)(allocated: *):1.05; 
                         mem(*)(allocated: *):1871

This surprises me somewhat. The node it is trying to schedule on currently has not a single task running on it, netstat reports the ports as free, and the agent, when it comes up, logs the following:

Agent resources: ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 8082-8180, 8182-32000]; disk(*)[MOUNT:/dcos/volume0]:102216; disk(*):17764; cpus(*):4; mem(*):10831

On these offers, the master reports the following:

Sending 2 offers to framework ef1d951d-88f6-4b14-9715-57b38e9207eb-0013 (etcd) at scheduler(1)@10.0.242.11:29892
I0529 16:04:17.000000  8494 master.cpp:4731] Processing DECLINE call for offers: [ ef1d951d-88f6-4b14-9715-57b38e9207eb-O41417 ] for framework ef1d951d-88f6-4b14-9715-57b38e9207eb-0013 (etcd) at scheduler(1)@10.0.242.11:29892
I0529 16:04:18.000000  8499 master.cpp:3776] Processing ACCEPT call for offers: [ ef1d951d-88f6-4b14-9715-57b38e9207eb-O41418 ] on agent ef1d951d-88f6-4b14-9715-57b38e9207eb-S4 at slave(1)@10.0.242.12:5051 (10.0.242.12) for framework ef1d951d-88f6-4b14-9715-57b38e9207eb-0013 (etcd) at scheduler(1)@10.0.242.11:29892
I0529 16:04:18.000000  8499 master.cpp:6217] Sending status update TASK_ERROR for task etcd-1496071171 10.0.242.12 1026 1027 1028 of framework ef1d951d-88f6-4b14-9715-57b38e9207eb-0013 'Total resources cpus(*)(allocated: *):0.3; mem(*)(allocated: *):288; disk(*)(allocated: *):4096; ports(*)(allocated: *):[1026-1029] required by task and its executor is more than available ports(*)(allocated: *):[1026-1028, 1038-2180, 2182-3887, 3889-5049, 5052-6788, 6790-8079, 8082-8180, 8182-32000]; disk(*)(allocated: *):6648; cpus(*)(allocated: *):1.05; mem(*)(allocated: *):1871'
I0529 16:04:18.000000  8498 hierarchical.cpp:807] Updated allocation of framework ef1d951d-88f6-4b14-9715-57b38e9207eb-0013 on agent ef1d951d-88f6-4b14-9715-57b38e9207eb-S4 from ports(*)(allocated: *):[1026-1028, 1038-2180, 2182-3887, 3889-5049, 5052-6788, 6790-8079, 8082-8180, 8182-32000]; disk(*)(allocated: *):6648; cpus(*)(allocated: *):1.05; mem(*)(allocated: *):1871 to ports(*)(allocated: *):[1026-1028, 1038-2180, 2182-3887, 3889-5049, 5052-6788, 6790-8079, 8082-8180, 8182-32000]; disk(*)(allocated: *):6648; cpus(*)(allocated: *):1.05; mem(*)(allocated: *):187
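To make the port mismatch in the logs above easier to see, here is a minimal sketch that compares the range strings copied from the TASK_ERROR message and the agent startup line. The helpers `parse_ranges` and `subtract` are hypothetical names written for this illustration; they are not part of Mesos or etcd-mesos. The sketch only shows which ports differ, not why they are unavailable.

```python
def parse_ranges(text):
    """Parse a range string such as '[1026-1028, 1038-2180]' into (lo, hi) tuples."""
    ranges = []
    for part in text.strip("[] ").split(","):
        lo, _, hi = part.strip().partition("-")
        ranges.append((int(lo), int(hi or lo)))
    return ranges


def subtract(wanted, available):
    """Return the sub-ranges of `wanted` that `available` does not cover."""
    missing = []
    avail = sorted(available)
    for lo, hi in wanted:
        cur = lo  # lowest port in this wanted range not yet accounted for
        for a_lo, a_hi in avail:
            if a_hi < cur:
                continue  # this available range lies entirely below cur
            if a_lo > hi:
                break  # this and all later available ranges lie above the wanted range
            if a_lo > cur:
                missing.append((cur, a_lo - 1))  # gap before this available range
            cur = max(cur, a_hi + 1)
            if cur > hi:
                break
        if cur <= hi:
            missing.append((cur, hi))
    return missing


# Range strings copied from the logs in this issue.
requested = parse_ranges("[1026-1029]")
offered = parse_ranges(
    "[1026-1028, 1038-2180, 2182-3887, 3889-5049, 5052-6788, "
    "6790-8079, 8082-8180, 8182-32000]"
)
agent = parse_ranges(
    "[1025-2180, 2182-3887, 3889-5049, 5052-8079, 8082-8180, 8182-32000]"
)

# Ports the task and executor ask for that the offer does not contain.
print(subtract(requested, offered))  # [(1029, 1029)]

# Ports the agent advertises at startup that are absent from the offer.
print(subtract(agent, offered))  # [(1025, 1025), (1029, 1037), (6789, 6789)]
```

Under these inputs, the request needs port 1029, which is not in the offer, and the offer lacks 1025, 1029-1037, and 6789 relative to what the agent advertises.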

Any ideas where to continue looking?

eolix commented 7 years ago

Did you try cleaning zookeeper at /exhibitor?

jdef commented 7 years ago

How large is your cluster? etcd-mesos wants 3 agents by default. There's a flag you can set to tell it NOT to require 3 separate agents.
