thefactory / cloudformation-mesos

[Moved] CloudFormation templates for a production-ready Mesos cluster
https://github.com/mbabineau/cloudformation-mesos

Isolated Instances #5

Open alexisvincent opened 9 years ago

alexisvincent commented 9 years ago

I'm struggling to get a system up. When I spin up a Mesos cluster, the masters and slaves are isolated from each other. As a result, I suspect an issue with ZooKeeper.

I've set up 3 functioning ZooKeeper instances using factory/cloudformation-zk-exhibitor. The slaves aren't registering with the master, and the masters aren't finding each other either. Docker containers are dying immediately. If someone could give me a hand, I would greatly appreciate it!

Here's the output of docker logs for the marathon container:

MESOS_NATIVE_JAVA_LIBRARY is not set. Searching in /usr/lib /usr/local/lib.
MESOS_NATIVE_LIBRARY, MESOS_NATIVE_JAVA_LIBRARY set to '/usr/local/lib/libmesos.so'
[2014-11-03 21:08:23,282] INFO Starting Marathon 0.7.1 (mesosphere.marathon.Main$:20)
[scallop] Error: Validation failure for 'zk' option parameters: zk:///mesos_marathon

And here's the output of docker logs for the marathon logger container:

Configuring in-memory store with {'max_length': 100}
HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /v2/eventSubscriptions?callbackUrl=http%3A%2F%2Flocalhost%3A5000%2Fevents (Caused by <class 'socket.error'>: [Errno 111] Connection refused)
Traceback (most recent call last):
  File "/opt/marathon-logger/marathon-logger.py", line 53, in <module>
    m.create_event_subscription(args.callback_url)
  File "/usr/local/lib/python2.7/dist-packages/marathon/client.py", line 278, in create_event_subscription
    return response.json()
AttributeError: 'NoneType' object has no attribute 'json'

mbabineau commented 9 years ago

Looks like the Mesos servers are failing to talk to Exhibitor. Can you log into a Mesos master or slave and try curling the Exhibitor URL you passed to the stack?
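
If it helps, here is a rough Python sketch of the same check. It asks Exhibitor for its server list and builds the zk:// URL that Marathon would need. The `/exhibitor/v1/cluster/list` path and the `servers`/`port` fields in the response are assumptions based on Exhibitor's REST API, and the helper names are hypothetical; this is not necessarily how the templates themselves do the lookup. An empty server list would produce exactly the `zk:///mesos_marathon` value seen in the Marathon log above.

```python
import json
import sys
import urllib.request


def zk_url_from_cluster_list(payload, path="/mesos_marathon"):
    """Build a zk:// URL from an Exhibitor cluster-list response body.

    Assumes the body is JSON shaped like {"servers": [...], "port": 2181}.
    """
    data = json.loads(payload)
    servers = data.get("servers") or []
    port = data.get("port")
    if not servers or not port:
        # With no hosts, the URL would degrade to zk:///mesos_marathon.
        raise ValueError("Exhibitor returned no ZooKeeper servers")
    hosts = ",".join("{}:{}".format(host, port) for host in servers)
    return "zk://{}{}".format(hosts, path)


def fetch_cluster_list(exhibitor_url, timeout=5):
    """Fetch the raw cluster-list body from Exhibitor (endpoint path is assumed)."""
    url = exhibitor_url.rstrip("/") + "/exhibitor/v1/cluster/list"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Usage: python check_exhibitor.py <the Exhibitor URL you passed to the stack>
    body = fetch_cluster_list(sys.argv[1])
    print(zk_url_from_cluster_list(body))
```

Run it from a Mesos master or slave. A connection error points to a networking or security group problem between the stacks, while a `ValueError` means Exhibitor is reachable but is not reporting any ZooKeeper servers.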