sequenceiq / docker-ambari

Docker image with Ambari

Ambari cluster is created but nodes have connectivity issues #69

Closed zbears closed 9 years ago

zbears commented 9 years ago

I have tried your one-step script for cluster creation on multiple computers on multiple networks. Although cluster creation works great, when looking at the nodes in the cluster from the Ambari manager page, the nodes appear to have issues connecting to their own services. Is there a fix for this?

(screenshot from 2015-07-10 10:00:05)

matyix commented 9 years ago

What do you mean by "on multiple computers on multiple networks"? Version 2.0.0 (the serf-based version) should be able to create clusters on one physical host. Although you can actually make it work on multiple hosts, we don't support that with serf, only with the consul-based branches. Unfortunately, those are not meant to work with the script (we did not have time to update the shell and script), but we use them from Cloudbreak.

Sometime in the future we will get rid of all serf-based Docker images and use consul, as we do with Cloudbreak, and update the scripts/shell.

keyki commented 9 years ago

I just quickly installed a 2-node cluster and I don't have any alerts. Can you exec into the amb0 container and try to ping amb1.mycorp.kom?

zbears commented 9 years ago

I apologize for the confusion. I meant that I had tried the serf-based approach on a single computer. When that didn't work, I tried again on a few others.

zbears commented 9 years ago

If you click on the hosts tab, do you get anything next to amb1.mycorp.kom and amb2.mycorp.kom?

zbears commented 9 years ago

It appeared fine from the main screen, but the problem emerged in the hosts area.

zbears commented 9 years ago

(screenshot from 2015-07-10 10:26:45)

keyki commented 9 years ago

I don't have any. You can log in here with admin/admin: http://2eb016cf.ngrok.com

Can you briefly describe your environment? Host OS, Docker version, etc.

zbears commented 9 years ago

The ping works fine, but ping uses ICMP, whereas the failing checks use TCP.
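A minimal sketch of how that distinction could be checked from inside a container: a successful ping only shows ICMP reachability, so this tries to open an actual TCP connection instead. The function name `tcp_reachable` and the port 8080 in the usage comment are assumptions for illustration; substitute the port of whichever service the alert names.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example usage (hostname from this thread, port is an assumed value):
# print(tcp_reachable("amb1.mycorp.kom", 8080))
```

If ping succeeds but this returns False for the same host, the problem is more likely in name resolution for that exact hostname or in the port/service itself than in basic network reachability.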

zbears commented 9 years ago

I'm running Docker 1.7.0 on Ubuntu 14.04.

zbears commented 9 years ago

I've also tried on Ubuntu 12.04.5 with no luck.

zbears commented 9 years ago

What was the command that you ran to get your 2 node blueprint? It shouldn't affect anything but the default creates a 3 node cluster.

keyki commented 9 years ago

amb-deploy-cluster 2

keyki commented 9 years ago

As a quick fix, you can try writing the hostnames to /etc/hosts in the container; otherwise, we need to be able to reproduce it to see what's wrong.

zbears commented 9 years ago

Strangely enough, the hostname is in the /etc/hosts file of the container. I'm trying a 2 node cluster now to see if that changes anything. If ping is working but TCP is not, it makes me think that it has something to do with Docker's treatment of ports. I'll update you as soon as the cluster is installed.

zbears commented 9 years ago

It appears that the problem is not with the slaves connecting to the master, but rather with the slaves connecting to their own locally run services.

keyki commented 9 years ago

Did you check whether the services are actually running? Also check the ambari-server.log to see if it's just an alert issue or not.

zbears commented 9 years ago

The Ambari dashboard says they are running. Should I be looking elsewhere? I just ran the 2-node system. Everything looks great for about a minute, then I start getting the same TCP errors.

zbears commented 9 years ago

Okay. Turns out it was a master-slave communication thing. After adding the slave manually to the master's /etc/hosts file and restarting the service, it appears everything has been fixed. Is this a configuration step that must be run every time the script is started or was it something weird with my particular experience?
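For anyone wanting to script that workaround, here is a rough sketch of adding a host entry idempotently. It works on the text of a hosts file rather than the file itself, so it can be tried safely first. The helper name `ensure_hosts_entry` and the IP address in the usage comment are hypothetical; the real address has to come from the slave container.

```python
def ensure_hosts_entry(hosts_text: str, ip: str, hostname: str) -> str:
    """Return hosts_text with an 'ip hostname' line added if hostname is not mapped yet."""
    for line in hosts_text.splitlines():
        # Ignore comments, then split into the address and its names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and hostname in fields[1:]:
            return hosts_text  # already present, leave the file untouched
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + f"{ip}\t{hostname}\n"


# Example usage (the IP is a made-up placeholder):
# with open("/etc/hosts") as f:
#     updated = ensure_hosts_entry(f.read(), "172.17.0.3", "amb1.mycorp.kom")
# with open("/etc/hosts", "w") as f:
#     f.write(updated)
```

As in the comment above, the affected service still needs a restart afterwards for the new mapping to take effect.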

keyki commented 9 years ago

This version has been out for a while and no one has reported such issues. That doesn't mean the issue doesn't exist, but it could be environment related.

zbears commented 9 years ago

Okay. Thank you so much for your prompt help. I'll try to get this fix incorporated into the script just in case anyone else has the same problem.