hortonworks / ansible-hortonworks

Ansible playbooks for deploying Hortonworks Data Platform and DataFlow using Ambari Blueprints
Apache License 2.0

Live DataNodes issue when trying to install example HDP3-HA-3-master playbooks #94

Closed by VidyasagarGudapati 5 years ago

VidyasagarGudapati commented 5 years ago

Hi,

I installed Ansible 2.7.1 on a Mac and am trying to run the following example playbooks against a CentOS 7 static environment:

- example-hdp3-ha-3-masters-with-druid-atlas-knox-log
- example-hdp3-ha-3-masters-with-accumulo

The MapReduce2, YARN Timeline Reader and Spark HistoryServer services fail to start with the following error:

{ "RemoteException": { "exception": "IOException", "javaClassName": "java.io.IOException", "message": "Failed to find datanode, suggest to check cluster health. excludeDatanodes=null" } }

When I open the NameNode Web UI at IPADDR:50070, Live Nodes shows 0, whereas Ambari shows 1 running DataNode.

If anyone else has faced this issue, please help us solve it.

Thank you.

alexandruanghel commented 5 years ago

Hi, if NameNode UI shows 0 datanodes, but Ambari shows 1, then something happened between the DataNode and NameNode.

Ambari only shows whether the DataNode process is up and running (which seems to be the case for you). That doesn't mean the DataNode has connected to the NameNode or that the filesystem is working, which is why the NameNode UI shows 0 live nodes.

Please check the DataNode logs (under /var/log/hadoop/hdfs/) for further information about the possible issue...
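As a rough aid for that log check, here is a minimal Python sketch that pulls the WARN, ERROR and FATAL lines out of a log file. It assumes the usual Hadoop log layout where the level appears as its own whitespace-separated token; the function names are made up for illustration, and the exact log file name under /var/log/hadoop/hdfs/ depends on your hosts, so it is passed in rather than guessed.

```python
# Log levels worth looking at first; adjust to taste.
PROBLEM_LEVELS = ("WARN", "ERROR", "FATAL")


def find_problem_lines(lines, levels=PROBLEM_LEVELS):
    """Return the lines that contain one of the levels as a standalone token."""
    hits = []
    for line in lines:
        tokens = line.split()
        if any(level in tokens for level in levels):
            hits.append(line.rstrip("\n"))
    return hits


def scan_log_file(path, levels=PROBLEM_LEVELS):
    """Read a log file (hypothetical helper) and return its problem lines."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(path, errors="replace") as handle:
        return find_problem_lines(handle, levels)
```

Calling scan_log_file() with the path of a DataNode log and printing the result should surface connection or registration errors without paging through the INFO lines.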

VidyasagarGudapati commented 5 years ago

Hi,

Adding host entries to the /etc/hosts file resolved this issue for me.
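For anyone who wants to verify the same thing on their nodes, here is a small Python sketch that checks whether every cluster hostname has an entry in hosts-file text. The function names and the hostnames in the usage note are hypothetical; substitute the names from your own inventory.

```python
def parse_hosts(text):
    """Map every hostname or alias found in hosts-file text to its IP address."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Keep the first address seen for a name, as resolvers usually do.
            mapping.setdefault(name, ip)
    return mapping


def missing_hosts(text, expected):
    """Return the expected hostnames that have no entry in the hosts-file text."""
    known = parse_hosts(text)
    return [name for name in expected if name not in known]
```

For example, reading /etc/hosts on each node with open("/etc/hosts").read() and passing it to missing_hosts() together with a list such as ["master01.example.com", "worker01.example.com"] (placeholder names) should return an empty list once all entries are in place.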

Thanks

alexandruanghel commented 5 years ago

Great to hear that! Setting external_dns to no will also do that for you: https://github.com/hortonworks/ansible-hortonworks/blob/master/playbooks/group_vars/all#L26

thepg commented 5 years ago

Hello @alexandruanghel, I have the same problem. I have already set external_dns to no, and all the hosts are listed in /etc/hosts. When I check the DataNode logs (under /var/log/hadoop/hdfs/), I find errors such as:

- Unable to connect to zookeeper.
- No live collector to send metrics to. Metrics to be sent will be discarded.