mesosphere-backup / hdfs-deprecated

[DEPRECATED] This project is deprecated. It will be archived on December 1, 2017.
Apache License 2.0

Not able to deploy hdfs app in DCOS due to name resolution errors between datanode and namenode #256

Closed nigamashish closed 8 years ago

nigamashish commented 8 years ago

I ran this command: dcos package install hdfs

It deploys the name nodes, but the data nodes do not show up in the cluster. Looking at the Mesos logs for the data node task, I see errors like this:

ERROR datanode.DataNode: Initialization failed for Block pool BP-1713926224-10.0.0.186-1462585594024 (Datanode Uuid null) service to namenode2.hdfs.mesos/10.0.0.187:50071 Datanode denied communication with namenode because hostname cannot be resolved (ip=10.0.0.183, hostname=10.0.0.183): DatanodeRegistration(0.0.0.0, datanodeUuid=4240027a-63a6-413d-b2f1-7cc156b9aea0, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=CID-dea5b94a-0f35-4158-a249-ef2c1b09d787;nsid=1554246596;c=0)

Some more details about the DC/OS environment: 5 nodes in the cluster, all on EC2 running CentOS 7.

--> systemctl | grep dcos
dcos-ddt.service               loaded active running Diagnostics: DC/OS Distributed Diagnostics Tool
dcos-epmd.service              loaded active running Erlang Port Mapping Daemon: DC/OS Erlang Port Mapping Daemon
dcos-mesos-slave.service       loaded active running Mesos Agent: DC/OS Mesos Agent Service
dcos-minuteman.service         loaded active running Layer 4 Load Balancer: DC/OS Layer 4 Load Balancing Service
dcos-spartan.service           loaded active running DNS Dispatcher: An RFC5625 Compliant DNS Forwarder
dcos.target                    loaded active active  dcos.target
dcos-gen-resolvconf.timer      loaded active waiting Generate resolv.conf Timer: Periodically update systemd-resolved for mesos-dns
dcos-logrotate.timer           loaded active waiting Logrotate Timer: Timer to trigger every 2 minutes
dcos-spartan-watchdog.timer    loaded active waiting DNS Dispatcher Watchdog Timer: Periodically check is Spartan is working

nigamashish commented 8 years ago

I had the wrong resolver configured when I created the DC/OS cluster. With that fixed, this issue no longer occurs.
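
For anyone hitting the same error: the log above shows hostname=10.0.0.183, which suggests the name node could not map the data node's IP back to a hostname. A rough way to check this from a node is sketched below in Python. It is only an illustration, not part of the HDFS framework; the helper names (reverse_lookup, resolvable) and the script name are made up here, and it assumes the node's system resolver is the one the HDFS tasks use.

```python
import socket
import sys


def reverse_lookup(ip):
    """Ask the system resolver for the hostname of `ip`; None if it fails."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return None


def resolvable(ip, lookup=reverse_lookup):
    """True only if the lookup returns a real name, not the IP echoed back.

    `lookup` is injectable so the check can be exercised without DNS.
    """
    name = lookup(ip)
    return name is not None and name != ip


if __name__ == "__main__":
    # Pass one or more agent IPs on the command line.
    for ip in sys.argv[1:]:
        status = "ok" if resolvable(ip) else "NOT resolvable"
        print(f"{ip}: {status}")
```

Saved under a hypothetical name such as check_dns.py, it could be run on the name node host as python check_dns.py 10.0.0.183. If an agent IP comes back as not resolvable, the resolver configured for the cluster is a likely place to look.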