sergevs / ansible-cloudera-hadoop

ansible playbook to deploy cloudera hadoop components to the cluster
MIT License

hdfs dfs -copyFromlocal connection timeout error #11

Closed srjayep closed 5 years ago

srjayep commented 5 years ago

Hi,

I am running into issues with hdfs dfs when installing Hive on a node. Any help is appreciated.

Here is my hosts file info:

[namenodes]
master-01 ansible_user=vagrant
master-02 ansible_user=vagrant

# at least one is required
[datanodes]
data-01 ansible_user=vagrant
data-02 ansible_user=vagrant
data-03 ansible_user=vagrant

# at least one is required
# job history server will be also configured on the 1st host
[yarnresourcemanager]
master-01 ansible_user=vagrant
master-02 ansible_user=vagrant

# can be required for other services
# 3 or 5 hosts is required if 2 namenodes configured
[zookeepernodes]
master-01 ansible_user=vagrant
master-02 ansible_user=vagrant
utility-01 ansible_user=vagrant

# required if 2 namenodes configured
[journalnodes]
master-01 ansible_user=vagrant
master-02 ansible_user=vagrant
utility-01 ansible_user=vagrant

# required if hivemetastore, oozie or hue configured
[postgresql]
utility-01 ansible_user=vagrant

# required if impala-store-catalog configured
[hivemetastore]
utility-01 ansible_user=vagrant

[impala-store-catalog]
utility-01 ansible_user=vagrant

[oozie]
utility-01 ansible_user=vagrant

[hue]
edge-01 ansible_user=vagrant

# optional. comment this out completely or fill in a host into [dashboard]
[dashboard:children]
namenodes

# please do not edit the groups below
[hadoop:children]
namenodes
datanodes
journalnodes
yarnresourcemanager
hivemetastore
impala-store-catalog
hbasemaster
solr
spark
oozie
hue

[java:children]
hadoop
kafka
zookeepernodes

TASK [hivemetastore : copy hive-site.xml to hdfs] ***** changed: [utility-01] => (item=-mkdir -p /etc/hive/conf) failed: [utility-01] (item=-copyFromLocal -f /etc/cluster/hive/hive-site.xml /etc/hive/conf) => {"changed": true, "cmd": ["sudo", "-u", "hdfs", "hdfs", "dfs", "-copyFromLocal", "-f", "/etc/cluster/hive/hive-site.xml", "/etc/hive/conf"], "delta": "0:00:04.134512", "end": "2019-03-09 10:53:27.748157", "item": "-copyFromLocal -f /etc/cluster/hive/hive-site.xml /etc/hive/conf", "msg": "non-zero return code", "rc": 1, "start": "2019-03-09 10:53:23.613645", "stderr": "19/03/09 10:53:27 INFO hdfs.DFSClient: Exception in createBlockOutputStream\ Connection refused\n\tat Method)\n\tat\n\tat\n\tat\n\tat org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(\n\tat org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(\n\tat org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(\n\tat org.apache.hadoop.hdfs.DFSOutputStream$\n19/03/09 10:53:27 WARN hdfs.DFSClient: Abandoning BP-519781906-\n19/03/09 10:53:27 WARN hdfs.DFSClient: Excluding datanode DatanodeInfoWithStorage[,DS-24e7211c-66e2-4cf6-b056-a274d6cca4c8,DISK]\n19/03/09 10:53:27 WARN hdfs.DFSClient: DataStreamer Exception\norg.apache.hadoop.ipc.RemoteException( File /etc/hive/conf/hive-site.xml.COPYING could only be replicated to 0 nodes instead of minReplication (=1). 
There are 1 datanode(s) running and 1 node(s) are excluded in this operation.\n\tat org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(\n\tat org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$\n\tat org.apache.hadoop.ipc.RPC$\n\tat org.apache.hadoop.ipc.Server$Handler$\n\tat org.apache.hadoop.ipc.Server$Handler$\n\tat Method)\n\tat\n\tat\n\tat org.apache.hadoop.ipc.Server$\n\n\tat\n\tat\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(\n\tat com.sun.proxy.$Proxy9.addBlock(Unknown Source)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(\n\tat java.lang.reflect.Method.invoke(\n\tat\n\tat\n\tat com.sun.proxy.$Proxy10.addBlock(Unknown Source)\n\tat org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(\n\tat org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(\n\tat org.apache.hadoop.hdfs.DFSOutputStream$\ncopyFromLocal: File /etc/hive/conf/hive-site.xml.COPYING could only be replicated to 0 nodes instead of minReplication (=1). 
There are 1 datanode(s) running and 1 node(s) are excluded in this operation.", "stderr_lines": ["19/03/09 10:53:27 INFO hdfs.DFSClient: Exception in createBlockOutputStream", " Connection refused", "\tat Method)", "\tat", "\tat", "\tat", "\tat org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(", "\tat org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(", "\tat org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(", "\tat org.apache.hadoop.hdfs.DFSOutputStream$", "19/03/09 10:53:27 WARN hdfs.DFSClient: Abandoning BP-519781906-", "19/03/09 10:53:27 WARN hdfs.DFSClient: Excluding datanode DatanodeInfoWithStorage[,DS-24e7211c-66e2-4cf6-b056-a274d6cca4c8,DISK]", "19/03/09 10:53:27 WARN hdfs.DFSClient: DataStreamer Exception", "org.apache.hadoop.ipc.RemoteException( File /etc/hive/conf/hive-site.xml.COPYING could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.", "\tat org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(", "\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(", "\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(", "\tat org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(", "\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(", "\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(", "\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$", "\tat org.apache.hadoop.ipc.RPC$", "\tat org.apache.hadoop.ipc.Server$Handler$", "\tat org.apache.hadoop.ipc.Server$Handler$", "\tat Method)", "\tat", "\tat", "\tat org.apache.hadoop.ipc.Server$", "", "\tat", "\tat", "\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(", "\tat com.sun.proxy.$Proxy9.addBlock(Unknown Source)", "\tat 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(", "\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)", "\tat sun.reflect.NativeMethodAccessorImpl.invoke(", "\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(", "\tat java.lang.reflect.Method.invoke(", "\tat", "\tat", "\tat com.sun.proxy.$Proxy10.addBlock(Unknown Source)", "\tat org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(", "\tat org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(", "\tat org.apache.hadoop.hdfs.DFSOutputStream$", "copyFromLocal: File /etc/hive/conf/hive-site.xml.COPYING could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation."], "stdout": "", "stdout_lines": []}
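For anyone reading the log above, the decisive part is the NameNode's "could only be replicated to 0 nodes" message. Here is a minimal Python sketch that pulls the counts out of that message. The function name and the regular expression are illustrative only; they are not part of the playbook or of Hadoop.

```python
import re

# Matches the NameNode placement failure quoted in the task output above.
_PATTERN = re.compile(
    r"could only be replicated to (\d+) nodes instead of minReplication \(=(\d+)\)\.\s+"
    r"There are (\d+) datanode\(s\) running and (\d+) node\(s\) are excluded"
)


def summarize_replication_error(stderr):
    """Return the counts found in the message as a dict, or None if it is absent."""
    match = _PATTERN.search(stderr)
    if match is None:
        return None
    replicated, min_replication, running, excluded = map(int, match.groups())
    return {
        "replicated": replicated,
        "min_replication": min_replication,
        "running": running,
        "excluded": excluded,
        # True when the client excluded every DataNode the NameNode reported as running.
        "all_running_excluded": running > 0 and excluded >= running,
    }
```

Fed the stderr from this task, it reports 1 running DataNode and 1 excluded, so no target was left for the block. The inventory lists three datanodes, so it may also be worth checking why only one is reported as running.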

sergevs commented 5 years ago

Which Cloudera version are you trying to deploy? Which Ansible version? What are the host specifications (master-0x, data-0x): RAM, CPU, HDD? Also, please provide the exact deploy command.

The issue looks odd: it seems there is a problem with HDFS:

"copyFromLocal: File /etc/hive/conf/hive-site.xml.COPYING could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation."

However, the playbook runs service tests after deployment (HDFS included), so it is not clear how the HDFS tests passed, given that HDFS was deployed before Hive. The only explanation I can think of is that services were killed during the deployment due to lack of memory.
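One way to test the "killed due to lack of memory" hypothesis is to look for OOM-killer entries in the kernel log on each node. The sketch below is a rough illustration, not part of the playbook. It assumes the usual Linux wording, "Out of memory: Kill process ..." on older kernels and "Killed process" on newer ones, and the helper name is made up.

```python
import re

# Assumed OOM-killer wording; covers both "Kill process" and "Killed process".
_OOM = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(kernel_log):
    """Return a list of (pid, process_name) for every OOM kill in the log text."""
    return [(int(pid), name) for pid, name in _OOM.findall(kernel_log)]
```

Pass it the text of `dmesg` or of the system log from a master or data node. Any `java` entries in the result would support the idea that Hadoop daemons were killed mid-deploy.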

srjayep commented 5 years ago

Hi Serge, I am installing Cloudera 5.9 with Ansible 2.7. I am giving every node the minimum of 1 GB RAM, since this runs on VirtualBox on my Mac as a dev environment.

What are the host specifications (master-0x, data-0x): RAM, CPU, HDD?

1 GB of memory for the master and data nodes, 2 CPUs, and a 40 GB disk.

I have commented out a task in the mapred test that was throwing a similar connection refused error.

command: sudo -Hu hdfs hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 1 1

However, test hdfs tasks passed (attached log for reference).

Here is the Hive task that fails.

- name: copy hive-site.xml to hdfs
  tags: config
  command: sudo -u hdfs hdfs dfs {{ item }}
  with_items:
    - -mkdir -p /etc/hive/conf
    - -copyFromLocal -f {{ etc_folder }}/hive/hive-site.xml /etc/hive/conf

The -copyFromLocal item of the above task is failing. I tried copying a test file with no contents, and that works.

-bash-4.2$ touch test.out
-bash-4.2$ hdfs dfs -copyFromLocal -f ./test.out /etc/hive/conf
-bash-4.2$ pwd
/var/lib/hadoop-hdfs
-bash-4.2$ id
uid=995(hdfs) gid=992(hdfs) groups=992(hdfs),993(hadoop)
-bash-4.2$ hdfs dfs -ls /etc/hive/conf
Found 1 items
-rw-r--r-- 1 hdfs hadoop 0 2019-03-09 13:39 /etc/hive/conf/test.out

When I add contents to the file, it fails with the same connection refused message.

-bash-4.2$ cat test.out
this is test file
-bash-4.2$ hdfs dfs -copyFromLocal -f ./test.out /etc/hive/conf
19/03/09 13:45:29 INFO hdfs.DFSClient: Exception in createBlockOutputStream Connection refused

Attached the Ansible play log: cloudera.log
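The empty-file versus non-empty-file difference is consistent with how HDFS writes work. Creating a zero-length file only touches NameNode metadata, while writing any data needs a TCP connection from the client to a DataNode's data transfer port. Below is a minimal Python sketch of a reachability check that could be run from utility-01. The function name is hypothetical, and port 50010 is an assumption (the usual `dfs.datanode.address` default on Hadoop 2.x), so check `hdfs-site.xml` for the actual value.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


# Example (hosts taken from the inventory above; 50010 is an assumed default):
# for host in ("data-01", "data-02", "data-03"):
#     print(host, can_connect(host, 50010))
```

A `False` for a datanode that should be up suggests either that the DataNode process is not running or that something is blocking the port.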

sergevs commented 5 years ago

1 gig memory for master and data

That's definitely not enough for your setup. If you need a development environment and have limited RAM, I would recommend:

  1. Don't use an HA setup: services consume RAM.
  2. Don't try to emulate a real cluster: services consume RAM, and every VM consumes RAM for its OS.
  3. Run all required services on a single VM: it will make the best use of RAM.

As far as I can remember, the all-in-one full stack of the playbook works on a 10 GB VM (I tested that on my MacBook :) ). As you don't need all services, I believe one VM with 6 GB RAM should be fine for the required services.
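As a quick pre-flight check before deploying, one could compare the VM's memory against the 6 GB figure suggested above. This is only a sketch: the helper names and the `slack_gb` allowance are assumptions, added because `MemTotal` in `/proc/meminfo` normally reads slightly below the RAM configured for the VM.

```python
import re


def mem_total_gb(meminfo_text):
    """Extract MemTotal from /proc/meminfo text and return it in GiB."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found in meminfo text")
    return int(match.group(1)) / (1024 * 1024)


def has_enough_ram(meminfo_text, required_gb=6.0, slack_gb=0.5):
    """True if MemTotal is within slack_gb of required_gb (both values are assumptions)."""
    return mem_total_gb(meminfo_text) >= required_gb - slack_gb
```

On the target VM this could be called as `has_enough_ram(open("/proc/meminfo").read())`. A 1 GB node fails the check by a wide margin.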