sergevs / ansible-cloudera-hadoop

ansible playbook to deploy cloudera hadoop components to the cluster
MIT License

hdfs dfs -copyFromLocal connection timeout error #11

Closed srjayep closed 5 years ago

srjayep commented 5 years ago

Hi,

I am running into issues with hdfs dfs when installing Hive on a node. Any help is appreciated.

Here is my hosts file info:

[namenodes]
master-01 ansible_user=vagrant
master-02 ansible_user=vagrant

# at least one is required
[datanodes]
data-01 ansible_user=vagrant
data-02 ansible_user=vagrant
data-03 ansible_user=vagrant

# at least one is required
# job history server will be also configured on the 1st host
[yarnresourcemanager]
master-01 ansible_user=vagrant
master-02 ansible_user=vagrant

# optional
# can be required for other services
# 3 or 5 hosts is required if 2 namenodes configured
[zookeepernodes]
master-01 ansible_user=vagrant
master-02 ansible_user=vagrant
utility-01 ansible_user=vagrant

# optional
# required if 2 namenodes configured
[journalnodes]
master-01 ansible_user=vagrant
master-02 ansible_user=vagrant
utility-01 ansible_user=vagrant

# optional
# required if hivemetastore, oozie or hue configured
[postgresql]
utility-01 ansible_user=vagrant

# optional
# required if impala-store-catalog configured
[hivemetastore]
utility-01 ansible_user=vagrant

# optional
[impala-store-catalog]
utility-01 ansible_user=vagrant

# optional
[hbasemaster]

# optional
[solr]

# optional
[spark]

# optional
[oozie]
utility-01 ansible_user=vagrant

# optional
[kafka]

# optional
[hue]
edge-01 ansible_user=vagrant

# optional. comment this out completely or fill in a host into [dashboard]
[dashboard]

[dashboard:children]
namenodes

# please do not edit the groups below
[hadoop:children]
namenodes
datanodes
journalnodes
yarnresourcemanager
hivemetastore
impala-store-catalog
hbasemaster
solr
spark
oozie
hue

[java:children]
hadoop
kafka
zookeepernodes

TASK [hivemetastore : copy hive-site.xml to hdfs] ***** changed: [utility-01] => (item=-mkdir -p /etc/hive/conf) failed: [utility-01] (item=-copyFromLocal -f /etc/cluster/hive/hive-site.xml /etc/hive/conf) => {"changed": true, "cmd": ["sudo", "-u", "hdfs", "hdfs", "dfs", "-copyFromLocal", "-f", "/etc/cluster/hive/hive-site.xml", "/etc/hive/conf"], "delta": "0:00:04.134512", "end": "2019-03-09 10:53:27.748157", "item": "-copyFromLocal -f /etc/cluster/hive/hive-site.xml /etc/hive/conf", "msg": "non-zero return code", "rc": 1, "start": "2019-03-09 10:53:23.613645", "stderr": "19/03/09 10:53:27 INFO hdfs.DFSClient: Exception in createBlockOutputStream\njava.net.ConnectException: Connection refused\n\tat sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)\n\tat sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)\n\tat org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)\n\tat org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)\n\tat org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1923)\n\tat org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1666)\n\tat org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1619)\n\tat org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:771)\n19/03/09 10:53:27 WARN hdfs.DFSClient: Abandoning BP-519781906-192.168.56.100-1552128524558:blk_1073741825_1001\n19/03/09 10:53:27 WARN hdfs.DFSClient: Excluding datanode DatanodeInfoWithStorage[192.168.56.1:50010,DS-24e7211c-66e2-4cf6-b056-a274d6cca4c8,DISK]\n19/03/09 10:53:27 WARN hdfs.DFSClient: DataStreamer Exception\norg.apache.hadoop.ipc.RemoteException(java.io.IOException): File /etc/hive/conf/hive-site.xml.COPYING could only be replicated to 0 nodes instead of minReplication (=1). 
There are 1 datanode(s) running and 1 node(s) are excluded in this operation.\n\tat org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1626)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3351)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:683)\n\tat org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:214)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:495)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2141)\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1912)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2135)\n\n\tat org.apache.hadoop.ipc.Client.call(Client.java:1502)\n\tat org.apache.hadoop.ipc.Client.call(Client.java:1439)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)\n\tat com.sun.proxy.$Proxy9.addBlock(Unknown Source)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:413)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)\n\tat org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)\n\tat com.sun.proxy.$Proxy10.addBlock(Unknown Source)\n\tat org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1811)\n\tat org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1607)\n\tat org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:771)\ncopyFromLocal: File /etc/hive/conf/hive-site.xml.COPYING could only be replicated to 0 nodes instead of minReplication (=1). 
There are 1 datanode(s) running and 1 node(s) are excluded in this operation.", "stderr_lines": ["19/03/09 10:53:27 INFO hdfs.DFSClient: Exception in createBlockOutputStream", "java.net.ConnectException: Connection refused", "\tat sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)", "\tat sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)", "\tat org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)", "\tat org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)", "\tat org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1923)", "\tat org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1666)", "\tat org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1619)", "\tat org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:771)", "19/03/09 10:53:27 WARN hdfs.DFSClient: Abandoning BP-519781906-192.168.56.100-1552128524558:blk_1073741825_1001", "19/03/09 10:53:27 WARN hdfs.DFSClient: Excluding datanode DatanodeInfoWithStorage[192.168.56.1:50010,DS-24e7211c-66e2-4cf6-b056-a274d6cca4c8,DISK]", "19/03/09 10:53:27 WARN hdfs.DFSClient: DataStreamer Exception", "org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /etc/hive/conf/hive-site.xml.COPYING could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.", "\tat org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1626)", "\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3351)", "\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:683)", "\tat org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:214)", "\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:495)", "\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)", "\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)", "\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)", "\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2141)", "\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)", "\tat java.security.AccessController.doPrivileged(Native Method)", "\tat javax.security.auth.Subject.doAs(Subject.java:422)", "\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1912)", "\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2135)", "", "\tat org.apache.hadoop.ipc.Client.call(Client.java:1502)", "\tat org.apache.hadoop.ipc.Client.call(Client.java:1439)", "\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)", "\tat com.sun.proxy.$Proxy9.addBlock(Unknown Source)", "\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:413)", "\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)", "\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)", "\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)", "\tat 
java.lang.reflect.Method.invoke(Method.java:498)", "\tat org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)", "\tat org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)", "\tat com.sun.proxy.$Proxy10.addBlock(Unknown Source)", "\tat org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1811)", "\tat org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1607)", "\tat org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:771)", "copyFromLocal: File /etc/hive/conf/hive-site.xml.COPYING could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation."], "stdout": "", "stdout_lines": []}

sergevs commented 5 years ago

Which Cloudera version are you trying to deploy? Which Ansible version? What are the host (master-0x, data-0x) specifications (RAM, CPU, HDD)? Also, please provide the exact deploy command.

The issue looks weird: it seems there is a problem with HDFS:

"copyFromLocal: File /etc/hive/conf/hive-site.xml.COPYING could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation."], 

But the playbook runs service tests after the deploy (HDFS as well), so it's not clear how the HDFS service tests passed, given that HDFS is deployed before Hive. The only explanation I can think of is that services were killed during the deploy due to lack of memory.
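If you want to check that theory, a few standard Linux/JDK commands on each VM usually tell the story (nothing here is playbook-specific):

# was anything killed for lack of memory, and how much is left?
dmesg | grep -iE 'killed process|out of memory'
free -m

# are the expected Hadoop JVMs (NameNode, DataNode, ...) still running?
jps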

srjayep commented 5 years ago

Hi Serge, I am installing Cloudera 5.9 with Ansible 2.7. I am using a minimum of 1 GB of RAM on all the nodes, since this runs on VirtualBox on my Mac as a dev environment.

What are the host (master-0x, data-0x) specifications (RAM, CPU, HDD)?

1 GB of memory for master and data nodes, 2 CPUs, and a 40 GB disk.

I have commented out a task in the mapred test that was throwing a similar connection refused error:

command: sudo -Hu hdfs hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 1 1

However, the HDFS test tasks passed (log attached for reference).

Here is the Hive task that is failing:

- name: copy hive-site.xml to hdfs
  tags: config
  command: sudo -u hdfs hdfs dfs {{ item }}
  with_items:
    - -mkdir -p /etc/hive/conf
    - -copyFromLocal -f {{ etc_folder }}/hive/hive-site.xml /etc/hive/conf
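Expanded with the values shown in the failure output above ({{ etc_folder }} resolves to /etc/cluster there), the two items run as the following commands; the first succeeds, the second is the one that fails:

sudo -u hdfs hdfs dfs -mkdir -p /etc/hive/conf
sudo -u hdfs hdfs dfs -copyFromLocal -f /etc/cluster/hive/hive-site.xml /etc/hive/conf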

The above -copyFromLocal step is failing. I tried copying a test file with no contents, and that works:

-bash-4.2$ touch test.out
-bash-4.2$ hdfs dfs -copyFromLocal -f ./test.out /etc/hive/conf
-bash-4.2$ pwd
/var/lib/hadoop-hdfs
-bash-4.2$ id
uid=995(hdfs) gid=992(hdfs) groups=992(hdfs),993(hadoop)
-bash-4.2$ hdfs dfs -ls /etc/hive/conf
Found 1 items
-rw-r--r-- 1 hdfs hadoop 0 2019-03-09 13:39 /etc/hive/conf/test.out

When I add contents to the file, it fails with the same connection error:

-bash-4.2$ cat test.out
this is test file
-bash-4.2$ hdfs dfs -copyFromLocal -f ./test.out /etc/hive/conf
19/03/09 13:45:29 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.net.ConnectException: Connection refused
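That pattern is consistent with the stack trace: copying a zero-byte file only talks to the namenode (no block is ever allocated), while a non-empty file makes the client open a write pipeline directly to a datanode's transfer port, and that is the connection being refused. One way to see which datanode address the namenode is handing out, and whether anything is listening there (ss is just one option for the port check; netstat -ltn works as well):

# address and state of each registered datanode
sudo -u hdfs hdfs dfsadmin -report | grep -E 'Name:|Hostname:'

# on the data host: is the block transfer port (50010 by default on CDH5) listening?
ss -ltn | grep 50010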

Attached the Ansible play log: cloudera.log

sergevs commented 5 years ago

1 GB of memory for master and data

That's definitely not enough for your setup. If you need a development environment and have limited RAM, I would recommend:

  1. Don't use an HA setup - the extra services consume RAM.
  2. Don't try to emulate a real cluster - services consume RAM, and every VM also consumes RAM for its OS.
  3. Run all required services on a single VM - it makes the best use of the RAM you have; see the example inventory below.

As far as I can remember, the all-in-one full stack of the playbook works on a 10 GB VM (I tested that on my MacBook :) ). As you don't need all the services, I believe one VM with 6 GB of RAM should be fine for the required services.
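For example, a single-VM inventory covering just the services you need could look roughly like this (the hostname is a placeholder; keep the predefined :children groups from the shipped hosts file as they are):

[namenodes]
node-01 ansible_user=vagrant

[datanodes]
node-01 ansible_user=vagrant

[yarnresourcemanager]
node-01 ansible_user=vagrant

[postgresql]
node-01 ansible_user=vagrant

[hivemetastore]
node-01 ansible_user=vagrant

[oozie]
node-01 ansible_user=vagrant

With a single namenode, [zookeepernodes] and [journalnodes] can stay empty, and [postgresql] is kept because hivemetastore and oozie need it.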