
Hadoop learning #47


yankj12 commented 5 years ago

Basics

Common Hadoop 2.x ports and how to check them

Common problems

Hadoop: preparing the jars for Java development

Hadoop can now be studied on a single Windows machine. When setting up the project you need to add the jar files; I use version 2.8.0, and the jars come from the following directories:

hadoop-2.8.0\share\hadoop\common: all jars
hadoop-2.8.0\share\hadoop\common\lib: all jars
hadoop-2.8.0\share\hadoop\hdfs: all jars
hadoop-2.8.0\share\hadoop\mapreduce: all jars
hadoop-2.8.0\share\hadoop\yarn: all jars

One more jar is needed: netty-all-4.0.23.Final.jar. It is required by the example from Hadoop: The Definitive Guide that displays FileStatus objects.

This package resolves the error ClassNotFoundException: io.netty.channel.EventLoopGroup.
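
For context, the FileStatus example from the book boils down to something like the following. This is a minimal sketch, not the book's or the author's exact code; the namenode address and path are the ones that appear in the logs later in this issue.

// Minimal FileStatus listing sketch (placeholder URI and path taken from the logs below).
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListStatusSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Replace with your own namenode address.
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.150.128:9000"), conf);
        // Print path, length and permission for every entry under the directory.
        for (FileStatus status : fs.listStatus(new Path("/user/yan"))) {
            System.out.println(status.getPath() + "\t" + status.getLen() + "\t" + status.getPermission());
        }
        fs.close();
    }
}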

Unable to load native-hadoop library for your platform

2018-09-12 15:55:02,244 WARN  [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

This message is only a WARN; it can be ignored, since Hadoop falls back to the built-in Java classes.
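
If you want to confirm that this is just the pure-Java fallback, a quick check like the following works (a small sketch of mine, not from the original notes):

// Check whether the native hadoop library was loaded. When it was not, the WARN above
// appears and Hadoop silently uses pure-Java implementations instead.
import org.apache.hadoop.util.NativeCodeLoader;

public class NativeCheck {
    public static void main(String[] args) {
        System.out.println("native hadoop loaded: " + NativeCodeLoader.isNativeCodeLoaded());
        System.out.println("java.library.path = " + System.getProperty("java.library.path"));
    }
}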

Failed to locate the winutils binary in the hadoop binary path

This error appears when connecting from Windows to Hadoop running on Linux; it does not stop the program either.

2018-09-12 15:55:01,629 ERROR [main] util.Shell (Shell.java:getWinUtilsPath(374)) - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:356)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:371)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:364)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
    at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2806)
    at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2802)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2668)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
    at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:160)
    at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:157)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:157)
    at com.sinosoft.mapreduce.HDFSOperations.configure(HDFSOperations.java:24)
    at com.sinosoft.mapreduce.HDFSOperations.main(HDFSOperations.java:51)

The following fixes it. Running a Hadoop program on Win7 reports: Failed to locate the winutils binary in the hadoop binary path. A quick Baidu search turned up a simple solution:

  1. Download a Windows build of winutils. Someone has published one on GitHub at https://github.com/srccodes/hadoop-common-2.2.0-bin; download the project's zip (hadoop-common-2.2.0-bin-master.zip) and extract it to any directory. Alternatively, you can use the Apache-provided releases on GitHub: https://github.com/apache/hadoop-common/releases
  2. Configure the environment variables. Add a user variable HADOOP_HOME whose value is the directory the zip was extracted to, then append %HADOOP_HOME%\bin to the system PATH. Re-run the program and it executes normally. (A programmatic alternative is sketched below.)
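
Instead of the HADOOP_HOME environment variable, the location can also be set from code, since Hadoop's Shell class honors the hadoop.home.dir system property. A sketch of mine, not from the original notes; the path is a placeholder:

// Set the winutils location programmatically, before any FileSystem or Job code runs.
public class WinutilsHome {
    public static void main(String[] args) throws Exception {
        // Placeholder path; point it at the directory that contains bin\winutils.exe.
        System.setProperty("hadoop.home.dir", "D:\\hadoop-common-2.2.0-bin-master");
        // ... continue with the usual FileSystem / Job setup here ...
    }
}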

java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray

Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(II[BI[BIILjava/lang/String;JZ)V
    at org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(Native Method)
    at org.apache.hadoop.util.NativeCrc32.calculateChunkedSumsByteArray(NativeCrc32.java:86)
    at org.apache.hadoop.util.DataChecksum.calculateChunkedSums(DataChecksum.java:430)
    at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:202)
    at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:163)
    at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:144)
    at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:2254)
    at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2236)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1965)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1933)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1898)
    at com.sinosoft.mapreduce.HDFSOperations.copyFromLocalFile(HDFSOperations.java:55)
    at com.sinosoft.mapreduce.HDFSOperations.main(HDFSOperations.java:78)

Solution: https://blog.csdn.net/sunshine920103/article/details/52431138

Getting started with Hadoop (MapReduce), developing with Eclipse: https://blog.csdn.net/lonelysky/article/details/54955345
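
For context on the logs that follow, a minimal WordCount driver along the lines of the linked tutorial looks roughly like this. It is a sketch of mine, not the author's com.sinosoft code; the HDFS paths are the ones that appear in the logs below.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Emit (word, 1) for every token in the input line.
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Sum the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Paths as seen in the logs; the output directory must not exist yet,
        // otherwise the FileAlreadyExistsException shown below is thrown.
        FileInputFormat.addInputPath(job, new Path("hdfs://192.168.150.128:9000/user/yan/WordCount/input"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.150.128:9000/user/yan/WordCount/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}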

18/09/13 09:53:39 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/09/13 09:53:39 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://192.168.150.128:9000/user/yan/WordCount/output already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:266)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at com.sinosoft.mapreduce.wordcount.WordCount.main(WordCount.java:65)
18/09/13 09:57:25 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/09/13 09:57:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.createDirectoryWithMode0(Ljava/lang/String;I)V
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.createDirectoryWithMode0(Native Method)
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.createDirectoryWithMode(NativeIO.java:524)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkOneDirWithMode(RawLocalFileSystem.java:476)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:529)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:507)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:305)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:133)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:144)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at com.sinosoft.mapreduce.wordcount.WordCount.main(WordCount.java:65)

Solution:

The analysis: the local Hadoop was a 2.4.1 build compiled for Windows, while the code was using the Hadoop 2.7.2 jars.

Fix (verified):
Download Hadoop 2.7.2 again and compile it on Windows. I built it on 64-bit Win7; the build configuration has quite a few pitfalls and took some time.
After compiling, point the %HADOOP_HOME% environment variable at the new Hadoop directory, and make sure PATH contains %HADOOP_HOME%\bin, because running locally on Windows needs the newly built hadoop.dll and PATH has to be able to find it.
Re-run the program and it completes normally.

There are two approaches:

18/09/13 11:09:40 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/09/13 11:09:40 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
18/09/13 11:09:57 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
18/09/13 11:09:57 WARN mapreduce.JobResourceUploader: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
18/09/13 11:09:57 INFO input.FileInputFormat: Total input paths to process : 1
18/09/13 11:09:58 INFO mapreduce.JobSubmitter: number of splits:1
18/09/13 11:09:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local17088442_0001
18/09/13 11:09:58 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
18/09/13 11:09:58 INFO mapreduce.Job: Running job: job_local17088442_0001
18/09/13 11:09:58 INFO mapred.LocalJobRunner: OutputCommitter set in config null
18/09/13 11:09:58 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/09/13 11:09:58 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
18/09/13 11:09:58 WARN mapred.LocalJobRunner: job_local17088442_0001
org.apache.hadoop.security.AccessControlException: Permission denied: user=Yan, access=WRITE, inode="/user/yan/WordCount/output/_temporary/0":root:supergroup:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1720)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1704)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1687)
    at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:71)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3894)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:983)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:622)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.lang.reflect.Constructor.newInstance(Unknown Source)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
    at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3002)
    at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2970)
    at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1047)
    at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1043)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1043)
    at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1036)
    at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1877)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.setupJob(FileOutputCommitter.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:511)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=Yan, access=WRITE, inode="/user/yan/WordCount/output/_temporary/0":root:supergroup:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1720)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1704)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1687)
    at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:71)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3894)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:983)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:622)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

    at org.apache.hadoop.ipc.Client.call(Client.java:1475)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy9.mkdirs(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:558)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy10.mkdirs(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3000)
    ... 9 more
18/09/13 11:09:59 INFO mapreduce.Job: Job job_local17088442_0001 running in uber mode : false
18/09/13 11:09:59 INFO mapreduce.Job:  map 0% reduce 0%
18/09/13 11:09:59 INFO mapreduce.Job: Job job_local17088442_0001 failed with state FAILED due to: NA
18/09/13 11:09:59 INFO mapreduce.Job: Counters: 0

Reference: understanding Hadoop's user login and authentication, starting from "org.apache.hadoop.security.AccessControlException: Permission denied: user=..."

Method 1: change the permissions of the target directory in HDFS.
Method 2: disable permission checking in HDFS (dfs.permissions.enabled=false).
Method 3: configure the hadoop user in the Windows environment variables (HADOOP_USER_NAME). Methods 1 and 3 are sketched below.
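
A sketch of methods 1 and 3 (my code, not the author's; the user name "root" and the path come from the log above):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class PermissionFix {
    public static void main(String[] args) throws Exception {
        // Method 3: act as the HDFS directory owner ("root" in the log above).
        // UserGroupInformation honors HADOOP_USER_NAME as a system property as well as
        // an environment variable, so it can be set before the FileSystem is created.
        System.setProperty("HADOOP_USER_NAME", "root");

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.150.128:9000"), conf);

        // Method 1: open up the permissions of the job's working directory instead
        // (equivalent to "hdfs dfs -chmod 777 /user/yan/WordCount").
        fs.setPermission(new Path("/user/yan/WordCount"), new FsPermission((short) 0777));

        fs.close();
    }
}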

18/09/13 11:30:24 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/09/13 11:30:24 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
18/09/13 11:30:41 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
18/09/13 11:30:41 WARN mapreduce.JobResourceUploader: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
18/09/13 11:30:41 INFO input.FileInputFormat: Total input paths to process : 1
18/09/13 11:30:41 INFO mapreduce.JobSubmitter: number of splits:1
18/09/13 11:30:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1644049495_0001
18/09/13 11:30:41 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
18/09/13 11:30:41 INFO mapreduce.Job: Running job: job_local1644049495_0001
18/09/13 11:30:41 INFO mapred.LocalJobRunner: OutputCommitter set in config null
18/09/13 11:30:41 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/09/13 11:30:41 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
18/09/13 11:30:41 INFO mapred.LocalJobRunner: Waiting for map tasks
18/09/13 11:30:41 INFO mapred.LocalJobRunner: Starting task: attempt_local1644049495_0001_m_000000_0
18/09/13 11:30:41 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/09/13 11:30:41 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
18/09/13 11:30:42 INFO mapreduce.Job: Job job_local1644049495_0001 running in uber mode : false
18/09/13 11:30:42 INFO mapreduce.Job:  map 0% reduce 0%
18/09/13 11:30:43 INFO mapred.Task:  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@64416f3a
18/09/13 11:30:43 INFO mapred.MapTask: Processing split: hdfs://192.168.150.128:9000/user/yan/WordCount/input/WordCountInput.txt:0+1710
18/09/13 11:30:44 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/09/13 11:30:44 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/09/13 11:30:44 INFO mapred.MapTask: soft limit at 83886080
18/09/13 11:30:44 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/09/13 11:30:44 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/09/13 11:30:44 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/09/13 11:31:05 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader.
java.net.ConnectException: Connection timed out: no further information
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3436)
    at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:777)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:694)
    at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:656)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
    at java.io.DataInputStream.read(Unknown Source)
    at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:59)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
    at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:91)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:144)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:184)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
18/09/13 11:31:05 WARN hdfs.DFSClient: Failed to connect to /172.18.0.3:50010 for block, add to deadNodes and continue. java.net.ConnectException: Connection timed out: no further information
java.net.ConnectException: Connection timed out: no further information
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3436)
    at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:777)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:694)
    at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:656)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
    at java.io.DataInputStream.read(Unknown Source)
    at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:59)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
    at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:91)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:144)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:184)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
18/09/13 11:31:26 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader.
java.net.ConnectException: Connection timed out: no further information
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3436)
    at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:777)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:694)
    at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:656)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
    at java.io.DataInputStream.read(Unknown Source)
    at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:59)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
    at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:91)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:144)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:184)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
18/09/13 11:31:26 WARN hdfs.DFSClient: Failed to connect to /172.18.0.4:50010 for block, add to deadNodes and continue. java.net.ConnectException: Connection timed out: no further information
java.net.ConnectException: Connection timed out: no further information
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3436)
    at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:777)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:694)
    at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:656)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
    at java.io.DataInputStream.read(Unknown Source)
    at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:59)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
    at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:91)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:144)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:184)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
18/09/13 11:31:26 INFO hdfs.DFSClient: Could not obtain BP-431089505-172.17.0.2-1465730089024:blk_1073741841_1017 from any node: java.io.IOException: No live nodes contain block BP-431089505-172.17.0.2-1465730089024:blk_1073741841_1017 after checking nodes = [DatanodeInfoWithStorage[172.18.0.3:50010,DS-e15197c4-00fe-484c-9d53-ef4a0778f45f,DISK], DatanodeInfoWithStorage[172.18.0.4:50010,DS-2ea8f13f-a6ef-47ce-8e5a-e06b58bc2ff6,DISK]], ignoredNodes = null No live nodes contain current block Block locations: DatanodeInfoWithStorage[172.18.0.3:50010,DS-e15197c4-00fe-484c-9d53-ef4a0778f45f,DISK] DatanodeInfoWithStorage[172.18.0.4:50010,DS-2ea8f13f-a6ef-47ce-8e5a-e06b58bc2ff6,DISK] Dead nodes:  DatanodeInfoWithStorage[172.18.0.3:50010,DS-e15197c4-00fe-484c-9d53-ef4a0778f45f,DISK] DatanodeInfoWithStorage[172.18.0.4:50010,DS-2ea8f13f-a6ef-47ce-8e5a-e06b58bc2ff6,DISK]. Will get new block locations from namenode and retry...
18/09/13 11:31:26 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 869.7488406888957 msec.

Cause (verified): when developing a MapReduce application in Eclipse, the client has to talk to several Hadoop nodes. Because the cluster runs in Docker inside a virtual machine, the containers' ports are only exposed to the VM host and not beyond it to Windows, so Windows concludes the Hadoop nodes are dead and the MapReduce job fails.

Solutions:
Method 1: do not use Docker; start several virtual machines and deploy the cluster on them directly, following the official configuration guide.
Method 2: create one VM with plenty of memory and a graphical desktop, deploy the cluster with Docker inside that VM, and develop the MapReduce application with Eclipse inside the same VM (tested, works).
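
Another client-side mitigation that is often suggested for this symptom (not what I did above, and it only helps if the datanodes' hostnames resolve and their ports are actually reachable from Windows) is to make the HDFS client connect to datanodes by hostname instead of the container-internal IPs:

import org.apache.hadoop.conf.Configuration;

public class ClientConf {
    public static Configuration create() {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://192.168.150.128:9000");
        // Ask the client to use the datanodes' hostnames rather than the 172.18.x.x
        // addresses reported by the namenode.
        conf.set("dfs.client.use.datanode.hostname", "true");
        return conf;
    }
}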