
Liusheng's blog
http://liusheng.github.io

MapReduce Workflow Study Notes #27

Closed liusheng closed 3 years ago

liusheng commented 4 years ago
  1. In HDFS, one block corresponds to one split in a MapReduce job, and each split is processed by one map task (not a reduce task). For example, suppose we have the following files as input data:
    hadoop@hadoop-kae:/opt/hadoop-3.4.0-SNAPSHOT$ hadoop fs -ls -h /HiBench/Wordcount/Input/
    Found 11 items
    -rw-r--r--   1 hadoop supergroup          0 2020-09-08 16:02 /HiBench/Wordcount/Input/_SUCCESS
    -rw-r--r--   1 hadoop supergroup    310.7 M 2020-09-08 16:02 /HiBench/Wordcount/Input/part-m-00000
    -rw-r--r--   1 hadoop supergroup    310.7 M 2020-09-08 16:02 /HiBench/Wordcount/Input/part-m-00001
    -rw-r--r--   1 hadoop supergroup    310.7 M 2020-09-08 16:02 /HiBench/Wordcount/Input/part-m-00002
    -rw-r--r--   1 hadoop supergroup    310.7 M 2020-09-08 16:02 /HiBench/Wordcount/Input/part-m-00003
    -rw-r--r--   1 hadoop supergroup    310.7 M 2020-09-08 16:02 /HiBench/Wordcount/Input/part-m-00004
    -rw-r--r--   1 hadoop supergroup    310.7 M 2020-09-08 16:02 /HiBench/Wordcount/Input/part-m-00005
    -rw-r--r--   1 hadoop supergroup    310.7 M 2020-09-08 16:02 /HiBench/Wordcount/Input/part-m-00006
    -rw-r--r--   1 hadoop supergroup    310.7 M 2020-09-08 16:02 /HiBench/Wordcount/Input/part-m-00007
    -rw-r--r--   1 hadoop supergroup    310.7 M 2020-09-08 16:02 /HiBench/Wordcount/Input/part-m-00008
    -rw-r--r--   1 hadoop supergroup    310.7 M 2020-09-08 16:02 /HiBench/Wordcount/Input/part-m-00009

    Since the default HDFS block size is 128 MB (configurable via the dfs.blocksize property), and each of the input files above is about 310.7 MB, each file is stored as 3 blocks in HDFS (128 MB × 2 < 310.7 MB < 128 MB × 3). So processing this data requires 10 × 3 = 30 map tasks in total.
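    The block/split arithmetic above can be sketched as follows. This is a minimal illustration, not Hadoop's actual split logic (FileInputFormat applies some extra rules, e.g. a slack factor for the last split); the `split_count` helper is hypothetical:

    ```python
    import math

    def split_count(file_size_bytes, block_size_bytes=128 * 1024 * 1024):
        """Number of HDFS blocks (and, by default, input splits) for one file."""
        return math.ceil(file_size_bytes / block_size_bytes)

    # Each input file above is about 310.7 MB.
    file_size = int(310.7 * 1024 * 1024)
    blocks_per_file = split_count(file_size)   # 128 MB * 2 < 310.7 MB < 128 MB * 3
    num_files = 10                             # the ten part-m-0000x files
    total_map_tasks = num_files * blocks_per_file
    print(blocks_per_file, total_map_tasks)    # 3 30
    ```

    The empty _SUCCESS marker file contributes no splits, so only the ten part-m files count.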