
Building, installing, and configuring KAE from source on Ubuntu 18.04 #26

Open liusheng opened 4 years ago

liusheng commented 4 years ago

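These notes record how to install and configure KAE from source in order to test its effect on Hadoop performance. The KAE components are spread across three code repositories: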

0. Environment

1. Install dependencies

3. Build and install KAE

This step requires OpenSSL and libssl-dev to be installed correctly.

5. Verification

The performance of the KAE-enabled zlib can be measured by running lzbench against "silesia.tar".

...

    ZLIB_FILES = zlib/adler32.o zlib/compress.o zlib/crc32.o zlib/deflate.o zlib/gzclose.o zlib/gzlib.o zlib/gzread.o
    ZLIB_FILES += zlib/gzwrite.o zlib/infback.o zlib/inffast.o zlib/inflate.o zlib/inftrees.o zlib/trees.o
    ZLIB_FILES += zlib/uncompr.o zlib/zutil.o

```shell
git clone https://github.com/inikep/lzbench
cd lzbench
sudo make
```

You can run the lzbench test both before and after making the change above and compare the two results.

    root@hadoop-kae:/opt/lzbench# ./lzbench -ezlib /opt/silesia.tar
    lzbench 1.8 (64-bit Linux)   Assembled by P.Skibinski
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    memcpy                   6307 MB/s  6266 MB/s   211947520 100.00 /opt/silesia.tar
    zlib 1.2.11 -1             63 MB/s   216 MB/s    77259029  36.45 /opt/silesia.tar
    zlib 1.2.11 -2             57 MB/s   222 MB/s    75002277  35.39 /opt/silesia.tar
    zlib 1.2.11 -3             43 MB/s   228 MB/s    72967040  34.43 /opt/silesia.tar
    zlib 1.2.11 -4             38 MB/s   225 MB/s    71002817  33.50 /opt/silesia.tar

    root@hadoop-kae:/opt/lzbench# LD_LIBRARY_PATH=/usr/local/kaezip/lib/ ./lzbench -ezlib /opt/silesia.tar
    lzbench 1.8 (64-bit Linux)   Assembled by P.Skibinski
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    memcpy                   6026 MB/s  6509 MB/s   211947520 100.00 /opt/silesia.tar
    zlib 1.2.11 -1           2283 MB/s  1473 MB/s    97072775  45.80 /opt/silesia.tar
    zlib 1.2.11 -2           2281 MB/s  1469 MB/s    97076874  45.80 /opt/silesia.tar
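
To make the before/after comparison concrete, the level -1 rows of the two runs above can be reduced to speedup factors. This is only a small sketch: the numbers are copied by hand from the lzbench output, and the variable names are made up for illustration.

```shell
# Level -1 figures copied from the two lzbench runs above.
base_comp=63;  base_decomp=216;  base_ratio=36.45   # stock zlib
kae_comp=2283; kae_decomp=1473;  kae_ratio=45.80    # with LD_LIBRARY_PATH=/usr/local/kaezip/lib/

# awk does the floating-point arithmetic that plain sh cannot.
comp_speedup=$(awk -v a="$kae_comp" -v b="$base_comp" 'BEGIN { printf "%.1f", a / b }')
decomp_speedup=$(awk -v a="$kae_decomp" -v b="$base_decomp" 'BEGIN { printf "%.1f", a / b }')
ratio_delta=$(awk -v a="$kae_ratio" -v b="$base_ratio" 'BEGIN { printf "%.2f", a - b }')

echo "compress speedup:   ${comp_speedup}x"
echo "decompress speedup: ${decomp_speedup}x"
echo "ratio change:       +${ratio_delta} percentage points"
```

On these figures, compression throughput is roughly 36 times higher and decompression roughly 6.8 times higher with kaezip, while the compressed output is about 9 percentage points larger relative to the original size.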

In a Hadoop cluster, the kaezip library can be enabled with the following setting in etc/hadoop/mapred-site.xml:

    <property>
      <name>mapreduce.map.env</name>
      <value>LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/kaezip/lib</value>
    </property>
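
A quick sanity check for this setting could look like the sketch below. It is not part of the original procedure: it writes a throwaway copy of the property to a temporary file (on a real node you would point `conf` at your own etc/hadoop/mapred-site.xml instead) and assumes `<name>` and `<value>` sit on consecutive lines, as in the fragment above.

```shell
# Hypothetical check: does mapred-site.xml put the kaezip directory on the
# library path of map tasks? A temporary file stands in for the real config.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
  <property>
    <name>mapreduce.map.env</name>
    <value>LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/kaezip/lib</value>
  </property>
</configuration>
EOF

# Take the line after the mapreduce.map.env <name> and strip the <value> tags.
map_env=$(grep -A1 '<name>mapreduce.map.env</name>' "$conf" \
  | sed -n 's:.*<value>\(.*\)</value>.*:\1:p')
echo "mapreduce.map.env = $map_env"

case "$map_env" in
  *"/usr/local/kaezip/lib"*) kae_enabled=yes ;;
  *)                         kae_enabled=no ;;
esac
echo "kaezip on map task library path: $kae_enabled"
rm -f "$conf"
```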

6. Q&A

liusheng commented 4 years ago

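Issue note: according to the Hadoop documentation, when yarn.nodemanager.resource.detect-hardware-capabilities is set to true in yarn-site.xml, the Hadoop cluster automatically detects the system's CPU and memory resources. When I tried this on an ARM cluster, however, the detected values were wrong.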

    <property>
      <description>Enable auto-detection of node capabilities such as
      memory and CPU.
      </description>
      <name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
      <value>true</value>
    </property>

The NodeManager startup log shows what was actually detected:

    2020-09-17 16:32:11,913 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as hadoop-arm-kae-2:38733 with total resource of <memory:28629, vCores:1>

In reality the node has 32 vCPUs and 64 GB of memory.
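
The mismatch can be checked mechanically by pulling the numbers out of the registration line and comparing them with the node's known size. The sketch below uses the log line quoted above and hard-codes the expected values from this note; on a live node they could instead come from `nproc` and /proc/meminfo. Variable names are illustrative only.

```shell
# Registration line copied from the NodeManager startup log above.
log_line='2020-09-17 16:32:11,913 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as hadoop-arm-kae-2:38733 with total resource of <memory:28629, vCores:1>'

# Extract the two numbers from "<memory:N, vCores:M>".
pattern='s/.*<memory:\([0-9]*\), vCores:\([0-9]*\)>.*'
detected_mem_mb=$(printf '%s\n' "$log_line" | sed -n "${pattern}/\1/p")
detected_vcores=$(printf '%s\n' "$log_line" | sed -n "${pattern}/\2/p")

# Node size as stated in this note: 32 vCPUs and 64 GB of memory.
expected_vcores=32
expected_mem_mb=$((64 * 1024))

echo "vCores: detected=$detected_vcores expected=$expected_vcores"
echo "memory: detected=${detected_mem_mb} MB physical=${expected_mem_mb} MB"

# Only the vCores value is compared strictly; the memory figure is printed
# for reference because it is not expected to equal physical memory exactly.
if [ "$detected_vcores" -ne "$expected_vcores" ]; then
  mismatch=yes
else
  mismatch=no
fi
echo "vCores mismatch: $mismatch"
```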