Open ysh329 opened 5 years ago
Two key points of the size trimming: eliminating the use of RTTI and removing the Protobuf dependency.
Details of each optimization step follow:
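step1: visibility hidden + data & function section gathering with section GC (`-ffunction-sections -fdata-sections` plus `-Wl,--gc-sections`) + strip
When we depend on a static library, adding the compile and link options above makes the application smaller, but inspecting the symbol table shows that some library functions are still present. After some investigation, the remaining piece turned out to be adding -fvisibility=hidden.
Library size: 3.4 MB
Reference: https://blog.csdn.net/Swallow_he/article/details/87373345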
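step2: remove glog and shut down the local log system
Library size: 3.3 MB
step3: rework op registration (mask out unrelated code)
Library size: 3.3 MB (very small effect)
step4: drop the previous static-library linking and link the object files into the DSO directly
Library size: 3.1 MB
step5: remove or replace every exception `throw`, and add -fno-exceptions
Library size: 2.8 MB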
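As a rough illustration of what replacing a `throw` can look like once `-fno-exceptions` is on, here is a minimal sketch. The `Status` enum and `GetElement` are hypothetical names invented for this example and are not taken from the project; the project's actual replacement scheme is not shown in this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical status codes standing in for exception types (assumption).
enum class Status { kOk, kOutOfRange };

// Exception-based version, for comparison:
//   int GetElement(const std::vector<int>& v, std::size_t i) { return v.at(i); }
// Exception-free version: the failure travels through the return value.
Status GetElement(const std::vector<int>& values, std::size_t index, int* out) {
  assert(out != nullptr);  // programmer error, not a runtime condition
  if (index >= values.size()) {
    return Status::kOutOfRange;  // was: throw std::out_of_range(...)
  }
  *out = values[index];
  return Status::kOk;
}
```

Callers check the returned `Status` instead of wrapping the call in `try`/`catch`, so the compiler no longer has to emit unwind tables and landing pads for these call sites.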
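step6: drop the default RTTI system and replace it with a self-implemented RTTI structure (new rtti.h / cc)
This does not compile yet, because the current version builds protobuf from source and protobuf itself calls typeid. Replacing RTTI throughout protobuf would be a lot of work for little value, since a later PR will remove the protobuf dependency; the PR with the protobuf replacement has not been merged yet.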
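For readers wondering what a self-implemented RTTI structure might look like, the following is a minimal sketch that builds with `-fno-rtti`. It is not the project's rtti.h; `TypeId`, `GetTypeId`, `Node`, `Conv`, `Relu`, and `DynCast` are all hypothetical names. The idea is to use the address of a per-type static variable as the type identifier instead of `typeid`.

```cpp
#include <cassert>

// A type id is just an address that is unique per type (assumption of this sketch).
using TypeId = const void*;

template <typename T>
TypeId GetTypeId() {
  static const char tag = 0;  // one distinct object per instantiated T
  return &tag;
}

// Hypothetical class hierarchy that reports its own type id.
struct Node {
  virtual ~Node() = default;
  virtual TypeId type() const = 0;
};

struct Conv : Node {
  TypeId type() const override { return GetTypeId<Conv>(); }
  int kernel = 3;
};

struct Relu : Node {
  TypeId type() const override { return GetTypeId<Relu>(); }
};

// Checked downcast without dynamic_cast; matches the exact type only.
template <typename T>
T* DynCast(Node* node) {
  if (node != nullptr && node->type() == GetTypeId<T>()) {
    return static_cast<T*>(node);
  }
  return nullptr;
}
```

This sketch only matches exact types (it does not walk an inheritance chain), and with hidden visibility the same `T` may get a different tag address in each DSO, so ids should not be compared across library boundaries without extra care.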
C/C++ visibility - zhu4674548的专栏 - CSDN博客 https://blog.csdn.net/zhu4674548/article/details/83904604
Visibility - GCC Wiki https://gcc.gnu.org/wiki/Visibility
pts.blog: How to make smaller C and C++ binaries http://ptspts.blogspot.com/2013/12/how-to-make-smaller-c-and-c-binaries.html
Android NDK: How to Reduce Binaries Size - The Algolia Blog - Algolia Blog https://blog.algolia.com/android-ndk-how-to-reduce-libs-size/
c++ - How to optimize size of shared library? - Stack Overflow https://stackoverflow.com/questions/8021470/how-to-optimize-size-of-shared-library
Code optimization for size in C++ - CodeProject https://www.codeproject.com/Questions/1231114/Code-optimization-for-size-in-Cplusplus
At work we have a custom tool that parses out the .map files that visual studio generates. This lets us see code and data sizes from the executable.
Like KulSeran says: The best way that I know of to examine sizes is to generate and examine a 'map file' - this is basically a way of determining where functions, static data, and resources live in your executable file, and/or where they will be loaded at runtime.
If you want to further analyze code, you need to examine the assembly instructions (an assembly listing file can be generated by many compilers at build time).
You can also just open the EXE in a hex editor and see if there's any obviously wasted space (usually the only time this can be seen at a glance is for large static arrays which are initialized to zero).
Some things that generally reduce code size for x86 programs are:
Set optimization settings as aggressive as possible, EXCEPT for loop unrolling, if you can adjust that separately.
Experimenting with function inlining settings. Counterintuitively, sometimes turning it on will SAVE space.
Make sure dead code stripping is enabled.
Remove all debugging symbols.
Disable exceptions and RTTI.
If you use lots of templated containers, try to use the fewest amount of unique data types as possible in those containers.
Try the 'omit frame pointer' option - this skips the 'push ebp; mov ebp, esp' at the start of every function, saving you 3 bytes, and between 1 and 3 bytes at the end depending on whether the compiler uses the 'LEAVE' instruction or manually MOVs and pops EBP. It also frees up the EBP register for general-purpose use as a 7th general purpose register, which can decrease the amount of memory or instructions needed when shuffling data around.
Compile in 32-bit rather than 64-bit mode unless you plan to perform LOTS of 64-bit integer math, or use more than 3GB of RAM. 64-bit instructions that use the REX prefix tend to be wasteful.
Avoid large static arrays if there is an alternative.
Change alignment and struct packing to eliminate padding wherever possible (this can result in speed issues or alignment exceptions depending on what you're doing).
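The padding point can be illustrated without any packing pragma: simply ordering members from largest to smallest alignment often removes padding while keeping every field naturally aligned. This is a small sketch with made-up struct names (`Padded`, `Reordered`); exact sizes depend on the ABI, so the code only asserts that the reordered layout is not larger.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Small members on both sides of a large one force padding around it.
struct Padded {
  std::uint8_t flag;
  std::uint64_t id;
  std::uint8_t kind;
};

// Same fields, largest alignment first: padding only at the tail.
struct Reordered {
  std::uint64_t id;
  std::uint8_t flag;
  std::uint8_t kind;
};

static_assert(sizeof(Reordered) <= sizeof(Padded),
              "reordering must not grow the struct");

// Bytes saved for a static array of `count` elements.
constexpr std::size_t BytesSaved(std::size_t count) {
  return count * (sizeof(Padded) - sizeof(Reordered));
}
```

On a typical x86-64 ABI this is 24 bytes versus 16 bytes per element, which adds up for large static tables. Unlike forced packing, reordering does not introduce misaligned accesses.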
For RISC:
Advanced Search - GameDev.net https://www.gamedev.net/search/
Compact build advice · opencv/opencv Wiki https://github.com/opencv/opencv/wiki/Compact-build-advice
Using the GNU Compiler Collection (GCC): Optimize Options https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
Reducing Executable Size - WxWiki https://wiki.wxwidgets.org/Reducing_Executable_Size
Tutorial: 4k Intros in Linux https://int21.de/linux4k/
c - How to create 4KB Linux binaries that render a 3D scene? - Stack Overflow https://stackoverflow.com/questions/10551665/how-to-create-4kb-linux-binaries-that-render-a-3d-scene/10552160#10552160
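The previous two sections only scratched the surface: the `strip` command; how other libraries deal with `gflags` and `glog`; alternatives to the `protobuf` library (for example, TensorFlow Lite uses FlatBuffers instead, and tiny-dnn uses cereal, a header-only serialization library); and so on. This time the main outline follows Compact build advice · opencv/opencv Wiki, translating it and adding some material along the way. I came across this OpenCV wiki page while searching for related content, and it is quite specific. Starting from the results, here is a table of trimming optimizations for the OpenCV library:
Note: binary size depends on the library configuration, which in turn depends on the build platform and the libraries installed on it. The comparison here uses the x86_64 build without CUDA, OpenCL, or IPP (Intel Integrated Performance Primitives, Intel's multimedia and math library, optimized with SSE, AVX, and similar instructions).
The table above yields the following key trimming conclusions: … (compare `#6` with `#1` through `#5`), any one or two of the first three options alone have little effect, but applying all three multiplies the effect; … (`#8`); using the `-Os` compiler flag rather than `-O3` (`#10` and `#11`) compresses the binary to one third of the baseline size.
The rest of this article expands on the trimming techniques in that table. A lot of background material is interleaved, which you can skip.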
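Building applications on Linux with a lightweight OpenCV
The OpenCV library can be built in two ways: `dynamic` (`shared libraries`) and `static` (`archives`). The default mode on most platforms is dynamic. To switch to the other mode, turn off the `BUILD_SHARED_LIBS` `cmake` option in OpenCV's `CMake` configuration. CMake uses `add_library` to distinguish `shared/static/module`, but underneath it simply invokes the compiler configured in CMake, such as `gcc`. The following adds some background on compiling and linking static and dynamic libraries with gcc.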
> ## [Compiling and linking static/dynamic libraries with gcc](https://renenyffenegger.ch/notes/development/languages/C-C-plus-plus/GCC/create-libraries/index)
> Build and link a static library:
> ```shell
> # 1. Build the object files
> # main.c calls into add.c and answer.c
> $ gcc -c src/main.c -o bin/main.o
>
> # Object files for the static library (without -fPIC)
> $ gcc -c src/tq84/add.c -o bin/static/add.o
> $ gcc -c src/tq84/answer.c -o bin/static/answer.o
>
> # Object files for the shared library must be compiled as position-independent code (-fPIC),
> # because a shared library can be mapped to any position in the address space
> $ gcc -c -fPIC src/tq84/add.c -o bin/shared/add.o
> $ gcc -c -fPIC src/tq84/answer.c -o bin/shared/answer.o
>
> # 2.1 Build the static library
> # A static library is a set of object files copied into a single file with the suffix .a
> # It is created with the archiver command (ar)
> # The next command archives `add.o` and `answer.o` into the static library `libtq84.a`
> $ ar rcs bin/static/libtq84.a bin/static/add.o bin/static/answer.o
>
> # 2.2 Link statically
> # Link main.o against the static library
> # -L: where the library can be found (given by hand here; this is not a general-purpose approach)
> # -l: the name of the library; the linker adds the `lib` prefix and the `.a` (or `.so`) suffix
> $ gcc bin/main.o -Lbin/static -ltq84 -o bin/statically-linked
>
> # The resulting executable `bin/statically-linked` no longer needs the `.a` or `.o` files
> # and can be distributed without them. Run it from the shell:
> $ ./bin/statically-linked
>
> # 3.1 Create the dynamic (shared) library
> # We create a shared library without a `SONAME`. Use GCC's `-shared` flag and name the output with the suffix `.so` instead of `.a`.
> $ gcc -shared bin/shared/add.o bin/shared/answer.o -o bin/shared/libtq84.so
>
> # A shared library requires position-independent code, i.e. C files compiled with `-fPIC`
> # (the object files under bin/shared were compiled with `-fPIC` in step 1)
> # If the object files were built without -fPIC (as for the static library), the linker reports an error like:
> # /usr/bin/ld: bin/tq84.o: relocation R_X86_64_PC32 against symbol `gSummand' can not be used when making a shared object; recompile with -fPIC
>
> # 3.2 Link dynamically with the shared library
> # Note the similarity to the static link in 2.2: there it was -Lbin/static, here it is -Lbin/shared
> # Order matters:
> # -ltq84 must come after bin/main.o
> $ gcc bin/main.o -Lbin/shared -ltq84 -o bin/use-shared-library
> ```
> References:
> - [Creating a shared and static library with the gnu compiler (gcc)](https://renenyffenegger.ch/notes/development/languages/C-C-plus-plus/GCC/create-libraries/index)
> - [Shared libraries with GCC on Linux - Cprogramming.com](https://www.cprogramming.com/tutorial/shared-libraries-linux-gcc.html)

However, dynamic and static libraries each have their own use cases, advantages, and disadvantages.
Dynamic libraries (shared libraries, i.e. with `-DBUILD_SHARED_LIBS=ON`): … `libpng`)
Static libraries (i.e. with `-DBUILD_SHARED_LIBS=OFF`): … see Static_library on `wikipedia`
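The following looks at how GCC and Clang compile options affect the binary of an application linked against a statically built `OpenCV` library.
Basic trimming
-fvisibility=hidden, -fvisibility-inlines-hidden
__attribute__ ((visibility ("hidden")))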
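`visibility` sets the visibility of functions in a dynamic library. When a variable or function is marked `hidden`, the symbol is visible only inside its own `DSO` and not from outside, so other libraries cannot find the implementation. Hiding symbols this way also reduces the size. The GCC Wiki explains the advantages of C++ visibility in detail; I distill four points here: … `DSO` … remove them completely safely. This gives greater latitude especially to the inliner, which no longer needs to keep an entry point around "just in case"; the `DSO` size shrinks by `5-20%`. ELF's exported symbol table format is a space hog: with heavy template use, the complete mangled symbol names can average around 1000 bytes each. C++ templates produce a huge number of symbols, and some C++ libraries easily exceed 30,000 symbols, which is about 5-6 MB. For a more detailed introduction to dynamic libraries, see How To Write Shared Libraries | Ulrich Drepper, Dec 10, 2011, which the GCC Wiki also recommends. But how do we actually use this visibility feature?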
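In the header files for the API or public interface you want to expose, put `__attribute__((visibility("default")))` before the declarations of the structs, classes, and functions to export; a code snippet you can copy and paste is given below. Then add `-fvisibility=hidden` to every GCC compile command. Finally, run `nm -C -D` on the resulting `DSO` and compare the output with and without hiding to check that it matches expectations. References: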