vozlt / nginx-module-vts

Nginx virtual host traffic status module
BSD 2-Clause "Simplified" License
3.24k stars 464 forks

Evaluate using Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) on VTS #283

Open zamazan4ik opened 1 year ago

zamazan4ik commented 1 year ago

Recently I checked Profile-Guided Optimization (PGO) improvements on multiple projects. The results are here. E.g. PGO helps with optimizing Envoyproxy. PGO results for other proxies like HAProxy can be found in the repo above. According to multiple tests, PGO can help improve performance in many other cases. Since there are already some performance-oriented requests like https://github.com/vozlt/nginx-module-vts/issues/251, I think trying to apply PGO to the VTS module could be a good thing.

I can suggest the following action points:

Testing Post-Link Optimization techniques (like LLVM BOLT) might be interesting too (Clang and Rustc already use BOLT in addition to PGO), but I recommend starting with the usual PGO.

Here are some examples of how PGO optimization is integrated in other projects:

u5surf commented 11 months ago

@zamazan4ik Thanks, interesting suggestion. I think the benefit of such optimization for this module might be limited. First of all, this is an nginx module, so its build process is just the nginx build process, and the optimization would really be an optimization of nginx itself. Formally, we should suggest such a build process to the nginx developers rather than to this module.

u5surf commented 11 months ago

@zamazan4ik I understand what you suggested; the build can be optimized with the approach below. However, I'm not sure how much that process would improve it. https://stackoverflow.com/questions/13881292/what-information-does-gcc-profile-guided-optimization-pgo-collect-and-which-op We can also find the details of this mechanism in this paper: https://people.freebsd.org/~lstewart/articles/cpumemory.pdf, section 7.4.

In general, if it improves performance as you expect, we could document the following build process as a tip in the README instead of providing an optimized binary. We would only ask that users build this way at their own risk.

Compile with -fprofile-generate

% pwd
/home/u5surf/nginx
% CC=gcc ./auto/configure --with-cc-opt='-fprofile-generate -fprofile-dir=./objs' --with-ld-opt='-lgcov' --add-module=../nginx-module-vts
% make

Run several test cases

% pwd
/home/u5surf/nginx-module-vts
% sudo PATH=/home/u5surf/nginx/objs:$PATH prove -r t/000.display_html.t
...(at runtime, the instrumented binary records profile data into .gcda files)

Recompile with -fprofile-use

% pwd
/home/u5surf/nginx
% CC=gcc ./auto/configure --with-cc-opt='-fprofile-use -fprofile-dir=../nginx-module-vts/objs' --with-ld-opt='-lgcov' --add-module=../nginx-module-vts
% make
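
Before the recompile step, it may be worth confirming that the test run actually produced profile data, since a build with -fprofile-use and no .gcda files does not benefit from PGO. The following is only a minimal sketch: the helper names `count_gcda` and `have_profile_data` are hypothetical (they are not part of nginx, GCC, or this module), and the directory to pass in is an assumption that depends on where -fprofile-dir resolved on your machine.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of nginx or nginx-module-vts.

# Print the number of GCC profile data files (*.gcda) found under directory $1.
# A missing directory is reported as 0 files.
count_gcda() {
    find "$1" -type f -name '*.gcda' 2>/dev/null | wc -l | tr -d ' '
}

# Return success only if at least one .gcda file exists under directory $1.
have_profile_data() {
    [ "$(count_gcda "$1")" -gt 0 ]
}

# Example use (the path is an assumption, adjust it to your -fprofile-dir):
#   have_profile_data ../nginx-module-vts/objs || echo "no profile data collected"
```

If the check reports no files, the instrumented nginx binary was probably not the one exercised by the tests, or it wrote its profile data somewhere else.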

zamazan4ik commented 10 months ago

Excuse me for such a late response - holidays, you know :)

In general, if it improves performance as you expect, we could document the following build process as a tip in the README instead of providing an optimized binary. We would only ask that users build this way at their own risk.

I agree with your suggestion. Having such documentation somewhere (like in the README file) is a good thing for users to have.

I have the following suggestions for your documentation about PGO:

Here I gathered some PGO-related documentation examples: