✨[Feature] Evaluate using Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO)

zamazan4ik commented 9 months ago

Is your feature request related to a problem? Please describe.

Not a problem, just an idea for how TensorRT performance could be improved.

I have evaluated Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) on multiple projects. The results are available here. According to these tests, the optimizations improve performance in many kinds of applications: compilers and interpreters, static analyzers, databases, networking software, etc. Given this, I think optimizing TensorRT (its C++ part) with PGO and PLO would be a good idea.

Describe the solution you'd like

I can suggest the following things:

Perform PGO benchmarks on TensorRT. If they show improvements, add a note to the documentation about the possible performance gains from PGO.
Provide an easier way (e.g. a build option) to build TensorRT with PGO. This would help end users and maintainers, since they could optimize TensorRT for their own workloads.
Optimize the pre-built TensorRT binaries with PGO.
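
For reference, here is a minimal sketch of the three-step PGO cycle that such a build option would wrap. It assumes a Clang/LLVM toolchain (GCC uses different profile handling), and the helper names are hypothetical. The sketch only assembles flags and commands. How they are passed to the compiler and linker depends on the project's build system, which is not covered here.

```python
# Sketch of an instrumentation-based PGO cycle with Clang.
# Assumption: clang++ and llvm-profdata are the tools in use.
# All function names below are hypothetical, for illustration only.


def pgo_instrument_flags(profile_dir: str) -> list[str]:
    """Step 1: flags for the instrumented build (pass to compiler and linker).

    The instrumented binary writes *.profraw files into profile_dir
    when it runs a representative workload.
    """
    return [f"-fprofile-generate={profile_dir}"]


def pgo_merge_command(raw_profiles: list[str], merged: str) -> list[str]:
    """Step 2: merge the raw profiles into one indexed profile file."""
    return ["llvm-profdata", "merge", "-o", merged, *raw_profiles]


def pgo_use_flags(merged: str) -> list[str]:
    """Step 3: flags for the final, profile-optimized rebuild."""
    return [f"-fprofile-use={merged}"]


if __name__ == "__main__":
    print(pgo_instrument_flags("/tmp/pgo-profiles"))
    print(pgo_merge_command(["run1.profraw", "run2.profraw"], "merged.profdata"))
    print(pgo_use_flags("merged.profdata"))
```

The quality of the result depends on the workload run between steps 1 and 2, which is why benchmarking on representative TensorRT workloads comes first.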

Additional context

As an additional optimization step after PGO, I can suggest Post-Link Optimization (PLO) with a tool like LLVM BOLT. However, I think it is worth evaluating only after PGO has been integrated into TensorRT.
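
To make the PLO step concrete, here is a rough sketch of a sampling-based BOLT workflow on Linux. It assumes `perf`, `perf2bolt`, and `llvm-bolt` are installed, that the CPU supports branch sampling (LBR), and that the binary was linked with relocations kept (`-Wl,--emit-relocs`). The helper names and file names are hypothetical, and the optimization flags are a common starting set rather than tuned values.

```python
# Sketch of a BOLT post-link optimization cycle.
# Assumption: Linux perf with branch sampling is available.
# All function names below are hypothetical, for illustration only.


def perf_record_command(binary: str, args: list[str], perf_data: str) -> list[str]:
    """Step 1: sample taken branches while the binary runs a workload."""
    return ["perf", "record", "-e", "cycles:u", "-j", "any,u",
            "-o", perf_data, "--", binary, *args]


def perf2bolt_command(binary: str, perf_data: str, fdata: str) -> list[str]:
    """Step 2: convert the perf samples into BOLT's profile format."""
    return ["perf2bolt", "-p", perf_data, "-o", fdata, binary]


def bolt_command(binary: str, fdata: str, output: str) -> list[str]:
    """Step 3: rewrite the binary with a profile-driven code layout."""
    return ["llvm-bolt", binary, "-o", output, f"-data={fdata}",
            "-reorder-blocks=ext-tsp", "-reorder-functions=hfsort",
            "-split-functions"]


if __name__ == "__main__":
    print(perf_record_command("./app", ["--bench"], "perf.data"))
    print(perf2bolt_command("./app", "perf.data", "perf.fdata"))
    print(bolt_command("./app", "perf.fdata", "./app.bolt"))
```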

Here I have collected several PGO-related links (more PGO-related materials are available at https://github.com/zamazan4ik/awesome-pgo/).

Examples of how PGO is integrated into other projects:

Rustc: a CI script for the multi-stage build
GCC:
- Official docs, section "Building with profile feedback" (even AutoFDO build is supported)
- A part in a "wonderful" configure script
Clang: Docs
Python:
- CPython: README
- Pyston: README
Go: Bash script
V8: Bazel flag
ChakraCore: Scripts
Chromium: Script
Firefox: Docs
- Thunderbird has PGO support too
PHP: Makefile command and old Centminmod scripts
MySQL: CMake script
YugabyteDB: GitHub commit
FoundationDB: Script
Zstd: Makefile
Foot: Scripts
Windows Terminal: GitHub PR
Pydantic-core: GitHub PR
file.d: GitHub PR
OceanBase: CMake flag

Here are some examples of how PGO is described in project documentation:

ClickHouse: https://clickhouse.com/docs/en/operations/optimizing-performance/profile-guided-optimization
Databend: https://databend.rs/doc/contributing/pgo
Vector: https://vector.dev/docs/administration/tuning/pgo/
Nebula: https://docs.nebula-graph.io/3.5.0/8.service-tuning/enable_autofdo_for_nebulagraph/
GCC: Official docs, section "Building with profile feedback" (even AutoFDO build is supported)
Clang:
- https://llvm.org/docs/HowToBuildWithPGO.html
- https://llvm.org/docs/AdvancedBuilds.html
tsv-utils: https://github.com/eBay/tsv-utils/blob/master/docs/BuildingWithLTO.md

Regarding LLVM BOLT integration, I have the following examples:

Rustc:
- Rustc itself (GitHub PR)
- LLVM in Rustc (Reddit)
CPython: GitHub PR
YDB: GitHub comment
Clang:
LDC: GitHub comment
HHVM, Proxygen and others: Facebook paper
NodeJS: Blog
Chromium: Blog
MySQL, MongoDB, memcached, Verilator: Paper

narendasan commented 9 months ago

Do you think this is geared more towards TensorRT itself or the PyTorch extension? It might be more relevant to open this in https://github.com/nvidia/pytorch

zamazan4ik commented 9 months ago

This might be more relevant to open in https://github.com/nvidia/pytorch

I get an HTTP 404 for this page. Does it have special access requirements, or is the link just wrong?

narendasan commented 9 months ago

Sorry, wrong URL: https://github.com/nvidia/tensorrt

pytorch / TensorRT

✨[Feature] Evaluate using Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) #2511