triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Evaluate Profile-Guided Optimization (PGO) and LLVM BOLT #6304

Open zamazan4ik opened 9 months ago

zamazan4ik commented 9 months ago

Hi!

Recently I have done many Profile-Guided Optimization (PGO) benchmarks on multiple projects; the results are available here. They cover applications from many domains that were accelerated with PGO: compilers, gRPC workloads, benchmark tools, databases, and much more. That's why I think it's worth trying to apply PGO to Triton as well.
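For reference, here is a minimal sketch of the usual two-phase PGO workflow with Clang. The CMake flag injection, the `/models` repository path, and the `perf_analyzer` warm-up load are illustrative assumptions, not Triton's actual build integration:

```bash
# Phase 1: instrumented build (injecting flags via CMake is an assumption;
# Triton's official build goes through build.py, so the hook point may differ)
cmake -DCMAKE_C_FLAGS="-fprofile-generate" \
      -DCMAKE_CXX_FLAGS="-fprofile-generate" ..
make -j"$(nproc)"

# Phase 2: run a representative workload so instrumentation emits .profraw files
./tritonserver --model-repository=/models &     # placeholder model repository
perf_analyzer -m my_model                       # any representative client load
kill %1

# Phase 3: merge the raw profiles and rebuild with the profile applied
llvm-profdata merge -output=triton.profdata default_*.profraw
cmake -DCMAKE_C_FLAGS="-fprofile-use=triton.profdata" \
      -DCMAKE_CXX_FLAGS="-fprofile-use=triton.profdata" ..
make -j"$(nproc)"
```

The key practical question is choosing a training workload that matches production traffic; a profile collected on an unrepresentative load can pessimize the hot paths instead of optimizing them.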

I can suggest the following things to do:

- Evaluate how PGO affects Triton's performance on representative inference workloads (a minimal build sketch is shown above).
- If PGO helps, evaluate LLVM BOLT as an additional optimization step on top of it (see the sketch after this list).
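For context, here is a minimal sketch of the standard BOLT flow on a Linux host with `perf` and LBR-capable hardware; the binary path and workload are assumptions carried over from the sketch above:

```bash
# Link with relocations preserved so BOLT can rearrange the final binary
cmake -DCMAKE_EXE_LINKER_FLAGS="-Wl,--emit-relocs" ..
make -j"$(nproc)"

# Sample a representative workload (-j any,u needs LBR, e.g. modern Intel CPUs);
# stop with Ctrl-C after the server has been exercised
perf record -e cycles:u -j any,u -o perf.data -- \
    ./tritonserver --model-repository=/models

# Convert the perf profile to BOLT's format and rewrite the binary
perf2bolt -p perf.data -o perf.fdata ./tritonserver
llvm-bolt ./tritonserver -o tritonserver.bolt -data=perf.fdata \
    -reorder-blocks=ext-tsp -reorder-functions=hfsort \
    -split-functions -split-all-cold -split-eh -dyno-stats
```

BOLT is applied to an already-built binary, so it composes with PGO rather than competing with it.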

oandreeva-nv commented 9 months ago

Thank you, @zamazan4ik, for your suggestions. I'll file a ticket for our team to investigate this proposal.