triton-inference-server / client

Triton Python, C++, and Java client libraries, and gRPC-generated client examples for Go, Java, and Scala.
BSD 3-Clause "New" or "Revised" License

Speed up GenAI-Perf's help call #669

Closed: dyastremsky closed this 1 month ago

dyastremsky commented 1 month ago

For those using GenAI-Perf's "--help" option, it takes 1-3 seconds to generate the help menu due to all of the heavy imports (especially the transformers library). This pull request defers those imports until they are needed, which should not impact normal runtime (see below). It speeds up the "--help" call by up to 16x for the average case, 24x for the first-time call, and 47x for the non-parallelized case. See the runtime profiles below.
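For readers unfamiliar with the pattern: the change boils down to moving the heavy imports out of module scope and into the functions that actually need them, so that building the argparse help text never triggers them. A minimal sketch of that deferred-import idea (the function names and the default value below are illustrative, not the actual GenAI-Perf code):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Parser construction (and therefore --help/-h) touches no heavy libraries.
    parser = argparse.ArgumentParser(prog="genai-perf")
    parser.add_argument("--tokenizer", default="gpt2")  # illustrative default
    return parser


def load_tokenizer(name: str):
    # Deferred import: the cost of transformers is only paid when a real run
    # needs a tokenizer, not when the CLI merely prints its help text.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(name)


def main() -> None:
    args = build_parser().parse_args()  # --help exits here, before any heavy import
    tokenizer = load_tokenizer(args.tokenizer)
    print(f"Loaded tokenizer with vocab size {tokenizer.vocab_size}")


if __name__ == "__main__":
    main()
```

With this structure, the help path returns almost immediately, while a full run still imports and uses transformers exactly as before.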

"--help" option

Before changes (1-3s): [runtime profile screenshot]

After changes (62.3ms): [runtime profile screenshot]

For completeness, the "-h" option: [runtime profile screenshot]
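The profiler used for the screenshots above isn't named in the description; if you just want to sanity-check the headline numbers, a rough end-to-end timing of the CLI is enough (assuming the genai-perf console script is installed and on PATH):

```python
import subprocess
import time

# Rough wall-clock timing of the help path; this includes Python interpreter
# startup, so it will read slightly higher than an in-process profile.
start = time.perf_counter()
subprocess.run(["genai-perf", "--help"], capture_output=True, check=True)
print(f"--help took {time.perf_counter() - start:.3f}s")
```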

Full run

Before changes (39.0s): [runtime profile screenshot]

After changes (38.1s): [runtime profile screenshot]

Passing unit tests: [test results screenshot]

dyastremsky commented 1 month ago

> This is really cool. Nice work!

Dang, you're fast! I was just going to post this. Thanks! 🥳