GenAI-Perf's "--help" option currently takes 1-3 seconds to generate the help menu because of heavy imports (especially the transformers library). This pull request defers those imports until they are needed. This should not impact normal runtime (see the full-run timings below). It speeds up the "--help" call by up to 16x for the average case, 24x for the first-time call, and 47x for the non-parallelized case. See the runtime profiles below.
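As a rough illustration of the deferred-import pattern described above, here is a minimal sketch. It is not the actual GenAI-Perf code: `build_parser`, `run`, and the `--value` flag are hypothetical names, and the standard-library `fractions` module stands in for a heavy dependency such as transformers.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Building the parser touches no heavy dependency, so "--help"
    # can print and exit without paying any expensive import cost.
    parser = argparse.ArgumentParser(prog="example-tool")  # hypothetical CLI name
    parser.add_argument("--value", default="3/4", help="a fraction such as 1/2")
    return parser


def run(args: argparse.Namespace):
    # Deferred import: in this sketch, `fractions` is a stand-in for a
    # heavy library. It is loaded only when a real run needs it.
    from fractions import Fraction

    return Fraction(args.value)


if __name__ == "__main__":
    # argparse handles "--help" inside parse_args() and exits there,
    # before run() (and therefore the deferred import) is ever reached.
    print(run(build_parser().parse_args()))
```

With this layout, the import cost moves from module load time to the first call of `run`, which is why a full run should take about the same time as before.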
"--help" option
Before changes (1-3s):
After changes (62.3ms):
For completeness, the `-h` option:

Full run
Before changes (39.0s):

![image](https://github.com/triton-inference-server/client/assets/58150256/014b8147-deca-4da0-8c76-803889d0657e)

After changes (38.1s):

![image](https://github.com/triton-inference-server/client/assets/58150256/d48e6bd3-14b6-4ecb-900c-2fdaeff5bf72)

Passing unit tests

![image](https://github.com/triton-inference-server/client/assets/58150256/63a59c81-c0d7-4d55-90f3-797b4de1b859)