triton-inference-server / client

Triton Python, C++, and Java client libraries, and GRPC-generated client examples for Go, Java, and Scala.
BSD 3-Clause "New" or "Revised" License

Extend genai perf plots to compare across multiple runs #635

Closed: lkomali closed this 5 months ago

lkomali commented 5 months ago

[Attached example plots: distribution_of_input_tokens_to_generated_tokens, request_latency, time_to_first_token_vs_number_of_input_tokens]

nv-hwoo commented 5 months ago

@debermudez Logging example:

```shell
$ genai-perf compare --files ...
2024-05-08 12:52 [INFO] genai_perf.plots.plot_config_parser:196 - Creating initial YAML configuration file to compare/config.yaml
2024-05-08 12:52 [INFO] genai_perf.plots.plot_config_parser:51 - Generating plot configurations by parsing compare/config.yaml. This may take few seconds.
2024-05-08 12:52 [INFO] genai_perf.plots.plot_manager:51 - Generating 'Time to First Token' plot
2024-05-08 12:52 [INFO] genai_perf.plots.plot_manager:51 - Generating 'Request Latency' plot
2024-05-08 12:52 [INFO] genai_perf.plots.plot_manager:51 - Generating 'Distribution of Input Tokens to Generated Tokens' plot
2024-05-08 12:52 [INFO] genai_perf.plots.plot_manager:51 - Generating 'Time to First Token vs Number of Input Tokens' plot
2024-05-08 12:52 [INFO] genai_perf.plots.plot_manager:51 - Generating 'Token-to-Token Latency vs Output Token Position' plot
```