Open mcharanrm opened 3 weeks ago
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has not been approved by anyone yet. Once this PR has been reviewed and has the lgtm label, please assign ccamacho for approval. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Jenkins Job #1585
:red_circle: Test of 'rhoai test test_ci' failed after 06 hours 58 minutes 49 seconds. :red_circle:
• Link to the test results.
• Link to the reports index.
Test configuration:
# RHOAI: run kserve test test_ci
PR_POSITIONAL_ARGS: cpt_single_model_gating
PR_POSITIONAL_ARG_0: kserve-perf-ci
PR_POSITIONAL_ARG_1: cpt_single_model_gating
• Link to the Rebuild page.
[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1585/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//002_test_ci/FAILURES/view/):
/logs/artifacts/002_test_ci/003__plots/000__projects.kserve.visualizations.kserve-llm_plots/FAILURE | An error happened during the visualization post-processing ... (regression detected)
RuntimeError: An error happened during the visualization post-processing ... (regression detected)
Traceback (most recent call last):
File "/opt/topsail/src/projects/kserve/testing/test.py", line 237, in generate_plots
visualize.generate_from_dir(str(results_dirname))
File "/opt/topsail/src/projects/matrix_benchmarking/library/visualize.py", line 73, in wrapper
fct(*args, **kwargs)
File "/opt/topsail/src/projects/matrix_benchmarking/library/visualize.py", line 464, in generate_from_dir
generate_visualizations(results_dirname, generate_lts=generate_lts)
File "/opt/topsail/src/projects/matrix_benchmarking/library/visualize.py", line 73, in wrapper
[...]
[Test ran on the internal Perflab CI]
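For context on why these jobs report as failed even though the benchmark itself ran to completion: the error comes from the visualization post-processing step, which surfaces a detected regression as a `RuntimeError`. The snippet below is only an illustrative sketch, not topsail's actual `visualize.py` code; the wrapper name and the assumption that the wrapped step returns the list of failing KPIs are mine.

```python
# Illustrative sketch only (not topsail's visualize.py): a post-processing wrapper
# that turns a detected regression into a RuntimeError, which is what makes the
# Jenkins job exit as "failed" even when the load test itself completed.
import functools

def fail_on_regression(post_process):
    @functools.wraps(post_process)
    def wrapper(*args, **kwargs):
        # Assumption: the wrapped step returns the KPIs that failed the
        # regression analysis (an empty list means no regression).
        failed_kpis = post_process(*args, **kwargs)
        if failed_kpis:
            raise RuntimeError(
                "An error happened during the visualization post-processing ... "
                "(regression detected)"
            )
        return failed_kpis
    return wrapper
```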
Jenkins Job #1586
:red_circle: Test of 'rhoai test test_ci' failed after 07 hours 34 minutes 32 seconds. :red_circle:
• Link to the test results.
• Link to the reports index.
Test configuration:
# RHOAI: run kserve test test_ci
PR_POSITIONAL_ARGS: vllm_cpt_single_model_gating
PR_POSITIONAL_ARG_0: kserve-perf-ci
PR_POSITIONAL_ARG_1: vllm_cpt_single_model_gating
• Link to the Rebuild page.
[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1586/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//002_test_ci/FAILURES/view/):
/logs/artifacts/002_test_ci/003__plots/000__projects.kserve.visualizations.kserve-llm_plots/FAILURE | An error happened during the visualization post-processing ... (regression detected)
RuntimeError: An error happened during the visualization post-processing ... (regression detected)
Traceback (most recent call last):
File "/opt/topsail/src/projects/kserve/testing/test.py", line 237, in generate_plots
visualize.generate_from_dir(str(results_dirname))
File "/opt/topsail/src/projects/matrix_benchmarking/library/visualize.py", line 73, in wrapper
fct(*args, **kwargs)
File "/opt/topsail/src/projects/matrix_benchmarking/library/visualize.py", line 464, in generate_from_dir
generate_visualizations(results_dirname, generate_lts=generate_lts)
File "/opt/topsail/src/projects/matrix_benchmarking/library/visualize.py", line 73, in wrapper
[...]
[Test ran on the internal Perflab CI]
Both tests actually completed successfully, but their status is marked as failed because a few KPIs did not pass the regression analysis.
Topsail performs regression analysis on the llm-load-test KPIs as well as on the resource-utilization KPIs. Looking at the regression-analysis reports, the number of failing KPIs is very small (3/507, 1/572 and 5/60), which is insignificant. Moreover, the failing KPIs neither come from the same LLM model nor involve the same KPI across different models.
The values recorded in the KPIs that failed the regression analysis look like outliers to me. We can safely acknowledge them and consider that this is not a blocker for the 2.15.0 RC1 model-serving performance, since there is no trace of an actual regression.
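To make the "not clustered" argument concrete, here is a minimal sketch of the check described above: group the failing KPIs by model and by KPI name to see whether the failures concentrate anywhere. The data shape is an assumption for illustration, not topsail's actual report format.

```python
from collections import Counter

def summarize_failed_kpis(failed_kpis):
    """failed_kpis: list of dicts like {"model": ..., "kpi": ..., "value": ...}
    (assumed shape, for illustration only)."""
    by_model = Counter(entry["model"] for entry in failed_kpis)
    by_kpi = Counter(entry["kpi"] for entry in failed_kpis)
    return by_model, by_kpi

# With only 3/507, 1/572 and 5/60 failing KPIs, spread across different models and
# different KPI names, the failures look like isolated outliers rather than a
# systematic regression.
```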
Jenkins Job #1589
:red_circle: Test of 'rhoai test test_ci' failed after 08 hours 04 minutes 25 seconds. :red_circle:
• Link to the test results.
• Link to the reports index.
Test configuration:
# RHOAI: run kserve test test_ci
PR_POSITIONAL_ARGS: vllm_cpt_single_model_gating
PR_POSITIONAL_ARG_0: kserve-perf-ci
PR_POSITIONAL_ARG_1: vllm_cpt_single_model_gating
• Link to the Rebuild page.
[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1589/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//002_test_ci/FAILURES/view/):
/logs/artifacts/002_test_ci/003__plots/000__projects.kserve.visualizations.kserve-llm_plots/FAILURE | An error happened during the visualization post-processing ... (regression detected)
RuntimeError: An error happened during the visualization post-processing ... (regression detected)
Traceback (most recent call last):
File "/opt/topsail/src/projects/kserve/testing/test.py", line 237, in generate_plots
visualize.generate_from_dir(str(results_dirname))
File "/opt/topsail/src/projects/matrix_benchmarking/library/visualize.py", line 73, in wrapper
fct(*args, **kwargs)
File "/opt/topsail/src/projects/matrix_benchmarking/library/visualize.py", line 464, in generate_from_dir
generate_visualizations(results_dirname, generate_lts=generate_lts)
File "/opt/topsail/src/projects/matrix_benchmarking/library/visualize.py", line 73, in wrapper
[...]
[Test ran on the internal Perflab CI]
Jenkins Job #1591
:red_circle: Test of 'rhoai test test_ci' failed after 06 hours 55 minutes 52 seconds. :red_circle:
• Link to the test results.
• Link to the reports index.
Test configuration:
# RHOAI: run kserve test test_ci
PR_POSITIONAL_ARGS: cpt_single_model_gating
PR_POSITIONAL_ARG_0: kserve-perf-ci
PR_POSITIONAL_ARG_1: cpt_single_model_gating
• Link to the Rebuild page.
[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1591/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//002_test_ci/FAILURES/view/):
/logs/artifacts/002_test_ci/003__plots/000__projects.kserve.visualizations.kserve-llm_plots/FAILURE | An error happened during the visualization post-processing ... (regression detected)
RuntimeError: An error happened during the visualization post-processing ... (regression detected)
Traceback (most recent call last):
File "/opt/topsail/src/projects/kserve/testing/test.py", line 237, in generate_plots
visualize.generate_from_dir(str(results_dirname))
File "/opt/topsail/src/projects/matrix_benchmarking/library/visualize.py", line 73, in wrapper
fct(*args, **kwargs)
File "/opt/topsail/src/projects/matrix_benchmarking/library/visualize.py", line 464, in generate_from_dir
generate_visualizations(results_dirname, generate_lts=generate_lts)
File "/opt/topsail/src/projects/matrix_benchmarking/library/visualize.py", line 73, in wrapper
[...]
[Test ran on the internal Perflab CI]
Updated the `tag` and `version` fields in the `kserve/config.yaml` file for the OpenShift AI 2.15.0 RC1 model-serving performance validation. I will use the existing CI presets `cpt_single_model_gating` and `vllm_cpt_single_model_gating` to deploy LLMs with the "TGIS Standalone ServingRuntime" and the "vLLM ServingRuntime" when launching the e2e CPT tests through topsail from the middleware CI.
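For reference, the config change described above amounts to bumping the release fields under test. The helper below is a hedged sketch only: the key layout of `kserve/config.yaml` is not shown in this PR, so the `rhoai.tag` / `rhoai.version` paths are placeholders.

```python
# Hypothetical helper illustrating the kind of change: bump the release "tag" and
# "version" fields in kserve/config.yaml. The key paths below are placeholders,
# not the real structure of topsail's config file.
import yaml  # PyYAML

def bump_release(path="kserve/config.yaml", tag="2.15.0", version="rc1"):
    with open(path) as f:
        config = yaml.safe_load(f)
    config["rhoai"]["tag"] = tag          # placeholder key path
    config["rhoai"]["version"] = version  # placeholder key path
    with open(path, "w") as f:
        yaml.safe_dump(config, f, sort_keys=False)
```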