openshift-psap / topsail

Test Orchestrator for Performance and Scalability of AI pLatforms
Apache License 2.0

[matrix_benchmarking] Remove the support for thresholds #560

Closed (kpouget closed 1 month ago)

openshift-ci[bot] commented 1 month ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: (none)

Once this PR has been reviewed and has the lgtm label, please ask for approval from kpouget. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/openshift-psap/topsail/blob/main/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

topsail-bot[bot] commented 1 month ago

Jenkins Job #1558

:red_circle: Test of 'rhoai test test_ci' failed after 00 hours 10 minutes 40 seconds. :red_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run notebooks test test_ci
PR_POSITIONAL_ARGS=
PR_POSITIONAL_ARG_0=notebooks-perf-ci

• Link to the Rebuild page.

[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1558/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//002_test_ci/FAILURES/view/):

/logs/artifacts/002_test_ci/001__driver_notebooks__ods_ci_scale_test/FAILURE | [001__driver_notebooks__ods_ci_scale_test] ./run_toolbox.py from_config notebooks ods_ci_scale_test --extra { sut_cluster_kubeconfig: '/tmp/kubeconfig'} --> 2

[Test ran on the internal Perflab CI]

topsail-bot[bot] commented 1 month ago

Jenkins Job #1559

:red_circle: Test of 'rhoai test test_ci' failed after 00 hours 10 minutes 10 seconds. :red_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run notebooks test test_ci
PR_POSITIONAL_ARGS=
PR_POSITIONAL_ARG_0=notebooks-perf-ci

• Link to the Rebuild page.

[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1559/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//000_test_ci/FAILURES/view/):

/logs/artifacts/000_test_ci/001__driver_notebooks__ods_ci_scale_test/FAILURE | [001__driver_notebooks__ods_ci_scale_test] ./run_toolbox.py from_config notebooks ods_ci_scale_test --extra { sut_cluster_kubeconfig: '/tmp/kubeconfig'} --> 2
/logs/artifacts/000_test_ci/003__plots/FAILURE | ERROR | Stats 'report: LTS Documentation' does not exist. Skipping it.

[Test ran on the internal Perflab CI]

topsail-bot[bot] commented 1 month ago

Jenkins Job #1561

:red_circle: Test of 'rhoai test test_ci' failed after 00 hours 10 minutes 10 seconds. :red_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run notebooks test test_ci
PR_POSITIONAL_ARGS=
PR_POSITIONAL_ARG_0=notebooks-perf-ci

• Link to the Rebuild page.

[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1561/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//000_test_ci/FAILURES/view/):

/logs/artifacts/000_test_ci/001__driver_notebooks__ods_ci_scale_test/FAILURE | [001__driver_notebooks__ods_ci_scale_test] ./run_toolbox.py from_config notebooks ods_ci_scale_test --extra { sut_cluster_kubeconfig: '/tmp/kubeconfig'} --> 2

[Test ran on the internal Perflab CI]

topsail-bot[bot] commented 1 month ago

Jenkins Job #1562

:red_circle: Test of 'rhoai test test_ci' failed after 00 hours 04 minutes 20 seconds. :red_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS=
PR_POSITIONAL_ARG_0=fine_tuning-perf-ci

• Link to the Rebuild page.

[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1562/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//000_test_ci/FAILURES/view/):

/logs/artifacts/000_test_ci/004__prom_plots/FAILURE | An error happened during the results parsing, aborting the visualization (0_matbench_parse.log).
/logs/artifacts/000_test_ci/FAILURE | Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 417, in test
    failed = _run_test_and_visualize()
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 348, in _run_test_and_visualize
    generate_visualization(do_matbenchmarking, test_artifact_dir_p[0])
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 396, in generate_visualization
    raise exc
  File "/opt/topsail/src/projects/core/library/run.py", line 178, in run_and_catch
    fct(*args, **kwargs)

[...]

[Test ran on the internal Perflab CI]

topsail-bot[bot] commented 1 month ago

Jenkins Job #1563

:green_circle: Test of 'rhoai test test_ci' succeeded after 00 hours 04 minutes 36 seconds. :green_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS=
PR_POSITIONAL_ARG_0=fine_tuning-perf-ci

• Link to the Rebuild page.

[Test ran on the internal Perflab CI]

topsail-bot[bot] commented 1 month ago

Jenkins Job #1565

:red_circle: Test of 'rhoai test test_ci' failed after 00 hours 00 minutes 10 seconds. :red_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run kserve test test_ci
PR_POSITIONAL_ARGS=
PR_POSITIONAL_ARG_0=kserve-perf-ci

• Link to the Rebuild page.

[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1565/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//000_test_ci/FAILURES/view/):

/logs/artifacts/000_test_ci/000__plots/000__projects.kserve.visualizations.kserve-llm_plots/FAILURE | An error happened during the results parsing, aborting the visualization (0_matbench_parse.log).
RuntimeError: An error happened during the results parsing, aborting the visualization (0_matbench_parse.log).
Traceback (most recent call last):
  File "/opt/topsail/src/projects/kserve/testing/test.py", line 98, in test_ci
    test_e2e.test_ci()
  File "/opt/topsail/src/projects/kserve/testing/test_e2e.py", line 103, in test_ci
    prepare_kserve.update_serving_runtime_images(runtime)
  File "/opt/topsail/src/projects/kserve/testing/prepare_kserve.py", line 177, in update_serving_runtime_images
    run.run(TEMPLATE_CMD, capture_stdout=True)
  File "/opt/topsail/src/projects/core/library/run.py", line 105, in run

[...]

[Test ran on the internal Perflab CI]

topsail-bot[bot] commented 1 month ago

Jenkins Job #1564

:red_circle: Test of 'rhoai test test_ci' failed after 00 hours 10 minutes 08 seconds. :red_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run notebooks test test_ci
PR_POSITIONAL_ARGS=
PR_POSITIONAL_ARG_0=notebooks-perf-ci

• Link to the Rebuild page.

[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1564/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//000_test_ci/FAILURES/view/):

/logs/artifacts/000_test_ci/001__driver_notebooks__ods_ci_scale_test/FAILURE | [001__driver_notebooks__ods_ci_scale_test] ./run_toolbox.py from_config notebooks ods_ci_scale_test --extra { sut_cluster_kubeconfig: '/tmp/kubeconfig'} --> 2

[Test ran on the internal Perflab CI]

topsail-bot[bot] commented 1 month ago

Jenkins Job #1566

:green_circle: Test of 'rhoai test test_ci' succeeded after 00 hours 18 minutes 26 seconds. :green_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run kserve test test_ci
PR_POSITIONAL_ARGS=
PR_POSITIONAL_ARG_0=kserve-perf-ci

• Link to the Rebuild page.

[Test ran on the internal Perflab CI]

kpouget commented 1 month ago

test passed ❤️, merging.