openshift-psap / topsail

Test Orchestrator for Performance and Scalability of AI pLatforms
Apache License 2.0

[fine-tuning] Integrate Ray benchmarking as an alternative fine-tuning job #580

Closed kpouget closed 1 week ago

openshift-ci[bot] commented 1 week ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

Once this PR has been reviewed and has the lgtm label, please ask for approval from kpouget. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/openshift-psap/topsail/blob/main/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

topsail-bot[bot] commented 1 week ago

Jenkins Job #1596

:red_circle: Test of 'rhoai test test_ci' failed after 00 hours 03 minutes 09 seconds. :red_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray

• Link to the Rebuild page.

[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1596/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//000_test_ci/FAILURES/view/):

/logs/artifacts/000_test_ci/002__test_fine_tuning/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/000_test_ci/002__test_fine_tuning" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'fine-tuning', 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1}"' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 132, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 49, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 105, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,

[...]

[Test ran on the internal Perflab CI]
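
For readers following the traceback above: it ends inside `subprocess.run`, which raises `CalledProcessError` only when it is called with `check=True` and the command exits with a non-zero status. The sketch below reproduces that behaviour in isolation. `run_checked` is a hypothetical stand-in and is not the `run()` helper from `projects/core/library/run.py`, whose real arguments are not shown in this thread; the `exit 2` command is likewise only an illustration and assumes a POSIX shell.

```python
import subprocess


def run_checked(command: str) -> subprocess.CompletedProcess:
    """Hypothetical helper: run a shell command and fail loudly on non-zero exit."""
    # check=True is what turns a non-zero exit status into CalledProcessError.
    return subprocess.run(command, shell=True, check=True,
                          capture_output=True, text=True)


try:
    run_checked("exit 2")  # stand-in for a failing ./run_toolbox.py invocation
    failure = None
except subprocess.CalledProcessError as exc:
    # The exception carries the exit status and the command that was run.
    failure = (exc.returncode, exc.cmd)

print(failure)
```

The exit status reported in the bot comment (`returned non-zero exit status 1`) corresponds to `exc.returncode` in this sketch.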

topsail-bot[bot] commented 1 week ago

Jenkins Job #1597

:red_circle: Test of 'rhoai test test_ci' failed after 00 hours 00 minutes 04 seconds. :red_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray

• Link to the Rebuild page.

[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1597/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//000_test_ci/FAILURES/view/):

/logs/artifacts/000_test_ci/FAILURE | Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 368, in test
    failed = _run_test_and_visualize()
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 262, in _run_test_and_visualize
    raise RuntimeError(msg)
RuntimeError: RHOAI not installed, cluster not prepared for fine-tuning

[Test ran on the internal Perflab CI]

topsail-bot[bot] commented 1 week ago

Jenkins Job #1598

:red_circle: Test of 'rhoai test test_ci' failed after 00 hours 04 minutes 35 seconds. :red_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray

• Link to the Rebuild page.

[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1598/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//000_test_ci/FAILURES/view/):

/logs/artifacts/000_test_ci/002__test_fine_tuning/000__fine_tuning__ray_fine_tuning_job/FAILURE | [000__fine_tuning__ray_fine_tuning_job] ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra={'name': 'fine-tuning', 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1} --> 2
/logs/artifacts/000_test_ci/002__test_fine_tuning/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/000_test_ci/002__test_fine_tuning" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'fine-tuning', 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 132, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 49, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 105, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

[Test ran on the internal Perflab CI]

topsail-bot[bot] commented 1 week ago

Jenkins Job #1599

:red_circle: Test of 'rhoai test test_ci' failed after 00 hours 00 minutes 04 seconds. :red_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray

• Link to the Rebuild page.

[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1599/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//000_test_ci/FAILURES/view/):

/logs/artifacts/000_test_ci/FAILURE | Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 368, in test
    failed = _run_test_and_visualize()
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 262, in _run_test_and_visualize
    raise RuntimeError(msg)
RuntimeError: RHOAI not installed, cluster not prepared for fine-tuning

[Test ran on the internal Perflab CI]

topsail-bot[bot] commented 1 week ago

Jenkins Job #1600

:red_circle: Test of 'rhoai test test_ci' failed after 00 hours 04 minutes 22 seconds. :red_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray

• Link to the Rebuild page.

[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1600/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//000_test_ci/FAILURES/view/):

/logs/artifacts/000_test_ci/002__test_fine_tuning/000__fine_tuning__ray_fine_tuning_job/FAILURE | [000__fine_tuning__ray_fine_tuning_job] ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra={'name': 'fine-tuning', 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1} --> 2
/logs/artifacts/000_test_ci/002__test_fine_tuning/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/000_test_ci/002__test_fine_tuning" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'fine-tuning', 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 132, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 49, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 105, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

[Test ran on the internal Perflab CI]

topsail-bot[bot] commented 1 week ago

Jenkins Job #1601

:red_circle: Test of 'rhoai test test_ci' failed after 00 hours 05 minutes 35 seconds. :red_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray

• Link to the Rebuild page.

[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1601/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//000_test_ci/FAILURES/view/):

/logs/artifacts/000_test_ci/002__test_fine_tuning/000__fine_tuning__ray_fine_tuning_job/FAILURE | [000__fine_tuning__ray_fine_tuning_job] ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra={'name': 'fine-tuning', 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1} --> 2
/logs/artifacts/000_test_ci/002__test_fine_tuning/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/000_test_ci/002__test_fine_tuning" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'fine-tuning', 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 132, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 49, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 105, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

[Test ran on the internal Perflab CI]

topsail-bot[bot] commented 1 week ago

Jenkins Job #1602

:red_circle: Test of 'rhoai test test_ci' failed after 00 hours 25 minutes 02 seconds. :red_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray

• Link to the Rebuild page.

[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1602/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//000_test_ci/FAILURES/view/):

/logs/artifacts/000_test_ci/002__test_fine_tuning/000__fine_tuning__ray_fine_tuning_job/FAILURE | [000__fine_tuning__ray_fine_tuning_job] ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra={'name': 'fine-tuning', 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1} --> 2
/logs/artifacts/000_test_ci/002__test_fine_tuning/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/000_test_ci/002__test_fine_tuning" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'fine-tuning', 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 132, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 49, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 105, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

[Test ran on the internal Perflab CI]

topsail-bot[bot] commented 1 week ago

Jenkins Job #1605

:red_circle: Test of 'rhoai test test_ci' failed after 00 hours 00 minutes 05 seconds. :red_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray

• Link to the Rebuild page.

[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1605/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//000_test_ci/FAILURES/view/):

/logs/artifacts/000_test_ci/FAILURE | Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 368, in test
    failed = _run_test_and_visualize()
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 262, in _run_test_and_visualize
    raise RuntimeError(msg)
RuntimeError: RHOAI not installed, cluster not prepared for fine-tuning

[Test ran on the internal Perflab CI]

topsail-bot[bot] commented 1 week ago

Jenkins Job #1606

:red_circle: Test of 'rhoai test test_ci' failed after 00 hours 10 minutes 01 seconds. :red_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray

• Link to the Rebuild page.

[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1606/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//001_test_ci/FAILURES/view/):

/logs/artifacts/001_test_ci/002__test_fine_tuning/000__fine_tuning__ray_fine_tuning_job/FAILURE | [000__fine_tuning__ray_fine_tuning_job] ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra={'name': 'fine-tuning', 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1} --> 2
/logs/artifacts/001_test_ci/002__test_fine_tuning/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/001_test_ci/002__test_fine_tuning" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'fine-tuning', 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 132, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 49, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 105, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

[Test ran on the internal Perflab CI]

topsail-bot[bot] commented 1 week ago

Jenkins Job #1607

:red_circle: Test of 'rhoai test test_ci' failed after 00 hours 09 minutes 39 seconds. :red_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray

• Link to the Rebuild page.

[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1607/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//000_test_ci/FAILURES/view/):

/logs/artifacts/000_test_ci/002__test_fine_tuning/000__fine_tuning__ray_fine_tuning_job/FAILURE | [000__fine_tuning__ray_fine_tuning_job] ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra={'name': 'fine-tuning', 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1} --> 2
/logs/artifacts/000_test_ci/002__test_fine_tuning/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/000_test_ci/002__test_fine_tuning" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'fine-tuning', 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 132, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 49, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 105, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

[Test ran on the internal Perflab CI]

topsail-bot[bot] commented 1 week ago

Jenkins Job #1608

:red_circle: Test of 'rhoai test test_ci' failed after 00 hours 00 minutes 06 seconds. :red_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray_bench
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray_bench

• Link to the Rebuild page.

[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1608/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//000_test_ci/FAILURES/view/):

/logs/artifacts/000_test_ci/FAILURE | Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 368, in test
    failed = _run_test_and_visualize()
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 288, in _run_test_and_visualize
    failed = _run_test(test_artifact_dir_p, test_override_values)
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 87, in _run_test
    dataset_source = sources[test_settings["dataset_name"]]
KeyError: None

[Test ran on the internal Perflab CI]
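
The `KeyError: None` above comes from indexing `sources` with a `dataset_name` setting whose value is `None`. A minimal sketch of a more defensive lookup follows, assuming that a preset such as `ray_bench` may legitimately have no dataset. `lookup_dataset_source` is a hypothetical helper, and returning `None` for a missing dataset is an assumption, not the behaviour of `test_finetuning.py`.

```python
from typing import Optional


def lookup_dataset_source(sources: dict, test_settings: dict) -> Optional[dict]:
    """Hypothetical helper: resolve the dataset source, tolerating a missing name."""
    dataset_name = test_settings.get("dataset_name")
    if dataset_name is None:
        # Assumption: a test without a dataset simply has no source to resolve.
        return None
    if dataset_name not in sources:
        # Fail with a message that names the offending setting.
        raise KeyError(f"dataset_name={dataset_name!r} is not a known source")
    return sources[dataset_name]


# Illustrative data only; the real source registry is not shown in this thread.
sources = {"twitter_complaints_small.json": {"type": "dataset"}}

with_dataset = lookup_dataset_source(
    sources, {"dataset_name": "twitter_complaints_small.json"})
without_dataset = lookup_dataset_source(sources, {"dataset_name": None})
```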

kpouget commented 1 week ago

/test rhoai-light fine_tuning ray_bench

openshift-ci[bot] commented 1 week ago

@kpouget: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/rhoai-light | 47f946ee133920f1a69498e38aba4277355dd23e | link | true | `/test rhoai-light` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).

topsail-bot[bot] commented 1 week ago

Jenkins Job #1610

:red_circle: Test of 'rhoai test prepare_ci' failed after 00 hours 05 minutes 18 seconds. :red_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test prepare_ci
PR_POSITIONAL_ARGS: ray_bench
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray_bench

• Link to the Rebuild page.

[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1610/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//000_prepare_ci/FAILURES/view/):

/logs/artifacts/000_prepare_ci/001__prepare2/000__prepare_namespace/FAILURE | UnboundLocalError: local variable 'model' referenced before assignment
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/prepare_finetuning.py", line 224, in prepare_namespace
    download_data_sources(test_settings)
  File "/opt/topsail/src/projects/fine_tuning/testing/prepare_finetuning.py", line 113, in download_data_sources
    elif isinstance(model, list):
UnboundLocalError: local variable 'model' referenced before assignment

[Test ran on the internal Perflab CI]
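
An `UnboundLocalError` like the one above typically appears when a local variable is assigned in only one branch and then read unconditionally. The sketch below shows that pattern and one way to avoid it. Both functions are hypothetical illustrations: the body of `download_data_sources` is not shown in this thread, so this is not a reconstruction of it, and treating a missing `model_name` as "nothing to download" is an assumption.

```python
def models_to_download_buggy(test_settings: dict) -> list:
    """Hypothetical illustration of the failure pattern."""
    if "model_name" in test_settings:
        model = test_settings["model_name"]  # bound only when the key exists

    if isinstance(model, str):  # raises UnboundLocalError when the key is absent
        return [model]
    elif isinstance(model, list):
        return list(model)
    return []


def models_to_download(test_settings: dict) -> list:
    """Hypothetical fix: bind the variable on every path before reading it."""
    model = test_settings.get("model_name")  # always bound, possibly None
    if model is None:
        return []  # assumption: no model configured means nothing to download
    if isinstance(model, str):
        return [model]
    if isinstance(model, list):
        return list(model)
    raise TypeError(f"unsupported model_name type: {type(model).__name__}")


settings_without_model = {"name": "fine-tuning", "gpu": 1}
print(models_to_download(settings_without_model))
```

Using `dict.get` also avoids a `KeyError` when the setting is absent altogether.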

topsail-bot[bot] commented 1 week ago

Jenkins Job #1611

:red_circle: Test of 'rhoai test test_ci' failed after 00 hours 00 minutes 07 seconds. :red_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray_bench
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray_bench

• Link to the Rebuild page.

[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1611/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//000_test_ci/FAILURES/view/):

/logs/artifacts/000_test_ci/000__prepare_namespace/FAILURE | KeyError: 'model_name'
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/prepare_finetuning.py", line 224, in prepare_namespace
    download_data_sources(test_settings)
  File "/opt/topsail/src/projects/fine_tuning/testing/prepare_finetuning.py", line 94, in download_data_sources
    model_name = test_settings["model_name"]
KeyError: 'model_name'

/logs/artifacts/000_test_ci/FAILURE | Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 369, in test

[...]

[Test ran on the internal Perflab CI]

topsail-bot[bot] commented 1 week ago

Jenkins Job #1612

:red_circle: Test of 'rhoai test test_ci' failed after 00 hours 05 minutes 10 seconds. :red_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray_bench
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray_bench

• Link to the Rebuild page.

[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1612/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//000_test_ci/FAILURES/view/):

/logs/artifacts/000_test_ci/002__test_fine_tuning/000__fine_tuning__ray_fine_tuning_job/FAILURE | [000__fine_tuning__ray_fine_tuning_job] ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra={'name': 'fine-tuning', 'gpu': 1} --> 2
/logs/artifacts/000_test_ci/002__test_fine_tuning/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/000_test_ci/002__test_fine_tuning" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'fine-tuning', 'gpu': 1}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 133, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 49, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 105, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

[Test ran on the internal Perflab CI]

topsail-bot[bot] commented 1 week ago

Jenkins Job #1613

:red_circle: Test of 'rhoai test test_ci' failed after 00 hours 02 minutes 35 seconds. :red_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray_bench
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray_bench

• Link to the Rebuild page.

[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1613/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//000_test_ci/FAILURES/view/):

/logs/artifacts/000_test_ci/001__ray__ray-benchmark/000__fine_tuning__ray_fine_tuning_job/FAILURE | [000__fine_tuning__ray_fine_tuning_job] ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra={'name': 'fine-tuning', 'gpu': 1} --> 2
/logs/artifacts/000_test_ci/001__ray__ray-benchmark/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/000_test_ci/001__ray__ray-benchmark" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'fine-tuning', 'gpu': 1}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 139, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 49, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 105, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

[Test ran on the internal Perflab CI]

topsail-bot[bot] commented 1 week ago

Jenkins Job #1614

:red_circle: Test of 'rhoai test test_ci' failed after 00 hours 08 minutes 04 seconds. :red_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray_bench
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray_bench

• Link to the Rebuild page.

[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1614/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//000_test_ci/FAILURES/view/):

/logs/artifacts/000_test_ci/FAILURE | Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 375, in test
    failed = _run_test_and_visualize()
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 266, in _run_test_and_visualize
    if not prepare_rhoai_mod.is_rhoai_installed():
  File "/opt/topsail/src/projects/rhods/library/prepare_rhoai.py", line 33, in is_rhoai_installed
    installed_csv_cmd = run.run(f"oc get csv -loperators.coreos.com/{RHODS_OPERATOR_MANIFEST_NAME}.{RHODS_NAMESPACE}"
  File "/opt/topsail/src/projects/core/library/run.py", line 105, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

[Test ran on the internal Perflab CI]

topsail-bot[bot] commented 1 week ago

Jenkins Job #1615

:red_circle: Test of 'rhoai test test_ci' failed after 00 hours 07 minutes 47 seconds. :red_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray_bench
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray_bench

• Link to the Rebuild page.

[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1615/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//000_test_ci/FAILURES/view/):

/logs/artifacts/000_test_ci/001__ray__ray-benchmark/000__fine_tuning__ray_fine_tuning_job/FAILURE | [000__fine_tuning__ray_fine_tuning_job] ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra={'name': 'fine-tuning', 'gpu': 1} --> 2
/logs/artifacts/000_test_ci/001__ray__ray-benchmark/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/000_test_ci/001__ray__ray-benchmark" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'fine-tuning', 'gpu': 1}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 139, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 49, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 105, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

[Test ran on the internal Perflab CI]

topsail-bot[bot] commented 1 week ago

Jenkins Job #1616

:red_circle: Test of 'rhoai test test_ci' failed after 00 hours 07 minutes 16 seconds. :red_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray_bench
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray_bench

• Link to the Rebuild page.

[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1616/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//000_test_ci/FAILURES/view/):

/logs/artifacts/000_test_ci/001__ray__ray-benchmark/000__fine_tuning__ray_fine_tuning_job/FAILURE | [000__fine_tuning__ray_fine_tuning_job] ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra={'name': 'fine-tuning', 'gpu': 1} --> 2
/logs/artifacts/000_test_ci/001__ray__ray-benchmark/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/000_test_ci/001__ray__ray-benchmark" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'fine-tuning', 'gpu': 1}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 139, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 49, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 105, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

[Test ran on the internal Perflab CI]

topsail-bot[bot] commented 1 week ago

Jenkins Job #1617

:red_circle: Test of 'rhoai test test_ci' failed after 00 hours 07 minutes 27 seconds. :red_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray_bench
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray_bench

• Link to the Rebuild page.

[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1617/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//000_test_ci/FAILURES/view/):

/logs/artifacts/000_test_ci/001__ray__ray-benchmark/000__fine_tuning__ray_fine_tuning_job/FAILURE | [000__fine_tuning__ray_fine_tuning_job] ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra={'name': 'fine-tuning', 'gpu': 1, 'hyper_parameters': {'num_samples': 10}} --> 2
/logs/artifacts/000_test_ci/001__ray__ray-benchmark/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/000_test_ci/001__ray__ray-benchmark" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'fine-tuning', 'gpu': 1, 'hyper_parameters': {'num_samples': 10}}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 139, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 49, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 105, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

[Test ran on the internal Perflab CI]

topsail-bot[bot] commented 1 week ago

Jenkins Job #1618

:red_circle: Test of 'rhoai test test_ci' failed after 00 hours 02 minutes 17 seconds. :red_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray_bench
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray_bench

• Link to the Rebuild page.

[Failure indicator](https://ci.app-svc-perf.corp.redhat.com/job/ExternalTeams/job/RHODS/job/topsail/1618/artifact/run/f23-h33-000-6018r.rdu2.scalelab.redhat.com//000_test_ci/FAILURES/view/):

/logs/artifacts/000_test_ci/001__ray__ray-benchmark/000__fine_tuning__ray_fine_tuning_job/FAILURE | [000__fine_tuning__ray_fine_tuning_job] ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra={'name': 'fine-tuning', 'gpu': 1, 'hyper_parameters': {'num_samples': 10}} --> 2
/logs/artifacts/000_test_ci/001__ray__ray-benchmark/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/000_test_ci/001__ray__ray-benchmark" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'fine-tuning', 'gpu': 1, 'hyper_parameters': {'num_samples': 10}}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 139, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 49, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 105, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

[Test ran on the internal Perflab CI]

topsail-bot[bot] commented 1 week ago

Jenkins Job #1620

:green_circle: Test of 'rhoai test test_ci' succeeded after 00 hours 06 minutes 47 seconds. :green_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray_bench
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray_bench

• Link to the Rebuild page.

[Test ran on the internal Perflab CI]

topsail-bot[bot] commented 1 week ago

Jenkins Job #1621

:green_circle: Test of 'rhoai test test_ci' succeeded after 00 hours 07 minutes 32 seconds. :green_circle:

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ''
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci

• Link to the Rebuild page.

[Test ran on the internal Perflab CI]

kpouget commented 1 week ago

tests passed ❤️, merging