opendatahub-io / notebooks

Notebook images for ODH
Apache License 2.0

Inject rocm runtimes to runtime-images folder #628

Closed atheo89 closed 3 weeks ago

atheo89 commented 1 month ago

Related to: https://issues.redhat.com/browse/RHOAIENG-9680

Depends on: https://github.com/openshift/release/pull/54567

Description

Add rocm runtimes to runtime-images folder

Merge criteria:

atheo89 commented 3 weeks ago

Opened a fix PR on OCP CI to resolve the naming of the rocm runtimes on the 2024a build branch: https://github.com/openshift/release/pull/55441. Once it gets merged, I can proceed to fill in the correct image hashes in this one.

atheo89 commented 3 weeks ago

This PR is ready for review

openshift-ci[bot] commented 3 weeks ago

@atheo89: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/amd-runtimes-ubi9-e2e-tests | 3cf53fc7839c407b1d5e42f068b0ab3b0a9e1019 | link | true | `/test amd-runtimes-ubi9-e2e-tests` |
| ci/prow/notebook-rocm-ubi9-python-3-9-pr-image-mirror | 3cf53fc7839c407b1d5e42f068b0ab3b0a9e1019 | link | true | `/test notebook-rocm-ubi9-python-3-9-pr-image-mirror` |
| ci/prow/runtime-rocm-pytorch-ubi9-python-3-9-pr-image-mirror | 3cf53fc7839c407b1d5e42f068b0ab3b0a9e1019 | link | true | `/test runtime-rocm-pytorch-ubi9-python-3-9-pr-image-mirror` |
| ci/prow/runtime-rocm-tensorflow-ubi9-python-3-9-pr-image-mirror | 3cf53fc7839c407b1d5e42f068b0ab3b0a9e1019 | link | true | `/test runtime-rocm-tensorflow-ubi9-python-3-9-pr-image-mirror` |
| ci/prow/rocm-runtimes-ubi9-e2e-tests | 3cf53fc7839c407b1d5e42f068b0ab3b0a9e1019 | link | true | `/test rocm-runtimes-ubi9-e2e-tests` |
| ci/prow/runtimes-ubi8-e2e-tests | 3cf53fc7839c407b1d5e42f068b0ab3b0a9e1019 | link | true | `/test runtimes-ubi8-e2e-tests` |
| ci/prow/runtimes-ubi9-e2e-tests | 3cf53fc7839c407b1d5e42f068b0ab3b0a9e1019 | link | true | `/test runtimes-ubi9-e2e-tests` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).

jiridanek commented 3 weeks ago

Habana seems to be having problems

Installing collected packages: typing-extensions, triton, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, nvidia-cusparse-cu12, nvidia-cudnn-cu12, lightning-utilities, nvidia-cusolver-cu12, lightning-habana, torch, torchmetrics, pytorch-lightning, lightning
  Attempting uninstall: typing-extensions
    Found existing installation: typing_extensions 4.5.0
    Uninstalling typing_extensions-4.5.0:
      Successfully uninstalled typing_extensions-4.5.0
  Attempting uninstall: torch
    Found existing installation: torch 2.1.0a0+gitf8b6084
    Uninstalling torch-2.1.0a0+gitf8b6084:
      Successfully uninstalled torch-2.1.0a0+gitf8b6084
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-cpu 2.12.1 requires tensorboard<2.13,>=2.12, but you have tensorboard 2.11.2 which is incompatible.
tensorflow-cpu 2.12.1 requires typing-extensions<4.6.0,>=3.6.6, but you have typing-extensions 4.12.2 which is incompatible.
kfp 2.7.0 requires protobuf<5,>=4.21.1, but you have protobuf 3.20.3 which is incompatible.
kfp-kubernetes 1.2.0 requires protobuf<5,>=4.21.1, but you have protobuf 3.20.3 which is incompatible.

I know that's not caused by this change, but it's a problem nonetheless https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/opendatahub-io_notebooks/628/pull-ci-opendatahub-io-notebooks-main-images/1822987134265462784
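For triaging a log like the one above, the conflict lines can be pulled out and grouped by the offending dependency. The following is only a minimal sketch: `parse_conflicts`, `group_by_dependency`, and `SAMPLE_LOG` are hypothetical names introduced for illustration, and the regular expression assumes the message wording seen in this log, which pip does not guarantee to keep stable.

```python
import re

# Assumed shape of pip's resolver-conflict lines, taken from the log above, e.g.
# "kfp 2.7.0 requires protobuf<5,>=4.21.1, but you have protobuf 3.20.3 which is incompatible."
CONFLICT_RE = re.compile(
    r"^(?P<pkg>\S+) (?P<pkg_version>\S+) requires (?P<requirement>.+), "
    r"but you have (?P<dep>\S+) (?P<dep_version>\S+) which is incompatible\.$"
)


def parse_conflicts(log_text):
    """Return one dict per dependency-conflict line found in log_text."""
    conflicts = []
    for line in log_text.splitlines():
        match = CONFLICT_RE.match(line.strip())
        if match:
            conflicts.append(match.groupdict())
    return conflicts


def group_by_dependency(conflicts):
    """Group conflicts by the installed dependency that violates a requirement."""
    grouped = {}
    for conflict in conflicts:
        grouped.setdefault(conflict["dep"], []).append(conflict)
    return grouped


# Conflict lines copied from the log in this comment.
SAMPLE_LOG = """\
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-cpu 2.12.1 requires tensorboard<2.13,>=2.12, but you have tensorboard 2.11.2 which is incompatible.
tensorflow-cpu 2.12.1 requires typing-extensions<4.6.0,>=3.6.6, but you have typing-extensions 4.12.2 which is incompatible.
kfp 2.7.0 requires protobuf<5,>=4.21.1, but you have protobuf 3.20.3 which is incompatible.
kfp-kubernetes 1.2.0 requires protobuf<5,>=4.21.1, but you have protobuf 3.20.3 which is incompatible.
"""

if __name__ == "__main__":
    for dep, items in group_by_dependency(parse_conflicts(SAMPLE_LOG)).items():
        installed = items[0]["dep_version"]
        needed_by = ", ".join(f"{c['pkg']} {c['pkg_version']}" for c in items)
        print(f"{dep} {installed} conflicts with: {needed_by}")
```

Grouping by the installed dependency makes it easier to see that a single pin (here `protobuf 3.20.3`) accounts for more than one of the reported conflicts.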

jiridanek commented 3 weeks ago

/lgtm

the images as displayed on quay.io look to be the correct ones

jstourac commented 3 weeks ago

/lgtm

openshift-ci[bot] commented 3 weeks ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: harshad16

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/opendatahub-io/notebooks/blob/main/OWNERS)~~ [harshad16]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

harshad16 commented 3 weeks ago

/override ci/prow/images
/override ci/prow/notebooks-ubi9-e2e-tests
/override ci/prow/rocm-notebooks-e2e-tests

openshift-ci[bot] commented 3 weeks ago

@harshad16: Overrode contexts on behalf of harshad16: ci/prow/images, ci/prow/notebooks-ubi9-e2e-tests, ci/prow/rocm-notebooks-e2e-tests

In response to [this](https://github.com/opendatahub-io/notebooks/pull/628#issuecomment-2286119164):

> /override ci/prow/images
> /override ci/prow/notebooks-ubi9-e2e-tests
> /override ci/prow/rocm-notebooks-e2e-tests