opendatahub-io / notebooks

Notebook images for ODH
Apache License 2.0

Switch base image to UBI for AMD rocm install #620

Closed harshad16 closed 1 month ago

harshad16 commented 1 month ago

Description

Switch base image to UBI for AMD rocm install.

Related-to: https://issues.redhat.com/browse/RHOAIENG-7501

How Has This Been Tested?

1. Build the base image: `podman build -t amd-base .`
2. Check that the necessary packages are available in amd-base with `rpm -qa`.
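
The check in step 2 can be sketched as a small script. This is only an illustration under assumptions: the helper names and the idea of matching package names by prefix are not part of this PR, and the package names passed in are placeholders you would replace with the ones you expect. It also assumes the image has no entrypoint that intercepts the `rpm -qa` arguments.

```python
import subprocess


def installed_packages(image: str = "amd-base") -> list[str]:
    """Return the lines printed by `rpm -qa` inside the given image."""
    result = subprocess.run(
        ["podman", "run", "--rm", image, "rpm", "-qa"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


def missing_packages(rpm_lines: list[str], required_names: list[str]) -> list[str]:
    """Return the required names that match no line of `rpm -qa` output.

    Matching is a loose prefix match on "<name>-", so "foo" also matches
    "foo-devel-1.0-1.el9.x86_64". Tighten it if that matters for your check.
    """
    return [
        name
        for name in required_names
        if not any(line.startswith(name + "-") for line in rpm_lines)
    ]
```

For example, `missing_packages(installed_packages(), ["rocm-core"])` would return an empty list if a package whose name starts with `rocm-core-` is present in the image (the package name here is a placeholder).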

Merge criteria:

openshift-ci[bot] commented 1 month ago

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

jstourac commented 1 month ago

Looks like we need to update the prow configuration then too:

jiridanek commented 1 month ago

Also, the Makefile should be updated (in this PR) to reference the ubi9- directory and not the c9s- one.

jiridanek commented 1 month ago

We're still running out of disk space on one of the images, as @caponetto observed yesterday.

jiridanek commented 1 month ago

ci/prow/notebook-amd-c9s-python-3-9-pr-image-mirror: Job failed. BaseSHA:fecd10c13ce13d66b57498abc48976f74121dd63

so openshift-ci needs to be updated

jiridanek commented 1 month ago

@jstourac so, approving in GitHub UI gives BOTH LGTM and approved, now that we are approvers. Gotta be careful.

caponetto commented 1 month ago

> We're still running out of disk space on one of the images, as @caponetto observed yesterday.

At first, I thought that was happening only during the Trivy scan because it has to copy stuff around. But then I saw that CI is running out of disk space even during the build step for amd-jupyter-pytorch-ubi9-python-3.9 (examples are this PR and some builds triggered after a merge). Considering that amd-jupyter-pytorch-ubi9-python-3.9 is ~60 GB uncompressed, the CI is operating at its limit. If we ever need to add new things to this image, we'll probably face these storage issues more often.
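
One way to surface this earlier is a pre-build check of free disk space. The sketch below is only an illustration: the function name, the path, and the safety factor are assumptions, and only the ~60 GB figure comes from the comment above.

```python
import shutil

GIB = 1024 ** 3


def has_headroom(path: str, required_gib: float, safety_factor: float = 1.5) -> bool:
    """Return True if `path` has at least required_gib * safety_factor GiB free.

    The safety factor is an assumed allowance for intermediate layers and
    for tools (such as an image scanner) that copy data around.
    """
    free_gib = shutil.disk_usage(path).free / GIB
    return free_gib >= required_gib * safety_factor
```

A CI step could call `has_headroom("/var/lib/containers", 60)` and fail fast with a clear message instead of running out of space mid-build. The storage path is an assumption and depends on how the runner is configured.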

caponetto commented 1 month ago

Apparently, there are more people concerned about rocm+pytorch size (see https://github.com/ROCm/ROCm-docker/issues/120)

atheo89 commented 1 month ago

To fix the CI, please check this PR: https://github.com/opendatahub-io/notebooks/pull/627. It explains how to update the notebook matrix.

openshift-ci[bot] commented 1 month ago

@harshad16: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/notebooks-ubi8-e2e-tests | b8a1d3fc41869aea6e9a20e3e8e990d7d82ec06f | link | true | `/test notebooks-ubi8-e2e-tests` |
| ci/prow/notebook-amd-c9s-python-3-9-pr-image-mirror | 3cfe039432c8d28ea3077569a98ea6333d62014c | link | true | `/test notebook-amd-c9s-python-3-9-pr-image-mirror` |
| ci/prow/notebook-amd-jupyter-minimal-c9s-python-3-9-pr-image-mirror | 3cfe039432c8d28ea3077569a98ea6333d62014c | link | true | `/test notebook-amd-jupyter-minimal-c9s-python-3-9-pr-image-mirror` |
| ci/prow/amd-runtimes-ubi9-e2e-tests | 3cfe039432c8d28ea3077569a98ea6333d62014c | link | true | `/test amd-runtimes-ubi9-e2e-tests` |
| ci/prow/runtime-rocm-pytorch-ubi9-python-3-9-pr-image-mirror | 3cfe039432c8d28ea3077569a98ea6333d62014c | link | true | `/test runtime-rocm-pytorch-ubi9-python-3-9-pr-image-mirror` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).
atheo89 commented 1 month ago

/lgtm
/approve

/override ci/prow/images
/override /ci/prow/notebook-rocm-jupyter-pyt-ubi9-python-3-9-pr-image-mirror
/override ci/prow/notebook-rocm-jupyter-tf-ubi9-python-3-9-pr-image-mirror
/override ci/prow/rocm-notebooks-e2e-tests

openshift-ci[bot] commented 1 month ago

@atheo89: /override requires failed status contexts, check run or a prowjob name to operate on. The following unknown contexts/checkruns were given:

Only the following failed contexts/checkruns were expected:

If you are trying to override a checkrun that has a space in it, you must put a double quote on the context.
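
To make the matching rule concrete, here is a rough sketch of how such commands could be checked against the set of failed contexts. It is not Prow's actual implementation: the function names, the one-context-per-line parsing, and the exact-match comparison are all assumptions made for illustration.

```python
def parse_overrides(comment: str) -> list[str]:
    """Collect the context named on each `/override` line of a comment.

    Everything after the command is treated as one context, with surrounding
    double quotes removed so that names containing spaces survive.
    """
    contexts = []
    for line in comment.splitlines():
        command, _, rest = line.strip().partition(" ")
        if command != "/override":
            continue
        context = rest.strip().strip('"')
        if context:
            contexts.append(context)
    return contexts


def unknown_contexts(requested: list[str], failed: set[str]) -> list[str]:
    """Return requested contexts that do not exactly match a failed one."""
    return [context for context in requested if context not in failed]
```

Under these assumptions, a context written with a stray leading slash (for example `/ci/prow/...` instead of `ci/prow/...`) would not match and would be reported as unknown, as would a context that is not currently failing. The thread does not show which contexts the bot actually rejected.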

In response to [this](https://github.com/opendatahub-io/notebooks/pull/620#issuecomment-2244698939):

> /lgtm
> /approve
>
> /override ci/prow/images
> /override /ci/prow/notebook-rocm-jupyter-pyt-ubi9-python-3-9-pr-image-mirror
> /override ci/prow/notebook-rocm-jupyter-tf-ubi9-python-3-9-pr-image-mirror
> /override ci/prow/rocm-notebooks-e2e-tests

openshift-ci[bot] commented 1 month ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: atheo89, jiridanek, jstourac

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/opendatahub-io/notebooks/blob/main/OWNERS)~~ [atheo89,jiridanek,jstourac]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

jiridanek commented 1 month ago

let me try

/override "build (rocm-jupyter-pytorch-ubi9-python-3.9) / build"
/override ci/prow/notebook-rocm-jupyter-tf-ubi9-python-3-9-pr-image-mirror
/override ci/prow/rocm-notebooks-e2e-tests
/override ci/prow/notebook-rocm-jupyter-pyt-ubi9-python-3-9-pr-image-mirror
/override ci/prow/images

openshift-ci[bot] commented 1 month ago

@jiridanek: Overrode contexts on behalf of jiridanek: build (rocm-jupyter-pytorch-ubi9-python-3.9) / build, ci/prow/images, ci/prow/notebook-rocm-jupyter-pyt-ubi9-python-3-9-pr-image-mirror, ci/prow/notebook-rocm-jupyter-tf-ubi9-python-3-9-pr-image-mirror, ci/prow/rocm-notebooks-e2e-tests

In response to [this](https://github.com/opendatahub-io/notebooks/pull/620#issuecomment-2244713919):

> let me try
>
> /override "build (rocm-jupyter-pytorch-ubi9-python-3.9) / build"
> /override ci/prow/notebook-rocm-jupyter-tf-ubi9-python-3-9-pr-image-mirror
> /override ci/prow/rocm-notebooks-e2e-tests
> /override ci/prow/notebook-rocm-jupyter-pyt-ubi9-python-3-9-pr-image-mirror
> /override ci/prow/images

jiridanek commented 1 month ago

/override "build (rocm-jupyter-pytorch-ubi9-python-3.9) / build"

openshift-ci[bot] commented 1 month ago

@jiridanek: /override requires failed status contexts, check run or a prowjob name to operate on. The following unknown contexts/checkruns were given:

Only the following failed contexts/checkruns were expected:

If you are trying to override a checkrun that has a space in it, you must put a double quote on the context.

In response to [this](https://github.com/opendatahub-io/notebooks/pull/620#issuecomment-2244720681):

> /override "build (rocm-jupyter-pytorch-ubi9-python-3.9) / build"