opendatahub-io / notebooks

Notebook images for ODH
Apache License 2.0

Split Intel AI Tools CPU and GPU images. #521

Closed sharvil10 closed 4 months ago

sharvil10 commented 5 months ago

This PR splits the Intel AI Tools images into CPU and GPU images for Intel TensorFlow and Intel PyTorch.

Description

This PR splits the combined Intel AI Tools CPU and GPU images into separate images. The exact changes are described below.

  1. Split Intel PyTorch into Intel PyTorch CPU & XPU images.
  2. Split Intel TensorFlow into Intel TensorFlow CPU & XPU images.
  3. Change base image of Intel Jupyter images to be the ubi9 base images instead of the runtime images.

How Has This Been Tested?

This was tested by running the make commands that build the containers, deploy the Kubernetes resources, and test the images.

Merge criteria:

openshift-ci[bot] commented 5 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

Once this PR has been reviewed and has the lgtm label, please assign vaishnavihire for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/opendatahub-io/notebooks/blob/main/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

openshift-ci[bot] commented 5 months ago

Hi @sharvil10. Thanks for your PR.

I'm waiting for an opendatahub-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

sharvil10 commented 5 months ago

OpenShift Release PR: https://github.com/openshift/release/pull/51822

atheo89 commented 5 months ago

/ok-to-test

sharvil10 commented 5 months ago

Seems like it failed because the env variables were wrong for the jupyter images. I fixed it in this PR on OpenShift Release CI #51853.

sharvil10 commented 5 months ago

/retest

sharvil10 commented 5 months ago

Is it okay to request more resources (CPU and memory) in the StatefulSet of the Jupyter containers to test them? Also, the tests seem to fail arbitrarily locally as well. Sometimes they pass and sometimes they fail with the same error as seen here.
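
For illustration, raising the requests could look roughly like the sketch below. It is a minimal Python sketch, assuming the StatefulSet manifest is available as a plain dictionary. The function name `with_resources`, the example manifest, and the values `2` CPUs and `4Gi` are hypothetical and are not taken from this repository's test setup.

```python
import copy


def with_resources(statefulset, cpu, memory):
    """Return a copy of a StatefulSet manifest with the same CPU and
    memory values set as requests and limits on every container."""
    patched = copy.deepcopy(statefulset)
    containers = patched["spec"]["template"]["spec"]["containers"]
    for container in containers:
        container["resources"] = {
            "requests": {"cpu": cpu, "memory": memory},
            "limits": {"cpu": cpu, "memory": memory},
        }
    return patched


# Hypothetical manifest, reduced to the fields this sketch touches.
example = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": "example-notebook"},
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {"name": "notebook", "image": "example/image:tag"}
                ]
            }
        }
    },
}

# Assumed values; the right numbers depend on the image under test.
patched = with_resources(example, cpu="2", memory="4Gi")
```

Setting requests equal to limits is only one possible choice here; it keeps the scheduling behaviour of the test pod predictable, which may help when chasing intermittent failures.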

sharvil10 commented 4 months ago

/retest

openshift-ci[bot] commented 4 months ago

@sharvil10: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/notebooks-e2e-tests | 5c7c4d57896efa7327de0bb3a1d82a8386d4367b | link | true | `/test notebooks-e2e-tests` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).

harshad16 commented 4 months ago

@sharvil10

> Is it okay to request more resources (CPU and memory) in the StatefulSet of the Jupyter containers to test them? Also, the tests seem to fail arbitrarily locally as well. Sometimes they pass and sometimes they fail with the same error as seen here.

In Open Data Hub, users have the option to pick different resource limits. If the question is about the resource limits used in the tests, we would have to look that up.

Was there a reason to close this PR?