red-hat-data-services / odh-deployer

The odh-deployer image creates a custom resource for the operator image in odh-operator-allinone.
Apache License 2.0

Update servingruntime ootb configuration #340

Closed: lucferbux closed this pull request 1 year ago

lucferbux commented 1 year ago

Description

Bring the latest changes downstream: display TensorFlow as a supported format in the OVMS serving runtime and add a new GPU field.
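
For context, here is a minimal sketch of what the out-of-the-box OVMS ServingRuntime could look like with these changes, assuming the KServe/ModelMesh `ServingRuntime` CRD; the names, versions, and image below are illustrative placeholders, not the actual downstream values:

```yaml
# Illustrative sketch only -- the real template lives in this repo's manifests
# and its values may differ.
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: ovms-example            # hypothetical name
spec:
  supportedModelFormats:
    - name: openvino_ir
      version: "opset1"
      autoSelect: true
    - name: onnx
      version: "1"
    - name: tensorflow          # the newly surfaced format
      version: "2"
  containers:
    - name: ovms
      image: registry.example.com/openvino_model_server:latest  # placeholder image
      resources:
        limits:
          cpu: "2"
          memory: 8Gi
          nvidia.com/gpu: "1"   # the new GPU field, only set when GPUs are requested
```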

lucferbux commented 1 year ago

@Xaenalt @VaishnaviHire @andrewballantyne Can we review this before feature freeze, please? That way it can go into 1.28 without opening another PR. Thanks in advance!

Xaenalt commented 1 year ago

Oh, this seems like a good place to ask: despite the docs saying it isn't supported, we can apparently set target_device: HETERO:NVIDIA,CPU, so it will fall back to the CPU if the model fails to compile on the GPU. I'm not entirely sure whether this works when the NVIDIA device is absent, but it's worth asking if we want to look into it.
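
For reference, a hedged sketch of one way to experiment with that setting, assuming the OVMS container is started in single-model mode and accepts OVMS's `--target_device` flag; the model name and image are placeholders, and whether `HETERO:NVIDIA,CPU` degrades gracefully when no NVIDIA GPU is present is exactly the open question discussed below:

```yaml
# Sketch only: pass the heterogeneous device list straight to OVMS.
containers:
  - name: ovms
    image: registry.example.com/openvino_model_server:latest  # placeholder image
    args:
      - --model_name=my-model            # hypothetical model name
      - --model_path=/models/my-model
      - --target_device=HETERO:NVIDIA,CPU
```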

lucferbux commented 1 year ago

> Oh, this seems like a good place to ask: despite the docs saying it isn't supported, we can apparently set target_device: HETERO:NVIDIA,CPU, so it will fall back to the CPU if the model fails to compile on the GPU. I'm not entirely sure whether this works when the NVIDIA device is absent, but it's worth asking if we want to look into it.

Hi @Xaenalt I don't fully understand this, what do you mean? In case they deploy this serving runtime and nvidia is absent it will fail? Is that it?

Xaenalt commented 1 year ago

So, I need to test what happens when the NVIDIA device is absent. If I'm reading their docs (and what their devs said) correctly, it'll run some parts on the CPU and some on the NVIDIA GPU, which should increase compatibility. But it's not clear to me what happens if one device is absent; I'd assume it would just run everything on the CPU, but I've been burned by that kind of assumption before xD

lucferbux commented 1 year ago

> So, I need to test what happens when the NVIDIA device is absent. If I'm reading their docs (and what their devs said) correctly, it'll run some parts on the CPU and some on the NVIDIA GPU, which should increase compatibility. But it's not clear to me what happens if one device is absent; I'd assume it would just run everything on the CPU, but I've been burned by that kind of assumption before xD

But does this require something UI-related, or maybe an addition to the official docs? I'm not sure whether I need to change anything based on your comments.

lucferbux commented 1 year ago

@Xaenalt I added https://github.com/red-hat-data-services/odh-deployer/pull/340/commits/fe33221e4e034359469c211c92355d1f035922e8 to support TensorFlow; let me know if this looks fine!

lucferbux commented 1 year ago

@anishasthana @VaishnaviHire @LaVLaS could you check this out? I'm aiming to merge this before feature freeze. Thanks in advance!

lucferbux commented 1 year ago

@tarukumar I think this also needs QE approval; could you please take a look?

openshift-ci[bot] commented 1 year ago

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by: andrewballantyne, Xaenalt

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/red-hat-data-services/odh-deployer/blob/main/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment. Approvers can cancel approval by writing `/approve cancel` in a comment.