nerc-project / operations

Issues related to the operation of the NERC OpenShift environment

Move Lenovo A100SXM4GPU node from OpenShift Test cluster to Prod Cluster #552

Open joachimweyl opened 2 months ago

joachimweyl commented 2 months ago

Motivation

Make the GPU node usable in production once testing is done.

Completion Criteria

GPU Node available in Production

Description

Completion dates

Desired - 2024-05-08
Required - 2024-06-31

joachimweyl commented 1 month ago

Blocked until @dystewart gives the go-ahead that they are done with their RHOAI testing.

joachimweyl commented 4 weeks ago

@dystewart is GPU testing in RHOAI complete?

dystewart commented 2 weeks ago

Testing in RHOAI is not yet complete. We still need to test the MIG settings with accelerator profiles; that is happening later this week.

joachimweyl commented 1 week ago

@dystewart what are the next steps for testing?

joachimweyl commented 11 hours ago

@dystewart please add to the description what still needs to be done. Based on the issues we said needed to be done, we can move this, but I believe you still have testing you wish to complete. Is it blocked by this?