nerc-project / operations

Issues related to the operation of the NERC OpenShift environment

Add wrk-99 to production #557

Closed: joachimweyl closed this issue 1 month ago

joachimweyl commented 2 months ago

Motivation

Now that wrk-99 is working, we would like to make sure it is usable by clients.

Completion Criteria

wrk-99 is added to prod and ready to be used

Description

Completion dates

Desired - 2024-05-08
Required - TBD

joachimweyl commented 1 month ago

@jtriley what are the next steps to ensure this node is added to production?

joachimweyl commented 1 month ago

Now that the update freeze is in place, this is blocked until after the 9th.

joachimweyl commented 1 month ago

@jtriley Now that the freeze is no longer in place, I am unblocking this issue.

jtriley commented 1 month ago

wrk-99 is now in Ready state on ocp-prod. All containers in the nvidia-gpu-operator namespace running on the host are in Running or Completed state. I've also verified that all the requisite nvidia node labels are present and that nvidia-smi is functioning inside its nvidia-device-plugin-daemonset pod.
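
For anyone repeating these checks on another node, here is a minimal Python sketch of the same verification. It is not the procedure actually used above. It assumes the JSON comes from `oc get node wrk-99 -o json` and `oc get pods -n nvidia-gpu-operator -o json`, that the node is registered under the short name `wrk-99` (the real node name may be fully qualified), and that the relevant labels use the `nvidia.com/` prefix. The helper names, pod names, and sample values are hypothetical. Note that the `Completed` status shown by `oc get pods` corresponds to the pod phase `Succeeded`. The nvidia-smi check is not covered here.

```python
# Sketch only: checks parsed `oc ... -o json` output for a single node.
# All sample data below is illustrative, not taken from ocp-prod.

def node_is_ready(node: dict) -> bool:
    """Return True if the node's Ready condition has status "True"."""
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    return False

def nvidia_labels(node: dict) -> dict:
    """Return the node labels whose key starts with "nvidia.com/" (assumed prefix)."""
    labels = node.get("metadata", {}).get("labels", {})
    return {k: v for k, v in labels.items() if k.startswith("nvidia.com/")}

def unhealthy_pods(pod_list: dict, node_name: str) -> list:
    """Return (name, phase) for pods on node_name not in Running or Succeeded."""
    bad = []
    for pod in pod_list.get("items", []):
        if pod.get("spec", {}).get("nodeName") != node_name:
            continue  # pod is scheduled on a different node
        phase = pod.get("status", {}).get("phase")
        if phase not in ("Running", "Succeeded"):
            bad.append((pod["metadata"]["name"], phase))
    return bad

# Hypothetical sample input shaped like the `oc` JSON output.
SAMPLE_NODE = {
    "metadata": {
        "name": "wrk-99",
        "labels": {
            "kubernetes.io/hostname": "wrk-99",
            "nvidia.com/gpu.present": "true",
        },
    },
    "status": {"conditions": [{"type": "Ready", "status": "True"}]},
}

SAMPLE_PODS = {
    "items": [
        {"metadata": {"name": "nvidia-device-plugin-daemonset-aaaaa"},
         "spec": {"nodeName": "wrk-99"},
         "status": {"phase": "Running"}},
        {"metadata": {"name": "nvidia-cuda-validator-bbbbb"},
         "spec": {"nodeName": "wrk-99"},
         "status": {"phase": "Succeeded"}},
        {"metadata": {"name": "nvidia-driver-daemonset-ccccc"},
         "spec": {"nodeName": "wrk-0"},
         "status": {"phase": "Pending"}},
    ]
}

print("ready:", node_is_ready(SAMPLE_NODE))
print("nvidia labels:", nvidia_labels(SAMPLE_NODE))
print("unhealthy pods on wrk-99:", unhealthy_pods(SAMPLE_PODS, "wrk-99"))
```

With real data, the two dictionaries would be loaded with `json.load` from the saved `oc` output instead of the samples.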