nerc-project / operations

Issues related to the operation of the NERC OpenShift environment

Move one of the unused V100s from OpenStack to OpenShift Test Cluster #667

Closed joachimweyl closed 2 months ago

joachimweyl commented 4 months ago

Motivation

We need to test GPU usage for multiple reasons. Currently, there is a specific test we want to run: preventing workloads from scheduling on GPU nodes.

Completion Criteria

A V100 node has been moved from the OpenStack network to the ESI network, then from the ESI network to the OpenShift test network, and finally added to the OpenShift test cluster.

Description

Completion dates

Desired - 2024-08-09
Required - 2024-09-11

joachimweyl commented 4 months ago

As long as we need a maintenance window for this, do we want to move more of the V100s out of OpenStack? Their usage fluctuates between 4% and 20%. Shall we pull 6 of the 8 two-GPU nodes? One could go to test and the other 5 could go to OpenShift Prod.

joachimweyl commented 2 months ago

@jtriley can you give an update on the status of this issue?