Closed schwesig closed 4 days ago
@schwesig is this still on hold?
The team has confirmed they are able to access the GPUs through RHOAI after upping their allocation in ColdFront.
@dystewart is this still on hold?
And just to confirm: these are 4 different nodes that were allocated for https://github.com/nerc-project/operations/issues/595, correct?
Yes, unfortunately these are for another project. I thought we could use them.
BUT: the test cluster is not needed anymore. We got the team set up on the prod cluster for first tests, until they get their own dedicated cluster.
Closing this now.
Update:
Not needed anymore. Interim testing is running on the prod cluster.
On hold until further notice.
No to-do yet.
Please allocate 4 GPU nodes for the test cluster required for the NERC Project 408: KruizeOptimization. This allocation is essential for the ongoing GPU optimization tests. The project aims to:
- Conduct AI optimizations using OpenShift AI software.
- Enable testing with both MIG-enabled and non-MIG GPUs.
- Experiment with different configurations to optimize GPU utilization and timeslicing.
This is an interim solution for the project on the test cluster, until their own (prod) cluster is ready to use.
Needed by: https://github.com/nerc-project/operations/issues/580
CC @larsks @dystewart @hpdempsey