nerc-project / operations

Issues related to the operation of the NERC OpenShift environment

Provision 1 bare metal Nvidia A100 GPU for the RHELAI Pen Testing project. #692

Closed: hpdempsey closed this issue 2 months ago

hpdempsey commented 3 months ago

Set up 1 A100 via ESI for the RHELAI team to use for pen testing. Make 30 GB of storage available for the associated files from RHELAI. Users will need external access to the console. Per Jeremy Eder, this project will not conduct any testing that could cause a DoS or other issues.

HPD has set up the project through ColdFront for tracking and billing purposes. Add this project's usage to the Research group's bill.

Stay in contact with the main developer, Thibault Guittet (tguittet@redhat.com), so that the GPU can be decommissioned as soon as pen testing is done, since usage is charged daily. The project is not expected to last longer than 2 weeks.

tzumainn commented 3 months ago

I've allocated the node and sent an email to Thibault explaining how to use it. ESI doesn't have external storage right now; however, the node itself has more than enough local disk.

joachimweyl commented 2 months ago

@tzumainn, to confirm: is the node you provisioned MOC-R8PAC23U31? If so, which project did you add it to?

tzumainn commented 2 months ago

@joachimweyl The node I leased is MOC-R8PAC23U31; it has not yet been provisioned. It has been added to project rhelai.

tzumainn commented 2 months ago

The node has been provisioned (by me) and Thibault has confirmed he has access. I've sent instructions regarding the steps I've taken and how to undeploy/reprovision.

I'll put this under "In Review" until we're confident that the node is working as intended.

joachimweyl commented 2 months ago

@tzumainn, can we reach out to Thibault again to confirm this is working?

tzumainn commented 2 months ago

Already done; I've checked with him twice this week. I also confirmed that their initial plan to deploy from an ISO is not a requirement, and that requesting a qcow2 image is fine (it looks like GA has a qcow2 image available). I think this can be closed, unless @hpdempsey would like to wait a bit longer or check on something else.