nerc-project / operations

Issues related to the operation of the NERC OpenShift environment

Get details about requirements for RHEL AI machines #594

Closed by larsks 2 weeks ago

larsks commented 1 month ago

Contact Jeremy (Eder) re: request for bare metal GPU nodes for RHEL AI testing.

Our plan is to manage these machines using ESI, but that has implications for things like console access and other aspects of bare metal management.

larsks commented 1 month ago

I've emailed Jeremy some initial questions about how they will be using this hardware.

hpdempsey commented 1 month ago

Jeremy replied that his goals are: 1) functional testing of RHEL, bootc, the installer, drivers, and CUDA; 2) everything above that layer, i.e. using the GPUs for synthetic data generation and actual training. He does not need BIOS config changes.

joachimweyl commented 2 weeks ago

Machines are in use.