uabrc / devops-docs

https://docs.rc.uab.edu/devops-docs/

Slurm reservations #36

Open wwarriner opened 8 months ago

wwarriner commented 8 months ago

Creating a reservation

You should be able to create a reservation; I believe Ops, Dev and DataSci have this authority. Example: a 30-day reservation on c0220-c0223 for 3 users, starting now (only one user is shown; additional users go in User= as a comma-separated list):

scontrol create reservation Reservation=$resv-name starttime=now duration=30-00:00:00 Nodes=c\[0220-0223] User=$user-a
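
Note that bash reads `$resv-name` as the variable `$resv` followed by the literal text `-name`, so the placeholders in the command above have to be substituted by hand. A minimal sketch of one way to parameterize it, assuming hypothetical values (`my-resv`, `usera,userb,userc`) that are not from the original issue. It only prints the command unless the last line is uncommented:

```shell
#!/bin/bash
# Hypothetical values for illustration; replace with the real ones.
resv_name="my-resv"
resv_users="usera,userb,userc"   # comma-separated user list
resv_nodes="c[0220-0223]"        # quoted, so no backslash escape is needed

# Build the command as an array so each argument stays intact.
create_cmd=(
  scontrol create reservation
  "Reservation=${resv_name}"
  starttime=now
  duration=30-00:00:00
  "Nodes=${resv_nodes}"
  "User=${resv_users}"
)

# Print a shell-quoted copy for review.
printf '%q ' "${create_cmd[@]}"
printf '\n'

# Uncomment to actually create the reservation:
# "${create_cmd[@]}"
```

Reviewing the printed command before running it is a cheap guard against typos in the node range or user list.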

Using a reservation (for researchers)

To run jobs in the reservation, researchers need to pass --reservation=$resv-name to sbatch and srun.
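
For batch jobs the flag can also live in the job script as an `#SBATCH` directive. A minimal sketch, assuming a hypothetical reservation named `my-resv`; the job name, task count and time limit are illustrative, not site recommendations:

```shell
#!/bin/bash
#SBATCH --job-name=resv-test
#SBATCH --reservation=my-resv   # hypothetical reservation name
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

# Report which node the job landed on, to confirm it ran on a reserved node.
node_name="$(uname -n)"
echo "Running on ${node_name}"
```

Submitted with `sbatch job.sh` (file name assumed), the reported node should be one of the reservation's nodes. The same flag works interactively, e.g. `srun --reservation=my-resv --pty bash`.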

Updating a reservation

To add a user to an existing reservation:

scontrol update reservations Reservation=$resv-name User+=$user-b

Adding nodes to a reservation

If you add new nodes, you'll have to delete the reservation and recreate it. To keep jobs from landing on the nodes in the short window between delete and create, first drain all of the reservation nodes, then undrain them once the new reservation exists (make sure to update the node list in all of the commands):

for node in c0{232..235}; do scontrol_admin update NodeName="$node" State=drain Reason="RCOPS: Creating $resv-name reservation"; done

scontrol delete reservationname=$resv-name
scontrol create reservation Reservation=$resv-name starttime=now duration=35-00:00:00 Nodes=c\[0220-0223] User=$user-a,$user-b

for node in c0{232..235}; do scontrol_admin update NodeName="$node" State=undrain Reason="RCOPS: Created $resv-name reservation"; done
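
Because the node list has to match across all four commands, one option is to define it once and generate the commands from it. The following is a sketch only, not the team's actual tooling: the reservation name, user list and node numbers are hypothetical defaults, and `scontrol_admin` is assumed to be the site wrapper used above. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/bash
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print commands, 0 = run them
RESV_NAME="${RESV_NAME:-my-resv}"        # hypothetical reservation name
RESV_USERS="${RESV_USERS:-usera,userb}"  # hypothetical user list
NODE_PREFIX="c"
NODE_FIRST=232   # no leading zero, so printf does not read it as octal
NODE_LAST=235
DURATION="35-00:00:00"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# One node name per line: c0232 ... c0235
node_list() {
  local i
  for i in $(seq "$NODE_FIRST" "$NODE_LAST"); do
    printf '%s%04d\n' "$NODE_PREFIX" "$i"
  done
}

# Slurm range expression: c[0232-0235]
node_range() {
  printf '%s[%04d-%04d]' "$NODE_PREFIX" "$NODE_FIRST" "$NODE_LAST"
}

# $1 = node state, $2 = reason text
set_node_state() {
  local node
  for node in $(node_list); do
    run scontrol_admin update NodeName="$node" State="$1" Reason="$2"
  done
}

recreate_reservation() {
  set_node_state drain "RCOPS: Creating $RESV_NAME reservation"
  run scontrol delete reservationname="$RESV_NAME"
  run scontrol create reservation Reservation="$RESV_NAME" starttime=now \
    duration="$DURATION" Nodes="$(node_range)" User="$RESV_USERS"
  set_node_state undrain "RCOPS: Created $RESV_NAME reservation"
}

recreate_reservation
```

The dry-run output drops the shell quoting around the Reason text, so it is meant for review rather than copy-paste.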

Canceling/ending a reservation

List the current reservations to confirm the name, then delete it:

$ scontrol show res
ReservationName=$resv-name StartTime=2023-10-19T10:14:33 EndTime=2023-11-18T09:14:33 Duration=30-00:00:00
   Nodes=c[0232-0235] NodeCnt=4 CoreCnt=512 Features=(null) PartitionName=(null) Flags=SPEC_NODES
   TRES=cpu=512
   Users=$user-a,$user-b Accounts=(null) Licenses=(null) State=ACTIVE BurstBuffer=(null) Watts=n/a

$ scontrol delete reservationname=$resv-name
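
When scripting the check before a delete, individual fields can be pulled out of the `scontrol show res` output. A rough sketch, assuming the one-line form produced by `scontrol -o show res`; the sample line below is adapted from the output above with a hypothetical name `my-resv`, and `resv_field` is a helper made up for this example:

```shell
#!/bin/bash
# Sample record in one-line form (adapted from the output above).
sample='ReservationName=my-resv StartTime=2023-10-19T10:14:33 EndTime=2023-11-18T09:14:33 Duration=30-00:00:00 Nodes=c[0232-0235] NodeCnt=4 CoreCnt=512 Users=usera,userb State=ACTIVE'

# resv_field LINE KEY: print the value of KEY=... from a one-line record.
# Assumes values contain no spaces.
resv_field() {
  printf '%s\n' "$1" | tr ' ' '\n' | sed -n "s/^${2}=//p"
}

resv_field "$sample" Nodes
resv_field "$sample" EndTime
resv_field "$sample" State
```

Against a live cluster the sample line would be replaced by one line of `scontrol -o show res` output.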