pitt-crc / wrappers

User focused command line wrappers around Slurm
https://crc-pages.pitt.edu/wrappers/
GNU General Public License v3.0

Invalid default partition opa of mpi cluster in crc-interactive wrapper #157

Closed: chnixi closed this issue 1 year ago

chnixi commented 1 year ago

When running the crc-interactive wrapper to request mpi nodes, the default partition is set to opa when no partition is explicitly specified via the -p argument. It would be good to set this partition to mpi or another valid partition so users don't need to specify this explicitly every time.

$ crc-interactive.py -m -c 28 -n 2
srun: error: invalid partition specified: opa
srun: error: Unable to allocate resources: Invalid partition name specified

djperrefort commented 1 year ago

This may be an issue with the Slurm configuration. The Slurm command generated by the wrapper looks valid:

$ crc-interactive.py -m -c 28 -n 2 -z

srun -M mpi --export=ALL --time=1:00:00 --mem=1g --ntasks-per-node=28 --nodes=2 --pty bash

Running this Slurm command directly gives the same error you reported above. srun should fall back to the cluster's default partition, but for some reason it selects the partition opa instead. This partition does not exist:

$ sinfo -a -M mpi
CLUSTER: mpi
PARTITION     AVAIL  TIMELIMIT  NODES  STATE NODELIST
opa-high-mem*    up   infinite     36  alloc opa-n[96-131]
mpi              up   infinite      1  down* mpi-n61
mpi              up   infinite      1  drain mpi-n129
mpi              up   infinite     22    mix mpi-n[6-7,30-35,46-47,54-55,86-87,90-91,107,113-114,130-132]
mpi              up   infinite     53  alloc mpi-n[28-29,48-53,56-60,62-85,93-106,115-116]
mpi              up   infinite     59   idle mpi-n[0-5,8-27,36-45,88-89,92,108-112,117-128,133-135]
scavenger        up   infinite     36  alloc opa-n[96-131]
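In sinfo output the default partition is the one whose name carries a trailing asterisk, so a listing like the one above can be checked programmatically. The following is a minimal Python sketch, not code from the wrappers: the function name `parse_partitions` is hypothetical, and the sample text is an abbreviated copy of the listing above.

```python
# Abbreviated copy of the `sinfo -a -M mpi` listing from this thread.
SINFO_OUTPUT = """\
CLUSTER: mpi
PARTITION     AVAIL  TIMELIMIT  NODES  STATE NODELIST
opa-high-mem*    up   infinite     36  alloc opa-n[96-131]
mpi              up   infinite      1  down* mpi-n61
mpi              up   infinite      1  drain mpi-n129
scavenger        up   infinite     36  alloc opa-n[96-131]
"""


def parse_partitions(sinfo_text):
    """Return (unique partition names in order, default partition or None).

    Hypothetical helper: only the first column is inspected, so the
    asterisk in a node state such as "down*" is not mistaken for the
    default-partition marker.
    """
    names = []
    default = None
    for line in sinfo_text.splitlines():
        fields = line.split()
        # Skip blank lines and the two header lines.
        if not fields or fields[0] in ("CLUSTER:", "PARTITION"):
            continue
        name = fields[0]
        if name.endswith("*"):
            name = name[:-1]
            default = name
        if name not in names:
            names.append(name)
    return names, default


names, default = parse_partitions(SINFO_OUTPUT)
print(names)           # ['opa-high-mem', 'mpi', 'scavenger']
print(default)         # opa-high-mem
print("opa" in names)  # False
```

On this sample the default resolves to opa-high-mem and opa is absent from the partition list, which matches the error srun reports.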

@Comeani might have a better idea how the default partition is configured.

iamtroy412 commented 1 year ago

The previous default partition for the MPI cluster was opa, which has now been retired. The new default partition is opa-high-mem, and it is set correctly in the MPI head node's Slurm configuration. However, the retired opa partition was still being referenced as the default in job_submit.lua on the MPI head node. That script has been updated to point to the new default partition, opa-high-mem.

    -- If no partition was requested, fall back to the default
    if job_desc.partition == nil then
        job_desc.partition = "opa-high-mem"
    end
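A fallback like this only helps when the hard-coded name is a real partition, which is what went wrong with opa. As a rough illustration, written in Python rather than Lua and not taken from the CRC configuration, the same rule can be paired with a validity check. The function `resolve_partition` and both constants are hypothetical; the partition names are copied from the sinfo listing earlier in this thread.

```python
# Assumed values: partition names as shown by `sinfo -a -M mpi` in this thread.
VALID_PARTITIONS = frozenset({"opa-high-mem", "mpi", "scavenger"})
DEFAULT_PARTITION = "opa-high-mem"


def resolve_partition(requested, default=DEFAULT_PARTITION, valid=VALID_PARTITIONS):
    """Apply the default when no partition is requested, then validate it.

    Hypothetical helper mirroring the job_submit.lua fallback, with an
    extra check that rejects names that are not known partitions.
    """
    partition = requested if requested is not None else default
    if partition not in valid:
        raise ValueError(f"invalid partition specified: {partition}")
    return partition


print(resolve_partition(None))   # opa-high-mem
print(resolve_partition("mpi"))  # mpi

# A stale default such as the retired "opa" is rejected instead of
# being passed along silently.
try:
    resolve_partition(None, default="opa")
except ValueError as err:
    print(err)  # invalid partition specified: opa
```

Validating the default against the live partition list, rather than a hard-coded set as in this sketch, would keep the check from going stale the same way the original default did.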