mercanca / spart

spart: a user-oriented partition info command for slurm
GNU General Public License v2.0

Support for features/constraint #9

Closed jgphpc closed 4 years ago

jgphpc commented 4 years ago

Hi,

This is more an RFE than an issue: would it be possible to support features?

In our normal partition, we can choose between 2 constraints (gpu or mc), i.e. users submit jobs with sbatch -Cgpu or sbatch -Cmc. This can be seen with sinfo:

SS -p normal |grep mc

normal*      up 1-00:00:00      1  maint nid00405 mc,startx,row1,c0-1,group1,row1,mon,cvmfs,vtune,perf,metricbeat
normal*      up 1-00:00:00      5   idle nid00[404,406-407,416-417] mc,startx,row1,c0-1,group1,row1,mon,cvmfs,vtune,perf,metricbeat
normal*      up 1-00:00:00      2   idle nid00[418-419] mc,startx,row1,c0-1,group1,row1,mon,vtune,perf,metricbeat

SS -p normal |grep gpu

normal*      up 1-00:00:00      1   resv nid00000 gpu,startx,c0-0,group0,row0,mon,gpumodedefault,cvmfs,vtune,perf,nvidia_driver_396.24,metricbeat
normal*      up 1-00:00:00     23   idle nid000[01-11,32-43] gpu,startx,c0-0,group0,row0,mon,gpumodedefault,cvmfs,vtune,perf,nvidia_driver_396.24,metricbeat

where SS is an alias: alias SS='sinfo -o "%9P %.5a %.10l %.6D %.6t %N %f"'
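
For reference, the alias expands to something like the command below; adding -w to grep (not in the commands above) is only a suggestion to avoid matching substrings such as gpumodedefault when filtering on gpu:

sinfo -o "%9P %.5a %.10l %.6D %.6t %N %f" -p normal | grep -w gpu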

Would it be possible to have spart output this (I did not update the numbers):

     QUEUE STA   FREE  TOTAL RESORC  OTHER   FREE  TOTAL   MIN    MAX DEFMEM MAXMEM    DEFAULT    MAXIMUM  CORES   NODE
 PARTITION TUS  CORES  CORES PENDNG PENDNG  NODES  NODES NODES  NODES GB/CPU GB/CPU   JOB-TIME   JOB-TIME  /NODE MEM-GB
 normal:mc   *   1328   1328      0      0     38     38     0      -      -      -     1 hour     1 days     16     32
normal:gpu   *   1328   1328      0      0     38     38     0      -      -      -     1 hour     1 days     16     32

instead of:

     QUEUE STA   FREE  TOTAL RESORC  OTHER   FREE  TOTAL   MIN    MAX DEFMEM MAXMEM    DEFAULT    MAXIMUM  CORES   NODE
 PARTITION TUS  CORES  CORES PENDNG PENDNG  NODES  NODES NODES  NODES GB/CPU GB/CPU   JOB-TIME   JOB-TIME  /NODE MEM-GB
    normal   *   1328   1328      0      0     38     38     0      -      -      -     1 hour     1 days     16     32
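
For illustration only, here is a rough shell sketch (not something spart does) of how the per-feature free/total core counts in the mock-up above could be derived from sinfo. It assumes the user-selectable feature (mc or gpu) is always the first entry in a node's feature list, as in the output above; %f and %C are the standard sinfo fields for features and allocated/idle/other/total CPUs:

# one line per node: "<features> <alloc>/<idle>/<other>/<total>"
sinfo -h -N -p normal -o "%f %C" \
  | awk -F'[ /,]' '{ free[$1] += $(NF-2); total[$1] += $NF }
      END { for (f in free) printf "normal:%-3s free=%d total=%d\n", f, free[f], total[f] }'
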
mercanca commented 4 years ago

But even on your cluster there are a lot of node features (mc,startx,row1,c0-1,group1,row1,mon,cvmfs,vtune,perf,metricbeat). If I changed the output as you wish, each partition would be printed more than 10 times. You want only 2 of them, but how would spart know that you want just those 2?

I have never seen a running cluster that uses node features the way yours does; it's new to me. I'll think about it, but I can't see a useful way to show the output the way you want.

jgphpc commented 4 years ago

Hi.

Thank you for your comment. You are right, our configuration is very site-specific. I can only say that even though all features look the same to Slurm, for the users the first feature in the list (mc or gpu) is mandatory in order to submit jobs and choose the right type of compute node:

/etc/opt/slurm/job_submit.lua

if ( not job_desc.features or job_desc.features == '' ) then
    slurm.log_user("You have to specify, at least, what sort of node you need: " ..
                   "-C gpu for GPU enabled nodes, or -C mc for multicore nodes.\n" ..
                   "Other features are possible, but 'gpu' and 'mc' are exclusive.")
    return ESLURM_REQUESTED_NODE_CONFIG_UNAVAILABLE
end
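
The practical effect for users is roughly the following (a paraphrase for illustration, not an exact transcript; job.sh is just a placeholder batch script):

sbatch job.sh             # rejected by job_submit.lua: no -C/--constraint given
sbatch -C mc job.sh       # accepted: multicore nodes
sbatch -C gpu job.sh      # accepted: GPU-enabled nodes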

Feel free to close this issue if you prefer.

mercanca commented 4 years ago

I know your request is different, but at least I have added the features info to the spart command (in v1.2.0).