radical-cybertools / radical.saga

A Light-Weight Access Layer for Distributed Computing Infrastructure and Reference Implementation of the SAGA Python Language Bindings.
http://radical-cybertools.github.io/saga-python/

On Stampede2 this routine returns 272 #738

Closed virthead closed 4 years ago

virthead commented 4 years ago

https://github.com/radical-cybertools/radical.saga/blob/74f5846cabdc2da39fb51eae3893251e43d34093/src/radical/saga/adaptors/slurm/slurm_job.py#L386

2019-09-26 14:14:28,731: radical.saga.cpi    : MainProcess                     : MainThread     : DEBUG   : run_sync: scontrol show nodes | grep CPUTot| sed -e 's/.*\(CPUTot=[0-9]*\).*/\1/g'| sort | uniq -c | cut -f 2 -d = | xargs echo
2019-09-26 14:14:28,731: radical.saga.cpi    : MainProcess                     : MainThread     : DEBUG   : write: [   10] [  118] (scontrol show nodes | grep CPUTot| sed -e 's/.*\(CPUTot=[0-9]*\).*/\1/g'| sort | uniq -c | cut -f 2 -d = | xargs echo\n)
2019-09-26 14:14:29,208: radical.saga.cpi    : MainProcess                     : MainThread     : DEBUG   : read : [   10] [    8] (272 96\n)
2019-09-26 14:14:29,208: radical.saga.cpi    : MainProcess                     : MainThread     : DEBUG   : read : [   10] [   10] (PROMPT-0->)
2019-09-26 14:14:29,208: radical.saga.cpi    : MainProcess                     : MainThread     : INFO    :  === ppn: 272

This happens because the KNL nodes come first in the output, so their value (272) ends up as ppn. But even for the other nodes, Slurm expects 48 rather than 96.
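
To illustrate why 272 ends up first, here is a minimal Python sketch of what the shell pipeline in the log computes. The sample `scontrol show nodes` text and the function names are made up for illustration; only the `CPUTot=` field is taken from the log. The sketch assumes the ordering comes from `sort`, which compares strings, so "272" sorts before "96".

```python
import re
from collections import Counter

# Hypothetical excerpt of `scontrol show nodes` output (node names are invented);
# only the CPUTot field matters for this sketch.
SAMPLE = """\
NodeName=node-a CPUAlloc=0 CPUTot=272 CPULoad=0.01
NodeName=node-b CPUAlloc=0 CPUTot=272 CPULoad=0.02
NodeName=node-c CPUAlloc=96 CPUTot=96 CPULoad=48.10
"""


def cputot_values(scontrol_output):
    """Distinct CPUTot values in the order the shell pipeline would print them.

    Mirrors: grep CPUTot | sed ... | sort | uniq -c | cut -f 2 -d = | xargs echo
    """
    found = re.findall(r"CPUTot=([0-9]+)", scontrol_output)
    # `sort` compares strings, not numbers, so "272" comes before "96".
    return sorted(set(found))


def cputot_counts(scontrol_output):
    """Number of nodes per CPUTot value, as integers."""
    found = re.findall(r"CPUTot=([0-9]+)", scontrol_output)
    return Counter(int(value) for value in found)


if __name__ == "__main__":
    print(" ".join(cputot_values(SAMPLE)))
    print(dict(cputot_counts(SAMPLE)))
```

On a mixed cluster the first value is therefore not tied to the partition a job will actually run on, which matches the `=== ppn: 272` line in the log.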

virthead commented 4 years ago

I think this value would be better set in the configuration section, e.g. ppn = 48, 96, etc.
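
A rough sketch of that idea in Python: an explicitly configured ppn wins, and detection is only a fallback. The function name `resolve_ppn` and its parameters are hypothetical and are not part of radical.saga.

```python
def resolve_ppn(configured_ppn, detected_values):
    """Return the ppn to use.

    configured_ppn  -- value from a (hypothetical) configuration entry, or None
    detected_values -- CPUTot values as reported by the scheduler, e.g. ["272", "96"]
    """
    # An explicit setting always takes precedence over detection.
    if configured_ppn is not None:
        return int(configured_ppn)
    if not detected_values:
        raise ValueError("no ppn configured and none detected")
    # Fallback: take the first detected value, as the log above suggests happens today.
    return int(detected_values[0])


if __name__ == "__main__":
    print(resolve_ppn(48, ["272", "96"]))
    print(resolve_ppn(None, ["272", "96"]))
```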

andre-merzky commented 4 years ago

see also #733

andre-merzky commented 4 years ago

duplicate of #733