payu-org / payu

A workflow management tool for numerical models on the NCI computing systems
Apache License 2.0

Payu automatically increases cpus, then breaks because the number of cpus is wrong for expressbw #334

Closed ashjbarnes closed 1 year ago

ashjbarnes commented 1 year ago

To reproduce, run a mom6 job with layout 10,14 and request 140 cpus on expressbw. Payu increases cpus to 144, then breaks because you need to request multiples of 28

Output:

payu: warning: Job request includes 4 unused CPUs.
payu: warning: CPU request increased from 140 to 144
Loading input manifest: manifests/input.yaml
Loading restart manifest: manifests/restart.yaml
Loading exe manifest: manifests/exe.yaml
payu: Found modules in /opt/Modules/v4.3.0
qsub -q expressbw -P v45 -l walltime=10800 -l ncpus=144 -l mem=576GB -N flat_strength_2 -l wd -j n -v PAYU_PATH=/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.10/bin,MODULESHOME=/opt/Modules/v4.3.0,MODULES_CMD=/opt/Modules/v4.3.0/libexec/modulecmd.tcl,MODULEPATH=/g/data/hh5/public/modules:/etc/scl/modulefiles:/opt/Modules/modulefiles:/opt/Modules/v4.3.0/modulefiles:/apps/Modules/modulefiles -W umask=027 -l storage=gdata/hh5+gdata/v45+scratch/v45 -- /g/data/hh5/public/apps/miniconda3/envs/analysis3-22.10/bin/python3.9 /g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.10/bin/payu-run
qsub: Error: You have requested 144 CPUs which is not a multiple of the number of CPUs in compute nodes in queue expressbw (28 CPUs per node). When requesting more than a single node (28 CPUs) in this queue you must request CPUs in multiples of full nodes.

angus-g commented 1 year ago

I think running on the Broadwell nodes you'll need to put

platform:
  nodesize: 28

into your config.yaml.

angus-g commented 1 year ago

In the code this is documented as a "todo" task to maybe make this part of the scheduler or a server driver. From what I can tell (but I'm almost definitely missing something), it's quite hard to figure out what the per-node CPU count is for the different queues.

ashjbarnes commented 1 year ago

Ah thanks Angus that makes sense!

There's another strange case that payu automates unhelpfully:

You're forced to choose exactly the number of CPUs required by the mask table, but then payu automatically increases it afterwards, which seems a bit odd. Does payu just round up to the nearest multiple of the node's CPU count even if the extra ones sit idle?

angus-g commented 1 year ago

Yes, that's just a limitation of PBS: once you're requesting more than one node, you can only request multiples of entire nodes. The error message you got above wasn't payu breaking directly, it was an error thrown by qsub. Taking Broadwell as an example, the allowed CPU counts at the PBS level are 1, 2, ..., 27, 28, 56, 84, ..., but the model could use any CPU count that you end up with according to your layout and masking.
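The rounding rule described above can be sketched in a few lines (a minimal illustration of the behaviour, not payu's actual code; the default node size of 48 is assumed from the thread, where 140 was rounded to 144):

```python
def round_cpu_request(ncpus, nodesize=48):
    """Round a CPU request up to something PBS will accept.

    Requests up to one node can be any size; beyond one node,
    PBS only accepts whole-node multiples.
    """
    if ncpus <= nodesize:
        return ncpus
    # Ceiling division: round up to the next whole node
    nnodes = -(-ncpus // nodesize)
    return nnodes * nodesize

# With the default node size, 140 CPUs rounds up to 144 (3 nodes of 48),
# which expressbw then rejects; with nodesize: 28 set, 140 is exactly
# 5 Broadwell nodes and passes through unchanged.
print(round_cpu_request(140))      # 144
print(round_cpu_request(140, 28))  # 140
```

This is why setting `platform: nodesize: 28` fixes the original report: the request was already a whole-node multiple for Broadwell, and the rounding only misfired because it used the wrong node size.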

I'd argue that it's helpful (indeed necessary?) behaviour on payu's part, since in most cases you need to request more CPUs than your model/mask table actually uses.

ashjbarnes commented 1 year ago

That makes a lot of sense - clearly I've not been paying attention to which process is throwing the error. So in my original case payu assumed I needed multiples of 48 when I actually needed multiples of 28 - that's the only limitation, and otherwise it's doing the right thing.

marshallward commented 1 year ago

It was originally written when the machine was homogeneous and cores-per-node was always fixed. So at the time, it did solve a common frustration.

Unfortunately, the only way I could find to check the cores-per-node is either to ping the scheduler, which is as slow as submitting a new job, or to somehow track it in a global config file installed with the application (perhaps a platform.yaml file, either in /etc, the Python install directory, or the user's directory). If I were still free to work on this, I would have pursued the second option.
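That second option might look something like the following sketch (entirely hypothetical: the table of queue node sizes, the `lookup_nodesize` helper, and the config shape are illustrative assumptions, not existing payu code):

```python
# Hypothetical bundled table of cores-per-node per queue; the values
# here are taken from the thread (48-core default, 28-core Broadwell).
DEFAULT_NODESIZES = {
    "normal": 48,
    "expressbw": 28,
    "normalbw": 28,
}

def lookup_nodesize(queue, config=None):
    """Resolve cores-per-node for a queue.

    Precedence: an explicit platform/nodesize in config.yaml wins,
    then the bundled per-queue table, then a fallback of 1 (which
    disables whole-node rounding rather than guessing wrong).
    """
    if config:
        platform = config.get("platform", {})
        if "nodesize" in platform:
            return platform["nodesize"]
    return DEFAULT_NODESIZES.get(queue, 1)

print(lookup_nodesize("expressbw"))  # 28
print(lookup_nodesize("unknown_queue"))  # 1
```

A user's explicit `platform: nodesize:` setting would still override the table, so the current config.yaml workaround keeps working while the common queues just do the right thing by default.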