payu-org / payu

A workflow management tool for numerical models on the NCI computing systems
Apache License 2.0

OpenMP support #66

Open aidanheerdegen opened 7 years ago

aidanheerdegen commented 7 years ago

There was a request for qgcm support in payu. qgcm is an OpenMP model, and payu does not currently support OpenMP.

I bodged up something, which is viewable here:

https://github.com/aidanheerdegen/payu/tree/qgcm

but it handles OpenMP in a sub-optimal way. The call to mpirun is left as-is, so the number of CPUs has to be 1. I've added an mpthreads option, whose value is used to set the OMP_NUM_THREADS environment variable.

To get the correct number of CPUs in the PBS request, I've used the ncpureq option in config.yaml to force the number of CPUs to match the number of threads, like so:

ncpus: 1
ncpureq: 4
mpthreads: 4

This is sub-optimal (and a poor information model), but it was done to get it running, as I thought there were some subtle issues to be addressed in run_cmd.py and experiment.py, particularly with respect to coupled multi-model experiments.
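As a rough illustration of a cleaner information model, the CPU request could be derived from ncpus and mpthreads instead of being forced by hand. The sketch below is not payu code: the function name `openmp_settings` and the plain-dict config are assumptions made for the example, and an explicit ncpureq is assumed to override the derived value.

```python
def openmp_settings(config):
    """Return (ncpureq, env) derived from a config.yaml-style dict.

    Hypothetical helper for illustration only; not part of payu.
    """
    ncpus = int(config.get("ncpus", 1))
    mpthreads = int(config.get("mpthreads", 1))

    # Assumption: an explicit ncpureq still wins; otherwise request
    # one core per thread for every MPI rank.
    ncpureq = int(config.get("ncpureq", ncpus * mpthreads))

    # OMP_NUM_THREADS is the standard OpenMP thread-count variable.
    env = {"OMP_NUM_THREADS": str(mpthreads)}
    return ncpureq, env


if __name__ == "__main__":
    # Same values as the config above, but without a manual ncpureq.
    print(openmp_settings({"ncpus": 1, "mpthreads": 4}))
```

With this approach, the config would only need ncpus and mpthreads, and the PBS request would follow from them.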

marshallward commented 7 years ago

Zhi has also suggested that we start doing MOM6 comparison tests with OpenMP enabled, so I second this idea, and suggest we make it a more general setting.

We are just talking about setting OMP_NUM_THREADS, right?

aidanheerdegen commented 7 years ago

Mostly. There is still a call to mpirun, which I guess needs to be kept to support mixed OpenMP/MPI. In a coupled model there will be both a number of CPUs and a number of threads, so they need to be multiplied together. I don't know if there are additional binding issues that need to be sorted. For one, the number of threads cannot exceed the number of available cores in a node. As it turns out, we also had to pass --bind-none to the mpirun command, as by default it binds each instance to a single node.

It needs some thought.
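To make the arithmetic concrete, here is a minimal sketch of how the total CPU request for a coupled experiment might be computed. The function name `total_cpus`, the per-model dict layout, and the `cores_per_node` parameter are all hypothetical, and the submodel names and numbers in the example are made up.

```python
def total_cpus(models, cores_per_node):
    """Sum ncpus * mpthreads over all submodels.

    Hypothetical helper for illustration only; not part of payu.
    `models` maps a submodel name to a dict with optional
    "ncpus" and "mpthreads" entries (both default to 1).
    """
    total = 0
    for name, settings in models.items():
        ncpus = int(settings.get("ncpus", 1))
        mpthreads = int(settings.get("mpthreads", 1))

        # Threads of one rank share memory, so they must fit on one node.
        if mpthreads > cores_per_node:
            raise ValueError(
                f"{name}: {mpthreads} threads exceed "
                f"{cores_per_node} cores per node"
            )
        total += ncpus * mpthreads
    return total


if __name__ == "__main__":
    # Illustrative values only.
    models = {
        "ocean": {"ncpus": 4, "mpthreads": 2},
        "atmosphere": {"ncpus": 2},
    }
    print(total_cpus(models, cores_per_node=16))
```

This covers only the core count. It says nothing about how ranks and threads are bound to cores, which is the part that still needs thought.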

marshallward commented 7 years ago

Qgcm does not use MPI, right? Should we make the exec argument optional? That should remove the need to set --bind-none.
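One way this could look, as a sketch only: build the launch command with or without the mpirun prefix depending on a setting. The function `build_command` and its `use_mpirun` and `mpirun_flags` parameters are hypothetical names for illustration and do not reflect how payu's run_cmd.py is actually structured.

```python
def build_command(exe, ncpus=1, mpthreads=1, use_mpirun=True, mpirun_flags=()):
    """Return (argv, env) for launching a model executable.

    Hypothetical helper for illustration only; not part of payu.
    """
    # The thread count is always passed through the environment.
    env = {"OMP_NUM_THREADS": str(mpthreads)}

    if use_mpirun:
        # MPI or mixed OpenMP/MPI: keep the mpirun wrapper.
        argv = ["mpirun", *mpirun_flags, "-np", str(ncpus), exe]
    else:
        # Pure OpenMP model: run the executable directly, so no
        # MPI binding flags are needed.
        argv = [exe]
    return argv, env


if __name__ == "__main__":
    print(build_command("qgcm", mpthreads=4, use_mpirun=False))
    print(build_command("mom6", ncpus=8, mpthreads=2))
```

The returned pair could then be handed to something like `subprocess.run(argv, env={**os.environ, **env})`.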