Open aidanheerdegen opened 3 months ago
If backwards compatibility was required, or if it was clearer for users, there could be a new config option `runtime`, which is then used to calculate `walltime` if `walltime` isn't specified.
If we did this it might make sense to make `runtime` and `walltime` mutually exclusive, so users set one or the other. With the default of `runspersub: 1` they would have identical practical outcomes.
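A minimal sketch of how that exclusivity could be enforced, assuming the config is available as a plain dictionary and times are given in seconds. The helper name `resolve_walltime` is hypothetical, and `runtime` is the option proposed here, not an existing `payu` setting.

```python
def resolve_walltime(config, runs=1):
    """Return the walltime to request, in seconds, from a config dict.

    Hypothetical helper: 'runtime' is the proposed option, and plain
    seconds are used only to keep the sketch short.
    """
    has_runtime = "runtime" in config
    has_walltime = "walltime" in config
    if has_runtime and has_walltime:
        raise ValueError("'runtime' and 'walltime' are mutually exclusive")
    if has_walltime:
        # Backwards-compatible path: the value is requested unchanged.
        return config["walltime"]
    if has_runtime:
        # Proposed path: scale the single-run time by the runs in this submit.
        runspersub = config.get("runspersub", 1)
        return config["runtime"] * min(runs, runspersub)
    raise ValueError("one of 'runtime' or 'walltime' must be set")


# With runspersub: 1 both styles request the same walltime.
old_style = resolve_walltime({"walltime": 3600, "runspersub": 1})
new_style = resolve_walltime({"runtime": 3600, "runspersub": 1})
```

Raising an error when both options are present is one possible choice; silently preferring `walltime` would also preserve backwards compatibility, but it would hide a likely configuration mistake.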
Calculating the final walltime on the user's behalf still requires them to know the maximum walltime of the queue they're using. If they modify the model config so that a single run takes longer, they would need to change both `runtime` and `runspersub`, otherwise they may exceed the maximum walltime of the queue, which would waste a lot of resources.
If `maxwalltime` was defined in the `platform` config, and set to the known defaults in `payu`, then it would just require changing `runtime`, and `payu` could check the current settings were consistent.
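The consistency check could look something like the sketch below. It assumes all times are in seconds and that `maxwalltime` has already been read from the `platform` config. The function name is hypothetical, and the 48 hour limit in the example is illustrative only, not a statement about any particular queue.

```python
def check_against_queue_limit(runtime, runspersub, maxwalltime):
    """Check that a full submission fits within the queue limit.

    All arguments are in seconds (an assumption of this sketch).
    Returns the largest runspersub that fits, so a message can suggest it.
    """
    if runtime <= 0:
        raise ValueError("runtime must be positive")
    if runtime > maxwalltime:
        raise ValueError("a single run exceeds the queue's maximum walltime")
    largest_fit = maxwalltime // runtime
    if runtime * runspersub > maxwalltime:
        raise ValueError(
            f"runspersub={runspersub} needs {runtime * runspersub}s but the "
            f"queue allows {maxwalltime}s; use runspersub <= {largest_fit}"
        )
    return largest_fit


HOUR = 3600
# Illustrative values: 2 hour runs, 20 runs per submit, 48 hour queue limit.
largest = check_against_queue_limit(2 * HOUR, 20, 48 * HOUR)
```

Failing before submission, with the largest workable `runspersub` in the message, avoids the wasted resources of a job that is killed at the queue limit.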
Currently the `runspersub` option requires the user to make compensating modifications to the `walltime` requested, to ensure that multiple runs can complete within a single PBS submit. This has been a source of confusion in the past.
ACCESS-NRI is working up configurations for ACCESS-ESM1.5 and this model has a maximum run-time of 1 year. However it is a low-res ESM model that typically requires very long runs to equilibrate slow carbon cycling. It would be convenient to have `runspersub: 20` to minimise PBS queue time and a proliferation of PBS logs. However this would mean the default configuration would have a PBS walltime of 48hrs. For users doing short test runs this would impact their movement through the queue.
The proposal is to alter the logic so that `walltime` is set by the user to reflect how long it takes for a single run of the model. Then `runspersub` and the number of runs requested could be used to modify the requested walltime to make sure the job can complete (basically `submit_walltime = min(runs, runspersub) * walltime`). This has a nice feature that `runspersub` can be left set to a larger number, and however many runs a user selects the submitted wall time would be adjusted up to a maximum value that is `runspersub * walltime`. Clearly this would require useful informative messages to the user to let them know how the PBS submission was being altered.
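The formula can be sketched in a few lines. This is only an illustration of the proposed arithmetic, not `payu` code: the function name is hypothetical, the single-run time of 2 hours 24 minutes is an assumed value chosen so that 20 runs total 48 hours, and the printed line stands in for the kind of message the user would need to see.

```python
from datetime import timedelta


def submit_walltime(walltime, runs, runspersub):
    """Scale a single-run walltime by the number of runs in one submission.

    Returns the walltime to request and the number of runs it covers.
    """
    runs_this_submit = min(runs, runspersub)
    return walltime * runs_this_submit, runs_this_submit


single_run = timedelta(hours=2, minutes=24)  # assumed single-run time
for runs in (1, 5, 20, 50):
    total, count = submit_walltime(single_run, runs, runspersub=20)
    # Tell the user how the PBS request was adjusted.
    print(f"runs={runs}: requesting walltime {total} for {count} run(s) per submit")
```

A short test of one run requests only the single-run walltime, while any request of 20 or more runs is capped at `runspersub * walltime`.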
There is some precedent here in the way `payu` pads CPU requests to be a multiple of nodes, or sets memory limits if no memory is set.