blimlim opened 1 month ago
An option could be to run the archive user-script as a postscript. Postscripts are always resubmitted via qsub, and are run after collation, or after archive if collation is not enabled. E.g. in config.yaml:
postscript: my_script.sh
The command is run in payu here: https://github.com/payu-org/payu/blob/e9bd1f4c4adc223e818c71f8d45c36de98025dc4/payu/experiment.py#L834-L837
If postscript is enabled, syncing the latest output is disabled, as payu has no idea when the postscript finishes, so only previous outputs are synced (e.g. outputN where N < current_run).
Passing specific environment variables, such as the current output directory, could be done using -v to export the environment variable to the PBS job, e.g.
postscript: -v PAYU_CURRENT_OUTPUT_DIR my_script.sh
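As a rough sketch of how that command string could be assembled (the helper name is made up, this is not payu's actual implementation):

```python
def build_postscript_cmd(postscript, export_vars):
    """Prepend a -v flag per variable so the PBS job inherits selected
    environment variables from the submitting process.
    (Illustrative helper, not payu's actual code.)"""
    flags = " ".join(f"-v {var}" for var in export_vars)
    return " ".join(part for part in ("qsub", flags, postscript) if part)

print(build_postscript_cmd("my_script.sh", ["PAYU_CURRENT_OUTPUT_DIR"]))
```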
So the sync job has a userscript option which runs before any rsync commands are run. It means that nothing will be synced if the userscript fails to run. The sync job by default runs on copyq, but the queue and resources can be configured; see the payu postprocessing documentation. However, this may not work for this ESM1.5 script, as it might need to run every time, not just when sync is configured?
Using a PBS job dependency with -W depend=... in the sync call could be an improvement, as it would allow the latest outputs to be synced. This would require getting the run's job ID and then passing that to the sync submit code.
Getting the run_id from a postscript call would be straightforward, as that should be the only value returned. On the other hand, finding a run_id in a userscript's output might be more difficult, unless it was assumed to be the last value in the output, and even then it would need checking that it's a valid run_id.
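A sketch of that extract-and-validate step (the job-ID pattern is an assumption based on Gadi-style IDs like 121578665.gadi-pbs, not anything payu currently does):

```python
import re

# Assumed PBS job ID shape, e.g. "121578665.gadi-pbs"
JOB_ID_PATTERN = re.compile(r"^\d+\.[\w.-]+$")

def extract_run_id(script_output):
    """Take the last non-empty line of a userscript's output and accept
    it only if it looks like a PBS job ID; otherwise return None."""
    lines = [ln.strip() for ln in script_output.splitlines() if ln.strip()]
    if lines and JOB_ID_PATTERN.match(lines[-1]):
        return lines[-1]
    return None
```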
Passing it to the sync call would require adding an argument to the payu sync command. The -W depend=... could then be added to the pbs_flags in pbs_config: https://github.com/payu-org/payu/blob/e9bd1f4c4adc223e818c71f8d45c36de98025dc4/payu/subcommands/sync_cmd.py#L61
Or add it somewhere in pbs_config, and then build the -W depend= in the PBS-specific submit():
https://github.com/payu-org/payu/blob/e9bd1f4c4adc223e818c71f8d45c36de98025dc4/payu/schedulers/pbs.py#L26
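Either way, building the flag itself is simple; a minimal sketch (function and variable names are hypothetical):

```python
def depend_flag(run_id, dep_type="after"):
    """Build a PBS dependency flag for qsub's -W option.
    dep_type might be 'after', 'afterok' or 'afterany'."""
    return f"-W depend={dep_type}:{run_id}"

# e.g. appended to the pbs_flags passed through to qsub
pbs_flags = ["-q", "copyq", depend_flag("121578665.gadi-pbs")]
```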
One thing I am not too sure on is which dependency type to use: afterok:<run_id> will only run after the job has terminated with no errors, while afterany:<run_id> will run after the job has terminated regardless of its exit status.
I think a big issue could be that if a dependency is never met, the job stays in the queue indefinitely. So if afterok:<run_id> is used where <run_id> is a job that exits with an error, the dependency condition will never be met. So dependency types might have to be after/before rather than afterok/beforeok, to prevent jobs being held indefinitely. I can't see a good solution for preventing the sync job if a prerequisite PBS job exits with an error.
There are also caveats in the NCI docs for job dependencies. Say job2 was submitted with afterok:<job1>, and at the time of submission job1 had already finished successfully. Then the job scheduler has no knowledge of job1, so job2 would be held. The docs suggest submitting job2 first with depend=on:1, then submitting job1 with depend=beforeok:<job2>. My initial thoughts are that I don't want to submit a sync job first with depend=on, and then have some errors occur before the other jobs are even submitted, leaving the sync job stuck in the queue. There are also some suggestions in this OpenPBS forum post to hold job1, submit job2 with afterok, then release job1.
I think it'll be more straightforward to implement dependency logic with postscript rather than userscripts. As postscript is submitted in code just before a sync job is submitted, it's probably unlikely that the postscript would complete before sync is submitted. Userscripts would need logic to check if a qsub call was even run, then extract the run_id, and by the time sync is called, which could be in a collate job, would have to check if any userscript qsub run IDs are still queued/running, and if so add a depend= to the sync submission.
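A sketch of that "still queued/running" check, assuming the default qstat table layout as shown elsewhere in this thread (the column positions are an assumption):

```python
def still_active(job_ids, qstat_lines):
    """Return the recorded job IDs that qstat reports as queued (Q),
    held (H) or running (R). Assumes the default qstat table layout,
    with the job ID in the first column and the state in the
    second-to-last column."""
    active = []
    for line in qstat_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] in job_ids \
                and fields[-2] in ("Q", "H", "R"):
            active.append(fields[0])
    return active
```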
Hope this makes sense.. I have only really learnt about configuring PBS dependencies today, so I could well be wrong..
Thanks for writing these explanations up, these look like great ideas.
An option could be to run the archive user-script as a postscript
I think this sounds like a good option, especially if it makes it viable down the line to add in dependency logic.
(sync userscript) However this may not work for this esm1.5 script as it might need to run every time- not just when sync is configured?
We'll want the ESM1.5 netcdf conversion to run both when the syncing is and isn't enabled, and so I agree that this option wouldn't work in every case.
I think a big issue could be if a dependency is not met, the job stays in the queue indefinitely. So using afterok:<run_id> where <run_id> is a job that exits with an error, then this dependency condition will never be met. So I think dependency types might have to be after/before rather than afterok/beforeok should be used to prevent jobs being held indefinitely. I can't see a good solution of preventing sync job if prerequisite pbs job exits with an error.
I found the online documentation about the different dependency types a bit unclear about how they actually behave. I ran a quick test on gadi for the afterok
condition, setting off a job which would fail after waiting 200 seconds:
$ qsub fail_job.sh
121578665.gadi-pbs
fail_job.sh
----------------------
#PBS -l ncpus=1
...
sleep 200s
exit 1
and then a dependent job which just prints out a message:
$ qsub -W depend=afterok:121578665 dependent-job.sh
The dependent job was held in the queue
gadi-pbs:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
121578665.gadi-pbs sw6175 normal-* fail-job.* 24520* 1 1 1024m 00:05 R 00:00
121578679.gadi-pbs sw6175 normal-* dependent* -- 1 1 1024m 00:05 H --
but it looks like it got deleted once the first job failed:
$ qstat -u sw6175
$
and so I'm wondering if this condition could work for holding the sync jobs? We'd probably need some way of reporting to the user if the sync job didn't run though, but I'm not sure of any good ways to do that.
I think it'll be more straightforward to implement dependency logic with postscript rather than userscripts. As postscript is submitted in code just before a sync job is submitted, it's probably unlikely that the postscript would complete before sync is submitted. Userscripts would need logic to check if a qsub call was even run, then extract the run_ID, and by time sync is called which could be in a collate job, would have to check if any userscript qsub run ids are still queued/running, and if so add a depend= to sync submissions.
This makes a lot of sense. The postscript option sounds a lot easier and so I'll test out running the conversion as a postscript.
I've just tried running the conversion as a postscript. It looks like it works well when the storage and project settings are hardcoded into the job submission script, and set_userscript_env_vars() is called first in the postprocess step. We'll want to use the project and storage values from the simulation for the conversion job though, which I think might require the shell to be allowed when submitting the postscript.
E.g. adding the needs_subprocess_shell check to the postprocess step:
def postprocess(self):
"""Submit any postprocessing scripts or remote syncing if enabled"""
self.set_userscript_env_vars()
# First submit postprocessing script
if self.postscript:
envmod.setup()
envmod.module('load', 'pbs')
cmd = 'qsub {script}'.format(script=self.postscript)
if needs_subprocess_shell(cmd):
sp.check_call(cmd, shell=True)
else:
sp.check_call(shlex.split(cmd))
...
This lets the following run ok:
postscript: -v PAYU_CURRENT_OUTPUT_DIR -P ${PROJECT} -lstorage=${PBS_NCI_STORAGE}+gdata/access+gdata/hh5 ./scripts/NetCDF-conversion/UM_conversion_job.sh
Would these changes be ok to add? There could be a better way of passing the project
and storage
values to the postscript which I might have missed though.
so I'm wondering if this (afterok) condition could work for holding the sync jobs? We'd probably need some way of reporting to the user if the sync job didn't run though, but I'm not sure of any good ways to do that.
Oh that's awesome! There are log files that are created if sync is run, but that requires the user to check those sync logs exist.. Sync could still use after, and if postscript was configured, maybe read the postscript logs for an exit status and abort with some error message. I guess it still has a similar issue that the only way you know a sync failed is by reading the log files. So in a way, not having sync log files with afterok is actually more visible that something went wrong? Maybe there should be a payu command that parses log files from the various run/collate/postprocess/sync jobs and displays a graph with their exit status values..
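For the log-parsing idea, a sketch of the kind of check such a command could do, assuming the job stdout files end with an "Exit Status: N" line like the usage block NCI appends on Gadi (the exact log format would need checking):

```python
import re

def job_exit_status(log_text):
    """Scan a job's stdout log for an 'Exit Status: N' line and return N,
    or None if no such line is found. The line format is an assumption
    based on the usage block appended to Gadi job output files."""
    match = re.search(r"Exit Status:\s*(-?\d+)", log_text)
    return int(match.group(1)) if match else None
```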
Would these changes be ok to add? There could be a better way of passing the project and storage values to the postscript which I might have missed though.
Yay that postscript almost works! I reckon those changes will be good to add
Great discussion of the issues!
It's good that postscript works, but I'm not sold on having this in a released config:
postscript: -v PAYU_CURRENT_OUTPUT_DIR -P ${PROJECT} -lstorage=${PBS_NCI_STORAGE}+gdata/access+gdata/hh5 ./scripts/NetCDF-conversion/UM_conversion_job.sh
That is some code spaghetti.
An option that has not been suggested so far is to add a collate
method for the UM
driver that does the um2nc
conversion. It's not collation in the true sense, but it is a post-processing operation that needs to be done at a very similar time in the model life-cycle.
We should definitely discuss before trying to implement it, but thought I'd chuck it in as a possibility.
That is a good point. Another similar option to having it run in the collate job is to add a collate-stage userscript to payu?
After discussing we think it can be reduced to this
postscript: -V -lstorage=${PBS_NCI_STORAGE} ./scripts/NetCDF-conversion/UM_conversion_job.sh
Maybe payu
could also have an option to automatically add -V -lstorage=${PBS_NCI_STORAGE}
as that seems like a pretty common and useful thing: to access all the same projects as the calling job, and have access to the conda
environment payu
itself is running in.
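A sketch of what that payu option could do (hypothetical helper; PBS_NCI_STORAGE is set inside NCI PBS jobs):

```python
import os

def default_postscript_flags(env=None):
    """Build the qsub flags payu could add automatically: -V to forward
    the environment, plus the calling job's storage mounts if known."""
    env = os.environ if env is None else env
    flags = ["-V"]
    storage = env.get("PBS_NCI_STORAGE")
    if storage:
        flags.append(f"-lstorage={storage}")
    return flags
```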
I'm actually starting to think we should turn the delete option on by default, because most users wouldn't really have the ability to do detailed checks, and then the instructions to turn it off are clearer because it is commenting out an option that is already there.
Otherwise we risk users naively running and filling the disk with a large amount of useless data.
If we didn't delete them, would the UM fields files be sync'ed if syncing was turned on? I think they probably would, because we're not specifying any wildcards to ignore (@jo-basevi can advise on how to do this).
Maybe payu could also have an option to automatically add -V -lstorage=${PBS_NCI_STORAGE} as that seems like a pretty common and useful thing: to access all the same projects as the calling job, and have access to the conda environment payu itself is running in.
Sorry, I think I made a mistake yesterday: it would not have access to the conda environment. During the run PBS job, payu is run using the path to the python executable and the path to where payu is installed -
qsub .... /g/data/vk83/prerelease/apps/payu/dev/bin/python /g/data/vk83/prerelease/apps/payu/dev/bin/payu-run
As it never actually loads the payu environment on the payu-run call, the environment variables would not be set..
We could add access to the python executable to the environment of the postscript call - similar to this PR - https://github.com/payu-org/payu/pull/491? In a way it could be useful to load the payu module in the PBS payu-run job to set the environment variables, but I am unsure of a clean way to do it without specifying a payu modulepath and filename in config.yaml.
Some related updates on using the -V
flag with qsub
. It doesn't seem to actually pass forward the whole environment, especially with PATH
variables. I tried loading in payu on the login node, running a job with the -V
flag, and comparing the two environments:
Outside of PBS job:
PATH=/g/data/vk83/prerelease/apps/payu/dev/bin:/home/565/sw6175/perl5/bin:/home/565/sw6175/.local/bin:/home/565/sw6175/bin:/opt/pbs/default/bin:/opt/nci/bin:/opt/bin:/opt/Modules/v4.3.0/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/pbs/default/bin
MODULEPATH=/g/data/vk83/prerelease/modules:/etc/scl/modulefiles:/apps/Modules/restricted-modulefiles/matlab_monash:/opt/Modules/modulefiles:/opt/Modules/v4.3.0/modulefiles:/apps/Modules/modulefiles
Inside the PBS job:
PATH=/home/565/sw6175/perl5/bin:/opt/pbs/default/bin:/opt/nci/bin:/opt/bin:/opt/Modules/v4.3.0/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/local/pbs/bin
MODULEPATH=/etc/scl/modulefiles:/apps/Modules/restricted-modulefiles/matlab_monash:/opt/Modules/modulefiles:/opt/Modules/v4.3.0/modulefiles:/apps/Modules/modulefiles
When the -V
option is included, it seems to interfere with loading the payu module in the PBS job. I tested two job scripts, one which uses/loads the payu module, and one which doesn't, and tested them both with and without the -V
flag:
load_payu.sh
---------------
#!/bin/bash
module list
module use /g/data/vk83/prerelease/modules
module load payu
which payu
echo "myvar: $myvar"
esm1p5_convert_nc -h
noload_payu.sh
----------------
#!/bin/bash
module list
which payu
echo "myvar: $myvar"
esm1p5_convert_nc -h
STDERR files in each case:
qsub -V load_payu.sh
:
Currently Loaded Modulefiles:
1) pbs 2) payu/dev
/usr/bin/which: no payu in (/home/565/sw6175/perl5/bin:/opt/Modules/v4.3.0/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/local/pbs/bin)
/local/spool/pbs/mom_priv/jobs/123535948.gadi-pbs.SC: line 14: esm1p5_convert_nc: command not found
qsub -V noload_payu.sh
:
Currently Loaded Modulefiles:
1) pbs 2) payu/dev
/usr/bin/which: no payu in (/home/565/sw6175/perl5/bin:/opt/Modules/v4.3.0/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/local/pbs/bin)
/local/spool/pbs/mom_priv/jobs/123535949.gadi-pbs.SC: line 12: esm1p5_convert_nc: command not found
The only one that worked was to omit the -V
flag, and to use/load payu within the job script, which had:
qsub load_payu.sh STDOUT:
/g/data/vk83/prerelease/apps/payu/dev/bin/payu
myvar:
usage: esm1p5_convert_nc [-h] [--quiet] [--delete-ff] current_output_dir
positional arguments:
current_output_dir ESM1.5 output directory to be converted
...
I thought one way to simplify the script call would be to move PAYU_CURRENT_OUTPUT_DIR to a command line argument for the job script, adding a second argument with the --delete-ff flag. I.e. in config.yaml:
postscript: -lstorage=${PBS_NCI_STORAGE} ./scripts/NetCDF-conversion/UM_conversion_job.sh $PAYU_CURRENT_OUTPUT_DIR # -delete-ff
and then in scripts/NetCDF-conversion/UM_conversion_job.sh
:
module use /g/data/vk83/prerelease/modules
module load payu
esm1p5_convert_nc "$@"
which would have the benefit of controlling the delete-ff
flag directly in the config.yaml
.
This gives a qsub usage error.
usage: qsub ...
[-v variable_list] [-V ] [-z] [script | -- command [arg1 ...]]
qsub --version
Not sure if I'm interpreting this correctly, but does it look like command line arguments are only allowed when submitting a command instead of a script?
If we had the payu module loaded into the payu-run job, do you think it would be possible to omit the scripts/NetCDF/UM_conversion_job.sh
script from the configuration completely? And instead call the conversion via
postscript: -lstorage=${PBS_NCI_STORAGE} -- esm1p5_convertnc $PAYU_CURRENT_OUTPUT_DIR # -delete-ff
With the above, we might have to add the walltime
and memory
settings and so I'm not sure what's simplest!
I might be incorrect about quite a bit of this though!
Some related updates on using the -V flag with qsub. It doesn't seem to actually pass forward the whole environment, especially with PATH variables. I tried loading in payu on the login node, running a job with the -V flag
That's really interesting, maybe module loading payu as part of payu-run may not be the best idea..
If we had the payu module loaded into the payu-run job, do you think it would be possible to omit the scripts/NetCDF/UM_conversion_job.sh script from the configuration completely?
I guess one way to test it would be to add payu to modules loaded in config.yaml
, e.g.
modules:
use:
- /g/data/vk83/prerelease/modules
load:
- payu/dev
Though I don't think this is a great idea having to specify a payu-version.
Another way to avoid loading the payu module in the postscript could be to pass -v PAYU_PATH to the script (which should be /g/data/vk83/prerelease/apps/payu/dev/bin for payu/dev). So to run the script, use:
$PAYU_PATH/python $PAYU_PATH/esm1p5_convert_nc
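A sketch of assembling that call from the exported variable (helper name is made up; path as given above):

```python
def conversion_cmd(env):
    """Run the conversion entry point via the payu install's own bin
    directory, so no module load is needed in the postscript."""
    payu_path = env["PAYU_PATH"]
    return f"{payu_path}/python {payu_path}/esm1p5_convert_nc"

print(conversion_cmd({"PAYU_PATH": "/g/data/vk83/prerelease/apps/payu/dev/bin"}))
```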
This PR also kinda solves the same problem. @blimlim is kinda manually testing this. If it works we should merge and test that.
The NetCDF conversion script for ESM1.5 will be run as an archive stage userscript, where it is submitted to the queue as a PBS job to avoid holding up the simulation. If syncing is enabled in config.yaml, there's no guarantee that the conversion job will be finished by the time the syncing job is run. E.g. testing several month-long simulations with syncing enabled, some months' NetCDF files were not all copied over to the syncing directory.
From what I can understand, the collation step gets around a similar issue by calling the sync step from within the collation job:
https://github.com/payu-org/payu/blob/2391485ac4af756f45ae6d4720cd263b30940ae5/payu/subcommands/collate_cmd.py#L110-L112
but a similar approach might not be possible with the external conversion userscript. Just flagging this now so that we can discuss ideas to handle this.