Closed jo-basevi closed 8 months ago
Closes #200
#### Extra Notes/Questions
Currently, if post-script is configured AND automatic syncing is enabled, the latest output (and latest restarts, if syncing restarts is enabled) will not be synced. To manually sync the latest outputs after the post-script job has completed, run `payu sync`.
Instead of post-script, to ensure post-processing automatically happens before syncing, there's a sync userscript hook. This runs at the start of the sync PBS job, before any rsync commands. There are options to modify PBS resources/queue under the `sync` namespace in `config.yaml`.
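A minimal sketch of that setup, assuming the option names follow the description above (the exact keys should be checked against the payu documentation):

```yaml
# Hypothetical config.yaml fragment: run a post-processing userscript
# inside the sync PBS job, before any rsync commands
userscripts:
    sync: ./postprocess.sh    # made-up script name

sync:
    enable: true
    queue: copyq              # PBS resources/queue for the sync job
    ncpus: 1
    mem: 2GB
    walltime: 01:00:00
```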
If post-processing needs to run on a different queue to the sync job (e.g. syncing to a remote machine needs network access while post-processing needs more than 1 NCPU), or it requires significant resources, an option could be to call `payu sync` at the end of a post-script and disable sync being called automatically by payu.
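As a sketch, that arrangement might look like the following; `postscript` is an existing payu option, while the assumption here is that a manual `payu sync` still works when automatic syncing is disabled:

```yaml
# Hypothetical config.yaml fragment: post-processing runs as its own
# PBS job, and payu does not trigger syncing itself
postscript: qsub postprocess.sh   # made-up script name

sync:
    enable: false   # assumed: the post-script runs `payu sync` itself
                    # as its final step, after post-processing completes
```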
If the above won't work, an option could be to add a post-processing command to payu that submits a post-processing job, which runs any shell/Python post-processing userscripts and then either runs `payu.sync()` after completion or submits a separate copyq job to run `payu.sync()`.
There is a config option to remove local files once synced. As this leaves behind empty directories and files excluded from rsync, there is another config option to remove local restart/output directories, which also runs only after a directory has been successfully synced.
The latest output (and the last "permanently-saved" restart and subsequent restarts) are protected from both local delete options, as are any additional paths configured to sync in `config.yaml`. The last "permanently-saved" restart is protected because it is used as a checkpoint for date-based restart pruning.
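A sketch of the two delete options together, assuming the key names follow the pattern of the other `sync` settings:

```yaml
# Hypothetical config.yaml fragment combining both local delete options
sync:
    enable: true
    path: /remote/archive/path    # placeholder destination
    remove_local_files: true      # delete files once successfully synced
    remove_local_dirs: true       # also remove the emptied output/restart
                                  # directories after a successful sync
```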
A procedure to remove all local files and directories, including the last outputs/restarts, could be:

1. `payu sync` to sync all outputs, or `payu sync --sync-restarts` to sync all restarts as well.
2. `payu sweep --hard` to delete everything in the local archive.

#### Default path for the remote archive?
~~If not syncing to a remote machine, I've automated the destination path for the remote archive to be `/g/data/${PROJECT}/${USER}/${MODEL}/${EXPERIMENT_NAME}/archive`.~~

I'm unsure if this is the best idea, as it could risk outputs being overwritten if the experiment name is not unique. I am thinking it might be best to remove this until experiments have a unique ID. Edit: This has been removed.
I've kept most of the logic in the pre-existing `remote_archive()` for rsyncing to a remote machine, which seems to be in line with NCI's guide for setting up restricted rsync calls with ssh keys.
I've attempted to go through the above guide and rsync small files from Gadi to my local desktop, but haven't had any success. So I've left the remote sync options (e.g. `rsync_protocol`, `user` and `url`) out of the documentation for now, as I haven't been able to test them.
Currently, `--exclude *.nc.*` is always added to the rsync commands to ignore uncollated files.
Would syncing uncollated files be an option? If so, I could add an `ignore_uncollated` flag to `config.yaml`, with `True` being the default value.
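To illustrate what that filter skips, here is a small shell sketch (the filenames are made up, and rsync's own filter matching differs slightly from shell globs, but for a pattern like `*.nc.*` the effect is the same): uncollated per-tile files such as `ocean.nc.0000` match the pattern, while collated `.nc` files do not.

```shell
# Classify hypothetical archive files the way --exclude '*.nc.*' would
for f in ocean.nc ocean.nc.0000 ocean.nc.0015 ice.log; do
  case "$f" in
    *.nc.*) echo "excluded: $f" ;;  # uncollated tile file, skipped by rsync
    *)      echo "synced: $f" ;;    # everything else is transferred
  esac
done
```

This prints `synced: ocean.nc`, `excluded: ocean.nc.0000`, `excluded: ocean.nc.0015`, `synced: ice.log`.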
Just checking - the current sync script rsyncs all outputs (not just the latest) every time it's called, to handle asynchronous collation. Does `payu sync` have the same behaviour?
> Just checking - the current sync script rsyncs all outputs (not just the latest) every time it's called, to handle asynchronous collation. Does `payu sync` have the same behaviour?
Yes, it rsyncs all the output directories every time it's called. I had looked into rsyncing only the current output, or the current/prior restarts if collation is enabled; however, I realised there wasn't much point, as rsync is able to quickly pass over already-synced directories in the archive anyway.
Thanks @aidanheerdegen for the review! I've just added in the requested changes. I've tested it locally but will have to wait until Gadi is back up and running to run through a couple of actual model output tests.
I've tested it all works on Gadi. I just added one line in the tests to disable attempting to sync the runlog. I also tidied up the commit history.
Add a new payu command to sync output to a remote directory. I've extended `expt.postprocess()`, which runs either after collation, or at the end of archive if not collating, to also call `payu sync` if syncing is enabled.
An example config currently is:
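A sketch of such a config, with key names assumed from the options discussed in this PR (paths and script names are made up):

```yaml
# Hypothetical example: sync outputs (and restarts) to another
# location on /g/data after each run
sync:
    enable: true
    path: /g/data/x00/abc123/experiment/archive   # made-up destination
    restarts: true        # assumed flag: also sync restart directories

userscripts:
    sync: ./postprocess_archive.sh   # post-processing before any rsync
```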
COSIMA's `sync_data.sh` (https://github.com/COSIMA/1deg_jra55_iaf/blob/master/sync_data.sh) does some extra processing, e.g. it runs a run-summary Python script and concatenates ice daily files.
An example `postprocess_archive.sh` for 1deg_jra55_iaf could look like:
```bash
#!/bin/bash
# Load/use required environment modules
if [ -z "$MODULESHOME" ]; then
    echo "warning: No Environment Modules found; skipping $command call."
    exit 1
fi
modulecmd="$MODULESHOME/bin/modulecmd bash"

# Modules for ice concatenation
eval "$($modulecmd load nco)"

# Modules for run summary
eval "$($modulecmd use /g/data/hh5/public/modules)"
eval "$($modulecmd load conda/analysis3)"
eval "$($modulecmd load python3-as-python)"

# Concatenate ice daily files
for d in archive/output*/ice/OUTPUT; do
    for f in $d/iceh.????-??-01.nc; do
        if [[ ! -f ${f/-01.nc/-IN-PROGRESS} ]] && [[ ! -f ${f/-01.nc/-daily.nc} ]]; then
            touch ${f/-01.nc/-IN-PROGRESS}
            echo "doing ncrcat -O -L 5 -7 ${f/-01.nc/-??.nc} ${f/-01.nc/-daily.nc}"
            ncrcat -O -L 5 -7 ${f/-01.nc/-??.nc} ${f/-01.nc/-daily.nc} &&
                chmod g+r ${f/-01.nc/-daily.nc} &&
                rm ${f/-01.nc/-IN-PROGRESS}
            if [[ ! -f ${f/-01.nc/-IN-PROGRESS} ]] && [[ -f ${f/-01.nc/-daily.nc} ]]; then
                for daily in ${f/-01.nc/-??.nc}; do
                    # mv $daily $daily-DELETE  # rename individual daily files - user to delete
                    rm $daily
                done
            else
                rm ${f/-01.nc/-IN-PROGRESS}
            fi
        fi
    done
done

sourcepath="$PWD"
cd archive || exit 1

# Delete any cice log files that only have a 105-character header and nothing else
find output* -size 105c -iname "ice.log.task_*" -delete

# Update and commit run summary - do this last in case it doesn't work
cd $sourcepath
./run_summary.py --no_header
git add run_summary*.csv
git commit -m "update run summary"

echo "$0 completed successfully"
```

Edited: 2/12/23