payu-org / payu

A workflow management tool for numerical models on the NCI computing systems
Apache License 2.0

Payu doesn't find updated filepaths #474

Open blimlim opened 1 month ago

blimlim commented 1 month ago

I've been testing ESM1.5 configurations with new input filepaths, but I run into a payu: error: after the following steps:

  1. Clone the pre-industrial configuration from https://github.com/ACCESS-NRI/access-esm1.5-configs/tree/pre-industrial using payu clone.
  2. Edit the config.yaml file, adding in the updated filepaths (which lie in tm70).
  3. Run the simulation with payu run.

This leads to the following error in the PBS error logs:

payu: error: Input directory /g/data/tm70/sw6175/esm1p5-input-restructure/restructured-inputs/modern/pre-industrial/atmosphere/aerosol/global.N96/2020.05.19/OCFF_1850_ESM1.anc not found; aborting.

This appears to happen because gdata/tm70 isn't included in the storage flags of the submitted job:

qsub -q normal -P tm70 -l walltime=7200 -l ncpus=384 ... -l storage=gdata/access+gdata/vk83

Running payu setup first and then payu run -f lets the job run without any issues, and the storage flags then include gdata/tm70.

I'm not 100% sure on this, but I think payu is finding the storage paths for the restarts, executables, and input files from the manifests here: https://github.com/payu-org/payu/blob/c83489e1cf5af27f94aa8d255330cc50b6cb34fe/payu/schedulers/pbs.py#L119

So if the payu setup step is skipped, the manifests aren't updated, and I think the new file paths aren't found.
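To illustrate the suspected mechanism, here is a minimal sketch of how storage flags could be derived from a list of manifest paths. This is not payu's actual code: the function name storage_flags, the MOUNTS table, and the two "stale" example paths are hypothetical, and only the tm70 path is taken from the error above.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from a Gadi mount point to its PBS storage prefix.
MOUNTS = {("/", "g", "data"): "gdata", ("/", "scratch"): "scratch"}


def storage_flags(paths):
    """Build a '+'-joined storage string from the projects found in paths."""
    flags = set()
    for p in paths:
        parts = PurePosixPath(p).parts
        for mount, prefix in MOUNTS.items():
            n = len(mount)
            # The component right after the mount point is the project code.
            if parts[:n] == mount and len(parts) > n:
                flags.add(f"{prefix}/{parts[n]}")
    return "+".join(sorted(flags))


# Paths as a stale manifest might list them (illustrative only).
stale_manifest = [
    "/g/data/access/payu/bin/model.exe",
    "/g/data/vk83/inputs/example.nc",
]
# The new input path from config.yaml, which the stale manifest lacks.
new_input = (
    "/g/data/tm70/sw6175/esm1p5-input-restructure/restructured-inputs/"
    "modern/pre-industrial/atmosphere/aerosol/global.N96/2020.05.19/"
    "OCFF_1850_ESM1.anc"
)

print(storage_flags(stale_manifest))
print(storage_flags(stale_manifest + [new_input]))
```

Under these assumptions, the first call yields flags without gdata/tm70, which matches the qsub line above, and the second call includes it, which matches what happens after payu setup refreshes the manifests.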

I just wanted to check whether this is the intended behaviour, i.e. is payu setup meant to be run whenever the filepaths are changed?

jo-basevi commented 1 month ago

Yeah I think you are right, that payu is finding the storage paths for restarts, exes and inputs using the manifests, so maybe payu setup is required when updating those paths. @aidanheerdegen any thoughts on this?

My only idea at the moment for getting around that is to change payu so it automatically runs setup and sweep before it submits the PBS payu-run job. That would take longer to run and would produce a bunch of extra logs.

aidanheerdegen commented 1 month ago

You're spot on. It's a known issue with the storage flags on Gadi. It's a chicken-and-egg problem, which is why I always suggest users run payu setup every time they clone an experiment or make changes like this. It's a great test to make sure you have access to all the paths in the config.yaml, and it also updates the manifests when paths change, which allows payu to choose the correct storage flags.
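As a rough illustration of the kind of access check described here, the sketch below reports which paths in a list are missing or unreadable. It is not what payu setup does internally; the helper name missing_paths and the example path are hypothetical.

```python
import os


def missing_paths(paths):
    """Return the paths that do not exist or are not readable by this user."""
    return [p for p in paths if not os.access(p, os.R_OK)]


# The current directory is readable; the second path is assumed not to exist.
candidates = [os.getcwd(), "/no/such/project/input"]
for path in missing_paths(candidates):
    print(f"not accessible: {path}")
```

On a login node this kind of check can pass while the same path fails inside a PBS job, because the job only mounts the projects named in its storage flags.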

We've got it in our trouble-shooting section for OM2:

https://access-hive.org.au/models/run-a-model/run-access-om/#trouble-shooting

I think I had it as a recommended step after cloning the experiment, but was convinced that it was better as a trouble-shooting measure.