payu-org / payu

A workflow management tool for numerical models on the NCI computing systems
Apache License 2.0
Payu runs input checksums on every run when submitting with -n N #526

Open Whyborn opened 2 days ago

Whyborn commented 2 days ago

Payu re-submissions in a -n N run job trigger re-generating the input manifest. For small jobs, this becomes a significant portion of run time (maybe this is only relevant for staged_cable jobs?). I don't think there's any reason to recompute the input manifest for subsequent runs.

aidanheerdegen commented 2 days ago

> Payu re-submissions in a -n N run job trigger re-generating the input manifest.

payu only checks the binhash hasn't changed. This should be a fast check. How many input files are there?
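To make the "fast check" idea concrete, here is a minimal sketch of a two-tier hash check. It is not payu's actual implementation. It assumes that a fast hash is built from file metadata plus a limited read of the contents, and that the full hash is recomputed only when the fast one changes. The names `fast_fingerprint`, `full_hash`, `check_file` and the `CHUNK` size are all hypothetical.

```python
import hashlib
import os

CHUNK = 1024 * 1024  # bytes read for the fast fingerprint (assumed value)


def fast_fingerprint(path, chunk=CHUNK):
    """Cheap fingerprint from size, mtime and the first `chunk` bytes."""
    st = os.stat(path)
    h = hashlib.md5()
    h.update(f"{st.st_size}:{st.st_mtime_ns}".encode())
    with open(path, "rb") as f:
        h.update(f.read(chunk))
    return h.hexdigest()


def full_hash(path, block=1024 * 1024):
    """Expensive hash over the entire file contents."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for data in iter(lambda: f.read(block), b""):
            h.update(data)
    return h.hexdigest()


def check_file(path, entry):
    """Update a manifest-like dict for one file.

    Returns True if the full hash had to be recomputed, False if the
    fast fingerprint matched and the expensive step was skipped.
    """
    fp = fast_fingerprint(path)
    if entry.get("fast") == fp:
        return False
    entry["fast"] = fp
    entry["full"] = full_hash(path)
    return True
```

With a scheme like this, the per-file cost on a re-submission is one `stat` call and one bounded read, so the total cost scales with the number of files rather than their size. That is why the file count matters here.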

> For small jobs, this becomes a significant portion of run time (maybe this is only relevant for staged_cable jobs?). I don't think there's any reason to recompute the input manifest for subsequent runs.

The point of the manifests is to record everything that goes into a run. Are you adding files to the manifest that aren't actually used? Typically directories were specified in the input section in config.yaml because it was easy and compact, but it's also kinda lazy and not specific. Consequently, we've moved to explicitly specifying each input file:

https://github.com/ACCESS-NRI/access-om2-configs/blob/release-025deg_jra55_ryf/config.yaml#L40-L52

This has the benefit of being much more specific about what the model needs to run. Any changes to specific input files are also more "atomic" and are reflected directly in config.yaml. It also means we're calculating hashes only for the files that are used in the simulation.
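As a rough illustration of the two styles, an input section might look like the fragment below. All paths are hypothetical placeholders, not taken from the linked configurations, and the exact nesting of `input` (top level or under a submodel) depends on the model configuration.

```yaml
# Directory style (commented out): compact, but every file found under
# the directory ends up in the manifest, whether the model reads it or not.
# input:
#   - /path/to/inputs/ocean

# Explicit style: only the files the model actually uses are listed,
# so only these files are hashed.
input:
  - /path/to/inputs/ocean/grid.nc
  - /path/to/inputs/ocean/topog.nc
```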

There are exceptions though, e.g. JRA-55 RYF forcing data has a heap of files, so we use a directory:

https://github.com/ACCESS-NRI/access-om2-configs/blob/release-025deg_jra55_ryf/config.yaml#L33

and even more for the IAF version:

https://github.com/ACCESS-NRI/access-om2-configs/blob/release-025deg_jra55_iaf/config.yaml#L33-L43