payu-org / payu

A workflow management tool for numerical models on the NCI computing systems
Apache License 2.0

Passing entire environment causes error with payu run numbering #227

aidanheerdegen closed this issue 4 years ago

aidanheerdegen commented 4 years ago

There were issues with porting payu to the new NCI HPC system, gadi, because it uses a newer version of the module system. These are documented in #209 and #211.

One workaround was to export the entire current environment to the PBS job by adding -V to qsub_flags.

This solved the immediate issue. However, another user found that payu, when invoked with -n, would sometimes try to re-run the run that had just completed. The error was sporadic and difficult to reproduce, but it turned out to be caused by exporting the current environment to the next run. payu uses environment variables for inter-process communication of information such as the current run number, so exporting the entire environment to the PBS job was a bad idea.
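To make the failure mode concrete, here is a minimal sketch of how a stale run number could leak into the next job. It is not payu's code: the variable name EXAMPLE_CURRENT_RUN, both functions, and the rule "use the environment value if present, otherwise count completed runs" are all assumptions made for illustration.

```python
# Hypothetical variable name, used only for this illustration.
RUN_VAR = "EXAMPLE_CURRENT_RUN"


def job_environment(parent_env, export_all):
    """Return the environment a submitted job starts with.

    export_all=True mimics `qsub -V`: every variable of the submitting
    process is copied into the job. Otherwise the job starts clean.
    """
    return dict(parent_env) if export_all else {}


def resolve_run_number(env, completed_runs):
    """Pick the run number (assumed logic, not payu's).

    Trust the environment if the variable is set; otherwise derive the
    number from how many runs already exist (numbered from 0).
    """
    if RUN_VAR in env:
        return int(env[RUN_VAR])
    return completed_runs


# Run 3 has just finished; its process still carries its own run number.
finished_env = {"PATH": "/usr/bin", RUN_VAR: "3"}
completed_runs = 4  # runs 0..3 exist on disk

leaky = resolve_run_number(job_environment(finished_env, export_all=True), completed_runs)
clean = resolve_run_number(job_environment(finished_env, export_all=False), completed_runs)

print(leaky)  # 3: the stale value is inherited, so run 3 would be repeated
print(clean)  # 4: without the leaked variable, the next run is chosen
```

Under these assumptions, the job that inherits the full environment picks up the finished run's number and repeats it, while the job with a clean environment moves on to the next run.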

I have opened this issue simply to document the problem with this approach.