Description
Currently, the appropriate environment for the HAFS runtime is loaded by a script named hafs_pre_job.sh.inc. This script uses a very ad hoc method to guess which machine it is running on, then loads the modules and other environment variables needed before any HAFS job runs.
This is an inflexible and fragile system. The system may fail to build and/or run at any point due to seemingly innocuous system changes outside our control. For example, it seems like no one has even noticed that the check on Jet depends on the existence of /lfs4, a directory that has been deprecated and, according to emails from RDHPCS, was due to be unmounted a month ago. When it does get unmounted, the develop branch of HAFS will fail to build or run on Jet in any capacity.
Furthermore, this system does not seem to have any benefit whatsoever: users know what machine they are working on. Why not have a simple variable, set at build time, so that the appropriate environment is loaded based on concrete user input?
Proposed solution
With regard to that last question, system.conf files already contain a WHERE_AM_I variable that defines the name of the platform (with some weird circular definitions where it gets overwritten by the existing logic, which is another bad practice!). We should use that variable to load the appropriate environment at runtime, rather than try dark magic wizardry with directories and hostnames. For the build system, an additional argument to ./install_hafs.sh is all that's needed, as is done in the SRW app.
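To make the idea concrete, here is a minimal sketch of what an explicit lookup could look like. It is not the code from my branch: the function name env_file_for_platform, the env.<platform>.sh naming scheme, the ./modulefiles default directory, and the platform names other than jet are all assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical sketch: choose the environment file from an explicit platform
# name (e.g. the value of WHERE_AM_I) instead of probing directories or
# hostnames. All file names and locations below are illustrative assumptions.

# Print the path of the environment file for a platform, or fail loudly.
#   $1 = platform name (required)
#   $2 = directory holding the environment files (assumed default)
env_file_for_platform() {
    platform="$1"
    env_dir="${2:-./modulefiles}"

    if [ -z "$platform" ]; then
        echo "ERROR: no platform given; set WHERE_AM_I explicitly" >&2
        return 1
    fi

    # Assumed list of supported platforms; only jet is named in this issue.
    case "$platform" in
        jet|hera|orion) ;;
        *)
            echo "ERROR: unknown platform '$platform'" >&2
            return 1
            ;;
    esac

    echo "$env_dir/env.$platform.sh"
}
```

A job script could then do something like `env_file=$(env_file_for_platform "$WHERE_AM_I") || exit 1` followed by `. "$env_file"`. The point of the design is that an unknown or missing platform fails immediately with a clear message, instead of depending on whether some directory happens to be mounted.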
Status (optional)
I have already incorporated the changes into a working branch, which was very easy to do. I will open a PR provided there is nothing major I am missing. However, this only covers the HAFS top-level controls; there are still other scripts just like this one that need to be deprecated in submodules, such as gfdl-tracker and hafs_utils.