Closed harshula closed 1 year ago
Are you calling payu run from the config directory ($SCRATCH/access-om2/work/1deg_jra55_ryf_spackv1.git)?
Sorry, no. It's executed in: $HOME/payu/1deg_jra55_ryf_spackv1.git
$HOME/payu/1deg_jra55_ryf_spackv1.git/work is a symlink to $SCRATCH/access-om2/work/1deg_jra55_ryf_spackv1.git
Do things work if you run from $SCRATCH/access-om2/work/1deg_jra55_ryf_spackv1.git?
Running from the "work" directory results in the job disappearing and a file (1deg_jra55_ryf.e86040237) with:
FileNotFoundError: [Errno 2] No such file or directory: '$SCRATCH/access-om2/work/1deg_jra55_ryf_spackv1.git/config.yaml'
My understanding is that the "work" directory is temporary.
I'm following these instructions: https://github.com/COSIMA/access-om2/wiki/Getting-started#building-the-models
I think @aidanheerdegen tracked it down to https://github.com/payu-org/payu/blob/a771fe7447ee19fd123b07414ddca64f95dabf5a/payu/experiment.py#L517-L520
> My understanding is that the "work" directory is temporary.
Yes, my apologies, I wasn't reading your paths carefully enough
Nor was I, when I started answering your original question! :-)
> I think @aidanheerdegen tracked it down
I believe the issue is with the introspection payu uses to determine the correct mpirun options. As linked above, it looks for libmpi.so in the linked libraries to determine the mpi_module type and version used, and then uses this value to determine the command-line options passed to mpirun.
Does this fail for spack builds, @harshula? If so, what logic would we have to add to support spack-built executables directly?
I'll come back to this once openmpi is sorted. A more general question: is there a way to override Payu's heuristics via the config file? In this instance, can we force Payu to insert -wdir via an option in the config file?
Yes there is, indirectly, by adding something like this to config.yaml:
mpi:
  module: openmpi/4.1.0
(Sorry, drafted this days ago and didn't "send")
Doesn't that result in the system/gadi openmpi being used at runtime instead of the Spack built openmpi?
Yep. It is a work-around, so not appropriate in some circumstances. Definitely should just fix the introspection stuff to either detect this correctly, or just default to openmpi so there is at least something appropriate.
Notes
def lib_update(bin_path, lib_name):
    # Local import to avoid reversion interference
    # TODO: Bad design, fixme!
    # NOTE: We may be able to move this now that reversion is going away
    from payu import fsops

    # TODO: Use objdump instead of ldd
    cmd = 'ldd {0}'.format(bin_path)
    ldd_output = subprocess.check_output(shlex.split(cmd)).decode('ascii')
    slibs = ldd_output.split('\n')

    for lib_entry in slibs:
        if lib_name in lib_entry:
            lib_path = lib_entry.split()[2]

            # pylint: disable=unbalanced-tuple-unpacking
            mod_name, mod_version = fsops.splitpath(lib_path)[2:4]  # <-- BUG

            module('unload', mod_name)
            module('load', os.path.join(mod_name, mod_version))
            return '{0}/{1}'.format(mod_name, mod_version)

    # If there are no libraries, return an empty string
    return ''
The code is expecting the line:
libmpi.so.40 => /apps/openmpi/4.0.2/lib/libmpi.so.40
but receives the line:
libmpi.so.40 => $HOME/spack-microarchitectures.git/opt/spack/linux-rocky8-cascadelake/intel-2019.5.281/openmpi-4.1.5-ooyg5wc7sa3tvmcpazqqb44pzip3wbyo/lib/libmpi.so.40 (0x000014a7cabad000)
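The mismatch can be reproduced without payu. The sketch below is only an illustration: it uses `pathlib.PurePosixPath.parts` as a stand-in for `fsops.splitpath` (assuming both split a path into the same components), `guess_module` is a hypothetical name, and `/home/user` is a made-up expansion of `$HOME`.

```python
from pathlib import PurePosixPath


def guess_module(lib_path):
    """Mimic the [2:4] slice that lib_update applies to the split path."""
    parts = PurePosixPath(lib_path).parts
    mod_name, mod_version = parts[2:4]
    return '{0}/{1}'.format(mod_name, mod_version)


# Layout the code expects: /apps/<module>/<version>/lib/...
system_lib = '/apps/openmpi/4.0.2/lib/libmpi.so.40'

# Spack layout; '/home/user' is a hypothetical expansion of $HOME
spack_lib = ('/home/user/spack-microarchitectures.git/opt/spack/'
             'linux-rocky8-cascadelake/intel-2019.5.281/'
             'openmpi-4.1.5-ooyg5wc7sa3tvmcpazqqb44pzip3wbyo/lib/libmpi.so.40')

print(guess_module(system_lib))  # openmpi/4.0.2
print(guess_module(spack_lib))   # user/spack-microarchitectures.git
```

With the Spack path, the result no longer starts with `openmpi`, so any logic keyed on the module name would presumably skip the Open MPI specific options.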
This is the override mechanism that @aidanheerdegen mentioned earlier:
mpi_config = self.config.get('mpi', {})
mpi_module = mpi_config.get('module', None)
We could extend this to be more flexible.
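One possible shape for a more flexible lookup, as a rough sketch only: honour the config override first, then try to recognise openmpi in the library path, then fall back to a default. The names `detect_mpi` and `OPENMPI_RE` and the fallback policy are assumptions for illustration, not payu code, and the sketch only returns an identifier without loading any module.

```python
import re

# Matches '/apps/openmpi/4.0.2/...' and Spack's '.../openmpi-4.1.5-<hash>/...'
OPENMPI_RE = re.compile(r'/openmpi[/-](\d+(?:\.\d+)*)')


def detect_mpi(config, lib_path, default='openmpi'):
    """Return an MPI identifier such as 'openmpi/4.1.5'.

    Precedence: explicit config override, detection from the
    library path, then a default (hypothetical policy).
    """
    override = config.get('mpi', {}).get('module')
    if override:
        return override

    match = OPENMPI_RE.search(lib_path or '')
    if match:
        return 'openmpi/{0}'.format(match.group(1))

    return default


print(detect_mpi({}, '/apps/openmpi/4.0.2/lib/libmpi.so.40'))
print(detect_mpi({'mpi': {'module': 'openmpi/4.1.0'}}, ''))
```

Defaulting to `openmpi` mirrors the suggestion earlier in the thread that there should at least be something appropriate when detection fails.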
[Updated: 28/07/2023]
Requirements
-wdir
module load openmpi version from /apps.

Notes
A general solution to this type of problem is to create a function (e.g. https://github.com/harshula/payu/compare/master...harshula:payu:spack) that creates a data structure of all the required libraries per binary. Ideally this data structure should be initialised when the model object is instantiated to allow any function to access the data without requiring additional system calls and subsequent filesystem reads.
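A minimal sketch of such a data structure, assuming a plain dict from library name to resolved path is enough. The function names `parse_ldd` and `required_libs` are hypothetical and are not taken from the linked branch.

```python
import subprocess


def parse_ldd(ldd_output):
    """Map each library name in ldd output to its resolved path (or None)."""
    libs = {}
    for line in ldd_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if '=>' in fields:
            idx = fields.index('=>')
            target = fields[idx + 1] if len(fields) > idx + 1 else None
            # 'libfoo.so => not found' has no usable path
            libs[fields[0]] = None if target == 'not' else target
        else:
            # Entries without '=>' (e.g. the dynamic loader) list the path first
            libs[fields[0]] = fields[0] if fields[0].startswith('/') else None
    return libs


def required_libs(bin_path):
    """Run ldd once on a binary and return its parsed library table."""
    output = subprocess.check_output(['ldd', bin_path]).decode('utf-8')
    return parse_ldd(output)
```

A model object could call `required_libs` once at instantiation and keep the result, so later lookups such as `libs.get('libmpi.so.40')` need no further system calls.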
> If openmpi is required and spack's version is required, then module load openmpi version from /apps. We haven't tuned Spack's openmpi, yet.
@Harshula, is this still the case, or is it now OK to use Spack's openmpi and not load the local NCI /apps version?
Sorry, I should have updated the requirements. I'll update them now. The reason why this requirement is not relevant is here: https://github.com/ACCESS-NRI/ACCESS-OM/issues/6#issuecomment-1620953535
When testing a Spack build of access-om2 using Payu, I was receiving the following errors:

I noticed that -wdir is missing from the arguments given to mpirun:

mpirun --mca io ompio --mca io_ompio_num_aggregators 1 -np 1 $SCRATCH/access-om2/work/1deg_jra55_ryf_spackv1.git/atmosphere/yatm.exe : -np 216 $SCRATCH//access-om2/work/1deg_jra55_ryf_spackv1.git/ocean/fms_ACCESS-OM.x : -np 24 $SCRATCH/access-om2/work/1deg_jra55_ryf_spackv1.git/ice/cice_auscom_360x300_24x1_24p.exe
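To show where -wdir would normally appear, here is a rough sketch of assembling a multi-program mpirun command. It assumes -wdir is only added when the detected module name starts with `openmpi`, which is my reading of this thread rather than a quote of payu's code. `build_mpirun` and the `/w/...` paths are hypothetical.

```python
def build_mpirun(mpi_module, programs, flags=()):
    """Assemble an MPMD mpirun command line.

    programs: list of (nprocs, work_dir, executable) tuples.
    """
    segments = []
    for nprocs, work_dir, exe in programs:
        args = ['-np', str(nprocs)]
        if mpi_module.startswith('openmpi'):
            # Open MPI: change into the model's directory before launch
            args += ['-wdir', work_dir]
        args.append(exe)
        segments.append(' '.join(args))
    return ' '.join(['mpirun', *flags, ' : '.join(segments)])


programs = [
    (1, '/w/atmosphere', '/w/atmosphere/yatm.exe'),
    (216, '/w/ocean', '/w/ocean/fms_ACCESS-OM.x'),
]
print(build_mpirun('openmpi/4.1.5', programs, flags=('--mca', 'io', 'ompio')))
```

If the module name is misdetected and does not start with `openmpi`, the same call produces a command without any -wdir arguments, which matches the command shown in the report.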