mochi-hpc-experiments / mochi-tests

Test cases for the Mochi project.

misc theta-gpu job script cleanups #58

Closed · carns closed 1 year ago

carns commented 1 year ago

@jhendersonHDF and @vchoi-hdfgroup, does this look OK to you? I'm trying to get it running and got closer after these changes, but I still hit errors like the following before it finishes installing everything:

==> Error: AttributeError: Query of package 'openmpi' for 'headers' failed
        prefix : None
        spec : openmpi@4.0.5%gcc@9.4.0~atomics~cuda~cxx~cxx_exceptions~gpfs~internal-hwloc~java~legacylaunchers~lustre~memchecker+romio+rsh~singularity+static+vt+wrapper-rpath build_system=autotools fabrics=none patches=60ce20b schedulers=none arch=linux-ubuntu20.04-zen2
        queried as : openmpi
        extra parameters : []

The 'mochi-ssg' package cannot find an attribute while trying to build from sources. This might be due to a change in Spack's package format to support multiple build-systems for a single package. You can fix this by updating the build recipe, and you can also report the issue as a bug. More information at https://spack.readthedocs.io/en/latest/packaging_guide.html#installation-procedure
==> Error: mochi-ssg-develop-pkmj42xollb5zhbxervo22vlt3rln36o: AttributeError: Query of package 'openmpi' for 'headers' failed
        prefix : None

If you are not seeing this yourselves, it is possibly a problem triggered by a recent spack update.
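
For context, below is a minimal sketch of the kind of Spack recipe code that performs the failing query. This is a hypothetical example package, not mochi-ssg's actual recipe; it only illustrates that asking an external openmpi spec for its headers fails when Spack was given a module name but no installation prefix (hence the "prefix : None" above).

    # Hypothetical Spack package.py fragment (not mochi-ssg's real recipe).
    # It shows the kind of dependency query that can raise the error above:
    # asking an external openmpi for its headers when Spack has no prefix
    # in which to look for include directories.
    from spack.package import *


    class MochiExample(AutotoolsPackage):
        """Toy package used only to illustrate the failing 'headers' query."""

        homepage = "https://example.org/mochi-example"          # placeholder
        url = "https://example.org/mochi-example-1.0.tar.gz"    # placeholder

        version("1.0")

        depends_on("mpi")

        def configure_args(self):
            mpi = self.spec["mpi"]
            # This attribute access is forwarded to the openmpi package; for
            # an external spec whose prefix is unknown it fails with
            # "Query of package 'openmpi' for 'headers' failed".
            return ["CPPFLAGS=-I" + mpi.headers.directories[0]]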

jhendersonHDF commented 1 year ago

Looks good to me, as far as my familiarity with spack goes. I'm not sure about the MPI issue, though. So far I've only run things on Polaris, and there I often run into issues with needing to load the CUDA modules. I'm not sure if that's also the case for theta-gpu, but https://github.com/spack/spack/issues/12520 seems to describe the same issue.

carns commented 1 year ago

Interesting; thanks for the pointer to the spack issue. I don't think the explicit cuda modules are needed on Polaris, but from the discussion on the issue you linked, it looks like this might be solvable by specifying the prefix in addition to the module for the external OpenMPI package. I'm going to go ahead and merge the changes in this PR (I believe they are orthogonal) and try out that workaround.
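
For reference, a minimal sketch of that workaround in the site packages.yaml is below, assuming openmpi is currently registered as an external via a module only. The module name and prefix path are hypothetical placeholders for whatever theta-gpu actually provides; the point is that adding prefix alongside modules gives Spack a real directory in which to locate headers and libraries.

    # Hypothetical packages.yaml entry; module name and prefix are placeholders.
    packages:
      openmpi:
        buildable: false
        externals:
        - spec: "openmpi@4.0.5%gcc@9.4.0"
          modules:
          - openmpi/4.0.5-gcc            # placeholder module name
          prefix: /opt/openmpi/4.0.5     # added so Spack can resolve headers/libs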