Neves-P opened this issue 3 months ago:
I've tried to install this version, but it's failing with errors like:
```
nvfortran-Error-A CUDA toolkit matching the current driver version (0) or a supported older version (11.8) was not installed with this HPC SDK.
```
The currently used toolchain (`nvofbf/2022.07`) uses CUDA 11.7, so it looks like we need one with a more recent CUDA version. This is not available yet, though there is an open PR for 2023.01: https://github.com/easybuilders/easybuild-easyconfigs/pull/20716. We could give that a try.
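For context, a quick way to compare the CUDA toolkit bundled with a toolchain against what the node's driver supports (just a sketch; it assumes the NVHPC compilers put `nvcc` on the PATH, and that a driver is present at all, which the "(0)" in the error suggests may not be the case on the build node):

```bash
# Sketch only: compare the toolkit's CUDA version with what the driver supports.
module load nvofbf/2022.07   # the toolchain currently used
nvcc --version               # CUDA toolkit bundled via NVHPC (11.7 here)
nvidia-smi                   # the "CUDA Version" field is the max the driver supports
```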
Based on a test build, this should indeed work with the newer toolchain. So I've modified all easyconfigs and patches in this PR accordingly, and started builds of Wannier90 for all 4 CPU types. The resulting tarballs can be ingested into CVMFS, and afterwards VASP can be built with `-r` in order to install it to the restricted /apps area.
The builds of Wannier90 have succeeded for all CPUs using:
```
./build_container.sh -o /scratch/public/software-tarballs -- eb -r --force --from-pr=20265,20269,20716 Wannier90-3.1.0-nvofbf-2023.01.eb
```
I've ingested the tarballs and I'm now trying to build VASP.
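For reference, ingestion follows our usual tarball procedure; a minimal sketch of what that can look like with `cvmfs_server ingest` (the tarball name and target directory below are placeholders, not the exact values used):

```bash
# Placeholder example: ingest one of the generated tarballs into the CVMFS repo.
cvmfs_server ingest \
  --tarball /scratch/public/software-tarballs/<wannier90-zen3-tarball>.tar.gz \
  --base_dir versions/2023.01/rocky8/x86_64/amd/zen3 \
  hpc.rug.nl
```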
Submitted VASP build jobs for all CPU types except zen3, and they all failed with different errors...
With all dependencies already in place, you should be able to start the build with the command below (`--from-pr` is still needed, as it needs the toolchain definitions from (some of) those PRs):

```
./build_container.sh -r -o /scratch/public/software-tarballs -- eb -r --force --from-pr=20265,20269,20716 VASP-6.4.3-nvofbf-2023.01.eb
```

(see `vasp.sh` on the shared account)
From the build log with `--debug` on `zen3`:
```
== 2024-08-15 10:13:01,761 run.py:689 DEBUG cmd "ldd /cvmfs/hpc.rug.nl/versions/2023.01/rocky8/x86_64/amd/zen3/software/VASP/6.4.3-nvofbf-2023.01/bin/vasp_gam" exited with exit code 0 and output:
...
libnvc.so => /cvmfs/hpc.rug.nl/versions/2023.01/rocky8/x86_64/amd/zen3/software/NVHPC/23.1-CUDA-12.0.0/Linux_x86_64/23.1/compilers/lib/libnvc.so (0x00007f3687d20000)
librt.so.1 => /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/librt.so.1 (0x00007f3687b16000)
libc.so.6 => /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/libc.so.6 (0x00007f3687751000)
libgcc_s.so.1 => /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/libgcc_s.so.1 (0x00007f3687539000)
libm.so.6 => /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/libm.so.6 (0x00007f36871b7000)
libatomic.so.1 => not found
libatomic.so.1 => not found
...
```
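To see at a glance which shared libraries remain unresolved for that binary:

```bash
# List only the unresolved shared libraries of the zen3 vasp_gam binary.
ldd /cvmfs/hpc.rug.nl/versions/2023.01/rocky8/x86_64/amd/zen3/software/VASP/6.4.3-nvofbf-2023.01/bin/vasp_gam | grep "not found"
```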
`nvofbf` uses the `system` toolchain as a basis, and it seems like RL8 doesn't have one of the required shared libraries, `libatomic.so.1`.
However, the required version of the `NVHPC` toolchain loads `GCCcore/12.2.0`, which does have this library:

```
$ ls /cvmfs/hpc.rug.nl/versions/2023.01/rocky8/x86_64/amd/zen3/software/GCCcore/12.2.0/lib64/ | grep libatomic.so.1
libatomic.so.1
libatomic.so.1.2.0
```
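As a quick diagnostic (not a proper fix; the real solution is probably to make sure this directory ends up in the binary's RPATH, or that the module environment provides it), prepending that `lib64` directory to `LD_LIBRARY_PATH` should let `ldd` resolve the library:

```bash
# Diagnostic only: check that GCCcore 12.2.0's libatomic satisfies the dependency.
export LD_LIBRARY_PATH=/cvmfs/hpc.rug.nl/versions/2023.01/rocky8/x86_64/amd/zen3/software/GCCcore/12.2.0/lib64:$LD_LIBRARY_PATH
ldd /cvmfs/hpc.rug.nl/versions/2023.01/rocky8/x86_64/amd/zen3/software/VASP/6.4.3-nvofbf-2023.01/bin/vasp_gam | grep libatomic
```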
This is a work-in-progress installation from I2407-02742.
We still need to obtain the sources and build the software, then ingest the tarballs. I have tested this using v6.4.2, but it could be that the updated version fails. I don't expect major problems since the install instructions on the wiki did not change as far as I can see.