rug-cit-hpc / cit-hpc-easybuild

EasyBuild files, which are used on our local facilities
GNU General Public License v2.0

[WIP] VASP 6.4.3 and Wannier90 nvofbf-2022.7 #75

Open Neves-P opened 3 months ago

Neves-P commented 3 months ago

This is a work-in-progress installation from I2407-02742.

We still need to obtain the sources and build the software, then ingest the tarballs. I have tested this with v6.4.2, but the updated version could still fail. I don't expect major problems, since the install instructions on the wiki have not changed as far as I can see.

bedroge commented 3 months ago

I've tried to install this version, but it's failing with errors like:

nvfortran-Error-A CUDA toolkit matching the current driver version (0) or a supported older version (11.8) was not installed with this HPC SDK.

The currently used toolchain (nvofbf/2022.07) uses CUDA 11.7, so it looks like we need one with a more recent CUDA version. Such a toolchain is not available yet, though there is an open PR for 2023.01: https://github.com/easybuilders/easybuild-easyconfigs/pull/20716. We could give that a try.

bedroge commented 3 months ago

Based on a test build, this should indeed work with the newer toolchain. I've modified all easyconfigs and patches in this PR accordingly, and started Wannier90 builds for all four CPU types. The resulting tarballs can be ingested to CVMFS, and afterwards VASP can be built with -r (passed to build_container.sh) in order to install it to the restricted /apps area.

bedroge commented 3 months ago

The builds of Wannier90 have succeeded for all CPUs using:

./build_container.sh -o /scratch/public/software-tarballs -- eb -r --force --from-pr=20265,20269,20716 Wannier90-3.1.0-nvofbf-2023.01.eb

I've ingested the tarballs and I'm now trying to build VASP.

bedroge commented 3 months ago

Submitted VASP build jobs for all CPU types except zen3, and they all failed with different errors...

With all dependencies already in place, you should be able to start the build with the following command (--from-pr is still needed, as the build needs the toolchain definitions from (some of) those PRs):

./build_container.sh -r -o /scratch/public/software-tarballs -- eb -r --force --from-pr=20265,20269,20716 VASP-6.4.3-nvofbf-2023.01.eb

(see the vasp.sh on the shared account)

Neves-P commented 3 months ago

From the build log with --debug on zen3:

== 2024-08-15 10:13:01,761 run.py:689 DEBUG cmd "ldd /cvmfs/hpc.rug.nl/versions/2023.01/rocky8/x86_64/amd/zen3/software/VASP/6.4.3-nvofbf-2023.01/bin/vasp_gam" exited with exit code 0 and output:
...
        libnvc.so => /cvmfs/hpc.rug.nl/versions/2023.01/rocky8/x86_64/amd/zen3/software/NVHPC/23.1-CUDA-12.0.0/Linux_x86_64/23.1/compilers/lib/libnvc.so (0x00007f3687d20000)
        librt.so.1 => /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/librt.so.1 (0x00007f3687b16000)
        libc.so.6 => /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/libc.so.6 (0x00007f3687751000)
        libgcc_s.so.1 => /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/libgcc_s.so.1 (0x00007f3687539000)
        libm.so.6 => /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/libm.so.6 (0x00007f36871b7000)
        libatomic.so.1 => not found
        libatomic.so.1 => not found
...

nvofbf uses the system toolchain as a basis, and it seems that RL8 (Rocky Linux 8) lacks one of the required shared libraries, libatomic.so.1.
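
To check the other VASP binaries for the same problem, the "not found" entries can be filtered out of the ldd output. This is only a minimal sketch: the function name missing_libs is made up for illustration, and it assumes the usual "name => not found" line format that ldd prints.

```shell
# Print the unique names of shared libraries that ldd reports as "not found".
# Reads ldd output on stdin; missing_libs is a hypothetical helper name.
missing_libs() {
    awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }' | sort -u
}

# Example (path taken from the build log above):
#   ldd /cvmfs/hpc.rug.nl/versions/2023.01/rocky8/x86_64/amd/zen3/software/VASP/6.4.3-nvofbf-2023.01/bin/vasp_gam | missing_libs
```

Piping through sort -u collapses repeated entries, such as the duplicated libatomic.so.1 line in the log.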

Neves-P commented 3 months ago

However, the required version of the NVHPC toolchain loads GCCcore/12.2.0, which does have this library:

ls /cvmfs/hpc.rug.nl/versions/2023.01/rocky8/x86_64/amd/zen3/software/GCCcore/12.2.0/lib64/ | grep libatomic.so.1
libatomic.so.1
libatomic.so.1.2.0
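
The same check can be run against several candidate directories at once, for instance to compare the system library directory with the GCCcore one. This is a rough sketch: dirs_with_lib is a hypothetical helper, and which directories to pass is an assumption on my side.

```shell
# Print every directory from the argument list that contains the given file.
# Usage: dirs_with_lib SONAME DIR...   (dirs_with_lib is a hypothetical helper)
dirs_with_lib() {
    soname="$1"
    shift
    for dir in "$@"; do
        # -e follows symlinks, so versioned symlinks such as libatomic.so.1 count
        # as long as their target exists.
        if [ -e "$dir/$soname" ]; then
            printf '%s\n' "$dir"
        fi
    done
}

# Example:
#   dirs_with_lib libatomic.so.1 /lib64 \
#       /cvmfs/hpc.rug.nl/versions/2023.01/rocky8/x86_64/amd/zen3/software/GCCcore/12.2.0/lib64
```

Note that this only shows where the library exists on disk. It does not say whether the dynamic linker will actually search that directory when vasp_gam is started.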