mumax / 3

GPU-accelerated micromagnetic simulator

Error from compilation #326

Closed (jedcheng closed this issue 11 months ago)

jedcheng commented 11 months ago

I am attempting to compile mumax3 from source using CUDA 12.

I ran the Makefile with

make CUDA_CC=80

and the binaries compiled successfully.

However, running the binary fails with an error: ./mumax3: error while loading shared libraries: libcuda.so.1: cannot open shared object file: No such file or directory

I found the libcuda.so.1 file in the directory /gpfs/easybuild/prod/software/CUDA/12.0.0/lib64/stubs

I suspect it is expected to be in lib64 directly rather than in lib64/stubs.

Is there any possibility that the Makefile could be modified to link against the file in lib64/stubs instead?

MathieuMoalic commented 11 months ago

Hi,

It seems the dynamic linker cannot locate the CUDA driver library at runtime, even though, as you mentioned, the build itself likely succeeded. Since you're on GPFS, I assume you're compiling on an HPC cluster, where the CUDA driver libraries may be installed in a non-standard directory. You might want to try loading the appropriate module, perhaps with module load cuda12.0 (the exact command will depend on your system). Another option would be to place the shared object in the same directory as the mumax binary so that the dynamic loader (ld.so) can pick it up, although on Linux this generally also requires that directory to be listed in LD_LIBRARY_PATH or in the binary's rpath. If that doesn't work, ldconfig might help.
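
To see what the loader is actually failing to resolve, something like the following might help. It is only a sketch: ldd and ldconfig are the standard glibc tools, missing_libs is a hypothetical helper name introduced here, and ./mumax3 is assumed to be the freshly built binary in the current directory.

```shell
#!/bin/sh
# Print the names of shared libraries that `ldd` reports as "not found".
# Reads ldd output on stdin, so it also works on output saved from another node.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# ./mumax3 is the binary path from the error message; adjust as needed.
if [ -x ./mumax3 ]; then
    ldd ./mumax3 | missing_libs
fi

# List any libcuda entries already in the loader cache (prints nothing if none).
if command -v ldconfig >/dev/null 2>&1; then
    ldconfig -p | grep libcuda || true
fi
```

If libcuda.so.1 shows up as missing, prepending the directory that holds the driver's copy to LD_LIBRARY_PATH for the session is usually enough for a quick test. Note that the copy under lib64/stubs is intended for linking on machines without a driver; the real libcuda.so.1 is normally installed by the NVIDIA driver itself, so pointing the loader at the stub is unlikely to give a working run.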

It might be beneficial to rename this issue for clarity, perhaps to "Linker Error with CUDA 12.0".

jedcheng commented 11 months ago

Thank you for the comments. This is a completely new HPC cluster. After some more testing, it turns out that nothing that uses CUDA runs at all (everything hangs at initialization without any error message). Even the installed TensorFlow and JAX builds don't work.

Since the admin team is severely understaffed, it will take some time for them to resolve this. I'll update the issue once it is resolved, for future reference.
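
For a hang at initialization with no error message, a time-limited probe can at least distinguish "hangs" from "fails" when reporting the problem to the admins. This is a rough sketch, assuming GNU coreutils timeout is available and that nvidia-smi (shipped with the NVIDIA driver) is on PATH; probe is a hypothetical helper name.

```shell
#!/bin/sh
# Run a command with a time limit and report whether it finished, failed, or hung.
# Relies on coreutils `timeout`: exit status 124 means the limit was hit,
# 127 means the command could not be found.
probe() {
    limit=$1; shift
    timeout "$limit" "$@" >/dev/null 2>&1
    rc=$?
    case $rc in
        0)   echo "ok" ;;
        124) echo "timed out" ;;
        127) echo "not found" ;;
        *)   echo "failed ($rc)" ;;
    esac
}

# Check whether the driver responds within 10 seconds on this node.
echo "nvidia-smi: $(probe 10 nvidia-smi)"
```

The same helper can wrap the simulator itself, for example probe 60 ./mumax3 followed by the usual arguments, to record whether it hangs or exits with an error.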