Linking issue with libpthread

ghost commented 1 year ago

This apparently occurs on modern Ubuntu systems. Working out the cause stretched my knowledge, but I think the fix/workaround is worth documenting.

I am building on the following system (WSL Ubuntu 22.04 LTS):

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.2 LTS
Release:        22.04
Codename:       jammy

CUDA was installed via sudo apt-get install nvidia-cuda-toolkit:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0

I get the following error.

[ 52%] Linking CUDA device code CMakeFiles/parallelproj_cuda.dir/cmake_device_link.o
nvlink warning : Skipping incompatible '/lib/x86_64-linux-gnu/librt.a' when searching for -lrt
nvlink warning : Skipping incompatible '/usr/lib/x86_64-linux-gnu/librt.a' when searching for -lrt
nvlink warning : Skipping incompatible '/lib/x86_64-linux-gnu/libpthread.a' when searching for -lpthread
nvlink warning : Skipping incompatible '/usr/lib/x86_64-linux-gnu/libpthread.a' when searching for -lpthread
nvlink warning : Skipping incompatible '/lib/x86_64-linux-gnu/libdl.a' when searching for -ldl
nvlink warning : Skipping incompatible '/usr/lib/x86_64-linux-gnu/libdl.a' when searching for -ldl
nvlink fatal   : Could not open input file '/usr/lib/x86_64-linux-gnu/libpthread.a'
make[2]: *** [cuda/CMakeFiles/parallelproj_cuda.dir/build.make:211: cuda/CMakeFiles/parallelproj_cuda.dir/cmake_device_link.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:985: cuda/CMakeFiles/parallelproj_cuda.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

I do not fully understand this issue, but it seems to be related to /usr/lib/x86_64-linux-gnu/libpthread.a being only 8 bytes with modern versions of libc?
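For anyone who wants to check whether their system has the same stub files, here is a minimal sketch. It assumes the Ubuntu x86_64 library path from the error log above, and that an "empty" static library is exactly the 8-byte ar header (`!<arch>` plus a newline). `is_empty_archive` is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Succeed if the given file is an empty ar archive, i.e. nothing but the
# 8-byte "!<arch>\n" magic header. Returns 2 if the file does not exist.
# is_empty_archive is a hypothetical helper name used only for this sketch.
is_empty_archive() {
    lib="$1"
    [ -f "$lib" ] || return 2
    size=$(wc -c < "$lib")
    magic=$(head -c 7 "$lib")
    [ "$size" -eq 8 ] && [ "$magic" = "!<arch>" ]
}

# Check the three libraries that nvlink complains about in the log.
# The directory is the one from the error message (an assumption elsewhere).
for name in pthread dl rt; do
    lib="/usr/lib/x86_64-linux-gnu/lib${name}.a"
    if is_empty_archive "$lib"; then
        echo "$lib: empty stub archive"
    elif [ -f "$lib" ]; then
        echo "$lib: $(wc -c < "$lib") bytes"
    else
        echo "$lib: not found"
    fi
done
```

If the script reports "empty stub archive" for libpthread.a, the file contains no object code at all, which would be consistent with nvlink failing to open it as an input.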

The best description I found was here: https://matsci.org/t/lammps-users-kokkos-linker-error-nvidia-libdl-a/41050

The fix I implemented for this was to change OpenMP_pthread_LIBRARY from /usr/lib/x86_64-linux-gnu/libpthread.a to /usr/lib/x86_64-linux-gnu/libpthread.so.0

This allows parallelproj to build and the tests pass.
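As a rough sketch of how the override above could be passed on the command line rather than edited in the CMake cache: the helper below prints a `-DOpenMP_pthread_LIBRARY=...` flag only if the shared library actually exists. `pthread_override_flag` is a hypothetical name, and the default directory is simply the path from this report, so treat both as assumptions.

```shell
#!/bin/sh
# Print a CMake cache override that points OpenMP_pthread_LIBRARY at the
# shared libpthread, but only if that shared library exists.
# pthread_override_flag is a hypothetical helper; the default directory is
# the Ubuntu x86_64 path from the report above and may differ elsewhere.
pthread_override_flag() {
    libdir="${1:-/usr/lib/x86_64-linux-gnu}"
    shared="$libdir/libpthread.so.0"
    if [ -e "$shared" ]; then
        printf '%s\n' "-DOpenMP_pthread_LIBRARY=$shared"
    else
        return 1
    fi
}

# Show the flag, or say why none is printed.
pthread_override_flag || echo "no libpthread.so.0 found in the default directory" >&2

# Example use from an empty build directory (not run here):
#   flag=$(pthread_override_flag) && cmake .. "$flag" && make
```

Passing the variable with `-D` at configure time should have the same effect as changing it in the cache by hand, without having to repeat the edit after wiping the build directory.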

Any input from @gschramm or @KrisThielemans would be appreciated.

gschramm commented 1 year ago

Interesting, at a first glance I have no idea why that is happening.

I just tried on my Ubuntu 22.04 system with cuda (11.8) installed from the ubuntu repo and I cannot reproduce the issue.

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.2 LTS
Release:    22.04
Codename:   jammy

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

@Robert-PrescientImaging can you try with CUDA 11.8 as well? In the conda-forge builds using cuda 10, 11.0, 11.1, 11.2, I have also never seen this issue.

KrisThielemans commented 1 year ago

Another post https://forums.developer.nvidia.com/t/nvlink-fatal-could-not-open-input-file-when-linking-with-empty-static-library/208517/7 claims that this was fixed with cuda 11.7

gschramm commented 1 year ago

Good to know. @KrisThielemans any idea why this is not a problem for the conda-forge builds with cuda < 11.7?

KrisThielemans commented 1 year ago

Of course I don't! 😄 A guess: I believe conda ships its own compiler and libraries on Linux, so maybe they do something different from Ubuntu.

gschramm commented 1 year ago

Hehe. So should we just encourage Ubuntu users to use CUDA >= 11.7 then?

ghost commented 1 year ago

> Another post https://forums.developer.nvidia.com/t/nvlink-fatal-could-not-open-input-file-when-linking-with-empty-static-library/208517/7 claims that this was fixed with cuda 11.7

Upgrading to CUDA 12.1 resolved the issue.
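A small sketch for checking an installed toolkit against the 11.7 threshold mentioned in the linked NVIDIA forum post. The threshold is that post's claim, not something verified here, and `nvcc_release` and `version_at_least` are hypothetical helper names.

```shell
#!/bin/sh
# Extract "MAJOR.MINOR" from the "release X.Y" line of `nvcc --version`
# output, read from stdin. Hypothetical helper for this sketch.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed if version $1 is at least version $2 (both in MAJOR.MINOR form).
version_at_least() {
    have_major=${1%%.*}; have_minor=${1#*.}
    want_major=${2%%.*}; want_minor=${2#*.}
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

if command -v nvcc >/dev/null 2>&1; then
    release=$(nvcc --version | nvcc_release)
    if [ -z "$release" ]; then
        echo "could not parse the nvcc version"
    elif version_at_least "$release" 11.7; then
        echo "CUDA $release: at or above 11.7"
    else
        echo "CUDA $release: older than 11.7, the libpthread.a link error may occur"
    fi
else
    echo "nvcc not found on PATH"
fi
```

With the two toolkits shown in this thread, 11.5 would fall on the "older" side and 11.8 on the "at or above" side, which matches who could and could not reproduce the error.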

ghost commented 1 year ago

> Hehe. So should we just encourage Ubuntu users to use CUDA >= 11.7 then?

You may want to, but unless someone else hits this rather niche issue, I suggest simply leaving this thread as documentation to refer to.

KrisThielemans commented 1 year ago

Just saw the same issue for someone with CUDA 11.5. (He also had another problem with Gadgetron, which has been fixed in later CUDA versions.)

gschramm commented 1 year ago

OK. Just added a note and a reference to this issue in the README for Ubuntu users.

gschramm / parallelproj

Linking issue with libpthread #24