### Reproducible example code

Use `pybind11_add_module` instead of `add_library` to build a `.so` library:

    add_library(${MODULENAME} MODULE ${CPPSOURCES} ${CUSOURCES})

vs.

    pybind11_add_module(${MODULENAME} ${CPPSOURCES} ${CUSOURCES})

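For anyone hitting the same error, a possible stopgap is the `NO_EXTRAS` option of `pybind11_add_module`, which skips pybind11's LTO flags (and also its symbol-stripping step). The sketch below is only an assumption about a workable setup: the project name and source file names are hypothetical stand-ins for `${MODULENAME}`, `${CPPSOURCES}` and `${CUSOURCES}`, and it has not been verified against the CUDA versions mentioned in this issue.

```cmake
cmake_minimum_required(VERSION 3.18)
project(cuda_ext LANGUAGES CXX CUDA)  # hypothetical project name

find_package(pybind11 CONFIG REQUIRED)

# Hypothetical values standing in for the variables used in the issue.
set(MODULENAME cuda_ext)
set(CPPSOURCES bindings.cpp)
set(CUSOURCES kernels.cu)

# NO_EXTRAS disables the extras pybind11 normally adds:
# LTO flags and stripping of the built module.
pybind11_add_module(${MODULENAME} NO_EXTRAS ${CPPSOURCES} ${CUSOURCES})
```

The trade-off is that the host-side C++ code loses LTO as well, so this is a workaround rather than a fix for the default behaviour.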
### Is this a regression? Put the last known working version here if it is.
Not a regression
### Required prerequisites

### What version (or hash if on master) of pybind11 are you using?

2.11.1
### Problem description

The CMake helper `pybind11_add_module` applies LTO by default (source).

When `.cu` files are passed in, LTO is applied to them as well, and as far as I know CUDA does not yet fully support link-time optimization of device code. The end result is a runtime error stating that the module has an `undefined symbol: fatbinData` (i.e. the device code is missing completely).

I tested with CUDA 10.2 and CUDA 11.6, and both versions show the problem. However, considerable changes to how CMake handles CUDA were made in CMake 3.18, and calling `pybind11_add_module` with CMake 3.17 works, given that `CMAKE_CUDA_ARCHITECTURES` changed functionality.

This is not an error in pybind11 per se, but since CUDA does not fully support LTO for device code (as far as I know), LTO should not be applied by default to `.cu` sources.

I am not certain how pybind11 wants to address this. Regardless, once an action is decided upon, I am happy to contribute a PR. Perhaps adding too much CUDA-specific logic to pybind11 is not feasible.

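Until a decision is made, one way to avoid the default LTO while keeping the rest of the pybind11 setup might be to assemble the target by hand from pybind11's interface targets and simply not link `pybind11::lto`. This is a minimal sketch under assumptions, not a confirmed fix: it assumes pybind11 2.6 or newer (for `pybind11_extension`), reuses the `${MODULENAME}`, `${CPPSOURCES}` and `${CUSOURCES}` variables from the example, and has not been tested against the CUDA versions listed above.

```cmake
# Assumes find_package(pybind11 CONFIG REQUIRED) has already run and that
# MODULENAME, CPPSOURCES and CUSOURCES are defined as in the example.
add_library(${MODULENAME} MODULE ${CPPSOURCES} ${CUSOURCES})

# Link the Python module interface target only; pybind11::lto is
# deliberately left out so no LTO flags reach the .cu sources.
target_link_libraries(${MODULENAME} PRIVATE pybind11::module)

# Give the library the file name Python expects for an extension module.
pybind11_extension(${MODULENAME})

# Hide symbols by default for both host and device translation units.
set_target_properties(${MODULENAME} PROPERTIES
  CXX_VISIBILITY_PRESET hidden
  CUDA_VISIBILITY_PRESET hidden)
```

A change inside pybind11 itself could follow the same idea, applying the LTO flags to C++ sources only, but that is a design choice for the maintainers.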
Issues related to CUDA and CMake can be found at these links: 1 2 3 4

@eyalroz, you seem to have run into CMake and CUDA related issues before. Would you happen to know about LTO for device code in this setting?