pybind / pybind11

Seamless operability between C++11 and Python
https://pybind11.readthedocs.io/

[BUG]: LTO applied to all .cu files #4825

Open mikaeltw opened 10 months ago

mikaeltw commented 10 months ago

Required prerequisites

What version (or hash if on master) of pybind11 are you using?

2.11.1

Problem description

The CMake helper pybind11_add_module applies LTO by default (source):

  if(NOT DEFINED CMAKE_INTERPROCEDURAL_OPTIMIZATION)
    if(ARG_THIN_LTO)
      target_link_libraries(${target_name} PRIVATE pybind11::thin_lto)
    else()
      target_link_libraries(${target_name} PRIVATE pybind11::lto)
    endif()
  endif()

When .cu files are passed, LTO is applied to them as well, and as far as I know CUDA does not yet fully support link-time optimization of device code. The end result is a runtime error stating that the module has an undefined symbol: fatbinData (i.e. the device code is missing completely).
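A possible workaround follows from the guard in the snippet above: the LTO targets are linked only when CMAKE_INTERPROCEDURAL_OPTIMIZATION is not defined, so defining it (even as OFF) before the call should skip that branch. This is a minimal sketch that reuses the variable names from the example in this report; I have not verified that it resolves the fatbinData error on every CUDA/CMake combination.

```cmake
# Sketch: opt out of pybind11's default LTO for a mixed C++/CUDA module.
# Defining the variable makes `if(NOT DEFINED CMAKE_INTERPROCEDURAL_OPTIMIZATION)`
# false, so neither pybind11::lto nor pybind11::thin_lto is linked.
set(CMAKE_INTERPROCEDURAL_OPTIMIZATION OFF)

pybind11_add_module(${MODULENAME} ${CPPSOURCES} ${CUSOURCES})

# Alternative: NO_EXTRAS also skips the LTO step, but note that it
# disables binary stripping as well.
# pybind11_add_module(${MODULENAME} NO_EXTRAS ${CPPSOURCES} ${CUSOURCES})
```

Setting the variable affects every target created afterwards in the same directory scope, so it may be preferable to set it right before the call and unset it afterwards.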

In addition, I tested with CUDA 10.2 and CUDA 11.6, and both versions show this behavior. However, CMake 3.18 made considerable changes to how CMake handles CUDA, and calling pybind11_add_module with CMake 3.17 works, given that CMAKE_CUDA_ARCHITECTURES changed functionality.

It is not an error in pybind11 per se, but since CUDA does not fully support LTO for device code (as far as I know) it should not be applied as a default for .cu sources.
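Until a default is decided, the module can also be composed by hand from pybind11's interface targets, leaving out pybind11::lto. The sketch below follows the "advanced interface library targets" pattern from the pybind11 CMake documentation, with the LTO target removed; the variable names are the ones from the example in this report, and the result is an assumption about what a CUDA-friendly default could look like, not a tested fix.

```cmake
# Sketch: build the extension without pybind11_add_module, so that
# no LTO flags reach the .cu sources.
add_library(${MODULENAME} MODULE ${CPPSOURCES} ${CUSOURCES})

# pybind11::lto is intentionally omitted here.
target_link_libraries(${MODULENAME} PRIVATE pybind11::module pybind11::windows_extras)

# Apply the Python extension prefix/suffix to the output name.
pybind11_extension(${MODULENAME})

# Strip unnecessary sections of the binary on Linux/macOS release builds.
if(NOT MSVC AND NOT "${CMAKE_BUILD_TYPE}" MATCHES "Debug|RelWithDebInfo")
  pybind11_strip(${MODULENAME})
endif()

set_target_properties(${MODULENAME} PROPERTIES
  CXX_VISIBILITY_PRESET "hidden"
  CUDA_VISIBILITY_PRESET "hidden")
```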

I am not certain how pybind11 wants to address this. Regardless, once an action has been decided upon, I am happy to contribute a PR. Perhaps too much CUDA-specific handling in pybind11 is not feasible.

Issues related to CUDA and CMake can be found at these links: 1 2 3 4

@eyalroz, you seem to have experienced issues related to CMake and CUDA. Would you happen to know about LTO for device code in this setting?

Reproducible example code

Use `pybind11_add_module` instead of `add_library` to build a .so lib.

add_library(${MODULENAME} MODULE ${CPPSOURCES} ${CUSOURCES})
vs
pybind11_add_module(${MODULENAME} ${CPPSOURCES} ${CUSOURCES})
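For completeness, a minimal CMakeLists.txt around the failing call might look like the sketch below. The project name, module name, and source file names are hypothetical placeholders, and it assumes pybind11 is installed where find_package can locate it.

```cmake
# Sketch of a minimal reproducer; all names below are placeholders.
cmake_minimum_required(VERSION 3.18)
project(lto_cuda_repro LANGUAGES CXX CUDA)

find_package(pybind11 CONFIG REQUIRED)

set(MODULENAME repro)          # hypothetical module name
set(CPPSOURCES bindings.cpp)   # hypothetical file containing PYBIND11_MODULE
set(CUSOURCES kernels.cu)      # hypothetical file containing device code

# With CMAKE_INTERPROCEDURAL_OPTIMIZATION undefined, this call links
# pybind11::lto, which is the configuration reported to fail at import time.
pybind11_add_module(${MODULENAME} ${CPPSOURCES} ${CUSOURCES})
```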


Is this a regression? Put the last known working version here if it is.

Not a regression