Open maxime-clem opened 11 months ago
Interesting, because as far as I know, the composable node containers should be purely C++ and shouldn't have any interactions with `pybind11`. Does this happen on the first `pybind11` node that you load, or on subsequent ones?
The issue happens even without using a composable node container (it can be reproduced by directly running the node executable).
Since the issue can be solved by using `dlopen`, it seems to be a linker issue, but I do not see any reason why the link to the Python library would be different between a `Node` and a `ComposableNode`.
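One way to narrow down a suspected symbol-visibility problem like this is to ask the dynamic loader whether the missing symbol is reachable through the global scope. The sketch below is only an illustration: `is_globally_visible` is a hypothetical helper, not part of ROS 2 or pybind11, and it assumes a POSIX system with `dlsym` and glibc's `RTLD_DEFAULT`. The result may depend on which object the call is made from, so it is most informative when called from the main executable.

```cpp
#include <dlfcn.h>  // dlsym, dlerror, RTLD_DEFAULT
#include <string>

// Hypothetical helper (an assumption for illustration): returns true if
// `symbol` can be resolved using the default search order, i.e. the
// executable, its dependencies, and libraries loaded with RTLD_GLOBAL.
// Note: a symbol whose value is genuinely NULL would also report false.
bool is_globally_visible(const std::string& symbol) {
  dlerror();  // clear any stale error state from earlier dl* calls
  void* address = dlsym(RTLD_DEFAULT, symbol.c_str());
  return address != nullptr;
}
```

Checking `is_globally_visible("PyTuple_Type")` before and after loading the node library would indicate whether libpython's symbols were exported to the global scope.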
I'm not 100% sure of this, but the situation seems similar to https://github.com/PyO3/pyo3/issues/2000#issuecomment-979479111 , which leads to https://bugs.python.org/issue21536 . There, they discuss some of the ins and outs of loading things dynamically with Python. In particular, I'll point to this comment where they say:
"IHMO it's a bad usage of dlopen(): libpython must always be loaded with RTLD_GLOBAL."
I then took a look at how we loaded libraries, and saw this: https://github.com/ros2/rcutils/blob/d3fed35f2d8e19dede7f6dfd5f3b862c40ac7809/src/shared_library.c#L97
Indeed, locally if I switch that to `RTLD_LAZY | RTLD_GLOBAL`, the example that @maxime-clem provided works.
So the question is: should we add in `RTLD_GLOBAL`? It fixes the issue, but I'm slightly concerned about other side-effects it might have. @mjcarroll thoughts?
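To make the one-flag difference concrete, here is a minimal sketch of a loader that can open a library either with the default local visibility or with `RTLD_GLOBAL` added. This is not the rcutils implementation: `loader_flags` and `open_library` are hypothetical names, and the sketch assumes the existing call uses plain `RTLD_LAZY`, as the comment above implies. When neither `RTLD_GLOBAL` nor `RTLD_LOCAL` is passed, `dlopen` defaults to `RTLD_LOCAL`, so the library's symbols are not made available to libraries loaded later.

```cpp
#include <dlfcn.h>  // dlopen, RTLD_LAZY, RTLD_GLOBAL
#include <string>

// Hypothetical helper (not the rcutils API): builds the dlopen flags.
// Without RTLD_GLOBAL the library is loaded with local visibility.
int loader_flags(bool export_symbols) {
  int flags = RTLD_LAZY;
  if (export_symbols) {
    flags |= RTLD_GLOBAL;  // expose symbols to subsequently loaded libraries
  }
  return flags;
}

// Hypothetical helper: opens `path` and returns the handle, or nullptr on
// failure (dlerror() then describes the reason).
void* open_library(const std::string& path, bool export_symbols) {
  return dlopen(path.c_str(), loader_flags(export_symbols));
}
```

The side-effect concern comes from the global case: every symbol of a globally loaded library becomes a candidate for resolving references in libraries loaded afterwards, which can cause unexpected symbol interposition.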
Bug report
Required Info:
I am trying to call the Python interpreter from a `ComposableNode`. I have no issue doing a simple `print()`, but if I try to do `import torch`, the program crashes with an `undefined symbol` error. There is no issue when doing this with a normal `Node`.
Steps to reproduce issue
I made a minimal example showcasing the issue in the following repository: https://github.com/maxime-clem/ros2_composable_node_bug
Expected behavior
Composable node can use the Python library without issue, similarly to a normal `Node`.
Actual behavior
Composable node crashes with an `undefined symbol: PyTuple_Type` error.
Additional information
I have confirmed the issue with another user, so it does not appear to be an environment issue. The only workaround found so far is to use `dlopen("libpython3.10.so", RTLD_GLOBAL | RTLD_NOW)` in the code of the `ComposableNode`.
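A minimal sketch of that workaround, wrapped in a helper with error reporting, could look like the following. `promote_to_global_scope` is a hypothetical name introduced here for illustration, and the sketch assumes a POSIX system. Re-opening an already loaded library with `RTLD_GLOBAL` promotes its symbols to the global scope, which is presumably why the workaround helps.

```cpp
#include <dlfcn.h>  // dlopen, dlerror, RTLD_GLOBAL, RTLD_NOW
#include <iostream>
#include <string>

// Hypothetical helper (an assumption for illustration): opens `soname` with
// RTLD_GLOBAL so that its symbols become visible to extension modules
// loaded later. Returns false and prints the loader error on failure.
bool promote_to_global_scope(const std::string& soname) {
  void* handle = dlopen(soname.c_str(), RTLD_GLOBAL | RTLD_NOW);
  if (handle == nullptr) {
    const char* reason = dlerror();
    std::cerr << "dlopen(" << soname << ") failed: "
              << (reason != nullptr ? reason : "unknown error") << '\n';
    return false;
  }
  // The handle is intentionally never closed: the library has to stay
  // resident for as long as the interpreter is in use.
  return true;
}
```

In the `ComposableNode`, the call would be `promote_to_global_scope("libpython3.10.so")`, placed before the first Python import. The library name is tied to the installed Python version, so it would need adjusting on other systems.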