mvukov / rules_ros2

Build ROS 2 with Bazel
Apache License 2.0
81 stars 45 forks source link

Refactor/exported symbols #185

Closed mvukov closed 5 months ago

mvukov commented 11 months ago

I started this effort inspired by https://github.com/mvukov/rules_ros2/discussions/180 (where I learned about dynamic symbol exports) and https://github.com/mvukov/rules_ros2/pull/148.

Here are some thoughts:

In Bazel context and different compiler/linker combos we need to handle properly gcc/clang linking, ros 2 rmw serdes intricacies.

This PR aims to nail all those problems. We basically have two options:

The part that I don't like much is that in any case we need to every "main" Python script the os.RTLD_GLOBAL dlopen flag for this to work; got inspiration from rosbag2 tests. This may not be the nicest solution but it works. If we decide to go this route this will be properly documented.

@ahans @mikebauer @kgreenek @lalten I'd like to hear your opinions.

brians-neptune commented 6 months ago

Seems to me like you have to patch every Python binary to load plugins like this from Python with libc++, independent of rules_ros2, correct? If I'm correct that this is a fundamental restriction of that combination of things outside of this project, then I think adding that so CI can test libc++ and telling users that they will have to do the same is fine.

From past experience with libc++ incompatibilities, I think there just aren't very many people using libc++ in complex open source projects. I think there's much more usage in proprietary codebases (which have no problem patching Python to add the flag everywhere), and leaf dependencies that just want run msan/etc (which don't run into problems like this).

Also, for reference https://python-dev.python.narkive.com/P6UYSCLF/problems-with-python-s-default-dlopen-flags is some (only vaguely civil) discussion around the problems RTLD_GLOBAL will cause with some other libraries. I think the answer if anybody wants to use ROS2, libc++, and one of those annoying libraries is "don't". C++ is much less prone to that class of problems due to name mangling too.

mikebauer commented 6 months ago

As my interest is mainly on the C++ side, I don't have much commentary on the python patching stuff, but the change of adding "-Wl,--export-dynamic" seems reasonable to me as discussed in https://github.com/mvukov/rules_ros2/discussions/180. Thanks for your efforts in doing further research on potential downsides. No objections from me here!

mvukov commented 6 months ago

clang+libc++ issue(s) just served me to investigate the mentioned problems above. Issues with handling global symbol are very relevant for other compiler and libc combinations. This PR basically aims to nail all those issues.

On the C++ side, users won't have to do a thing: I already added linker flag at relevant places.

But, yes, all Python binaries will need be patched to have something like sys.setdlopenflags(os.RTLD_GLOBAL | os.RTLD_NOW) before any ros related code is imported. Personally, I don't consider this to be a big issue, given the issues this PR solves on the C++ side. Having to patch third-party ROS 2 Python binaries might be an annoyance, that's a fair point.

brians-neptune commented 5 months ago

I've been diving deep into something that I only just realized is related to this. I've got another solution, which I think is better: make it so rosidl_typesupport_cpp::typesupport_identifier etc are only linked once. I'm close to having that implemented (and having permission from my employer to share it). All the existing tests pass with the strcmp patches removed with GCC.

Linking any symbol into multiple ELF libraries (or multiple times into a single ELF library) is kinda-sorta a violation of the One Definition Rule. For things like string constants it mostly works (except for having multiple pointer values), but I've had more complex global variables give weird segfaults from doing that before. Also linking things multiple times bloats the output binaries/shared libraries significantly.

Using cc_shared_library and rearranging the cc_common rules was harder than it looked, but I'm pretty sure I've got it working. I wrote a little test utility to look through all the binaries and shared objects in a runfiles tree and check for duplicate symbols, which passes with my changes.

What should I run to check if my changes fix everything you're aiming to fix here? I'm not sure if bazel test --config=clang //... is enough or if there are more changes to enable libc++.

mvukov commented 5 months ago

Hi, very nice! Thanks for the efforts. I think we covered regular and edge cases well enough with tests. So, if tests compiled with gcc and clang pass both in the root of the repo and for examples, we're good.

mvukov commented 5 months ago

@brians-neptune Following your comments and suggestions I dig deeper into this matter and took another shot at this: please take a look at #299. That one provides a working solution that is minimal, transparent for users and looks like it works both for gcc and clang.

mvukov commented 5 months ago

If #299 works as expected, I'll just close this PR (for good :)).

mvukov commented 5 months ago

Closing this one in favor of #299 .