mvukov / rules_ros2

Build ROS 2 with Bazel
Apache License 2.0

Issue with cross compiling rmw_cyclonedds towards aarch64 #144

Open henriksod opened 1 year ago

henriksod commented 1 year ago

Hi, I am using gcc_toolchain to cross compile ros2 nodes towards aarch64 using this repo. I am having an issue with @ros2_rmw_cyclonedds//:rmw_cyclonedds.

How to reproduce:

BAZEL_VERSION=6.2.1

rules_ros2: 89dd5fa0add476e85a438c8575f353ecf6162c57

WORKSPACE

```bazel
# WORKSPACE
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

# Toolchain: aarch64-linux-gnueabihf
http_archive(
    name = "aspect_gcc_toolchain",
    sha256 = "8850373f24d3f8bb6e8f36e3e8e7edc93d948964f8f201e920af2c8ffba2002c",
    strip_prefix = "gcc-toolchain-4bd1f94536ee92b7c49673931773038d923ee86e",
    url = "https://github.com/aspect-build/gcc-toolchain/archive/4bd1f94536ee92b7c49673931773038d923ee86e.tar.gz",
)

load("@aspect_gcc_toolchain//toolchain:repositories.bzl", "gcc_toolchain_dependencies")

gcc_toolchain_dependencies()

load("@bazel_skylib//:workspace.bzl", "bazel_skylib_workspace")

bazel_skylib_workspace()

load("@aspect_bazel_lib//lib:repositories.bzl", "aspect_bazel_lib_dependencies")

aspect_bazel_lib_dependencies()

load("@aspect_gcc_toolchain//toolchain:defs.bzl", "ARCHS", "gcc_register_toolchain")

# Register aarch64 toolchain
gcc_register_toolchain(
    name = "gcc_toolchain_aarch64",
    target_arch = ARCHS.aarch64,
)

http_archive(
    name = "com_github_mvukov_rules_ros2",
    sha256 = "c1ff135dd1a6a5c518357285611b1c4de4af6eb9249bf007a21479e35b1a6006",
    strip_prefix = "rules_ros2-89dd5fa0add476e85a438c8575f353ecf6162c57",
    url = "https://github.com/mvukov/rules_ros2/archive/89dd5fa0add476e85a438c8575f353ecf6162c57.tar.gz",
)

load("@com_github_mvukov_rules_ros2//repositories:repositories.bzl", "ros2_repositories")

ros2_repositories()

load("@com_github_mvukov_rules_ros2//repositories:deps.bzl", "PIP_ANNOTATIONS", "ros2_deps")

ros2_deps()

load("@rules_python//python:repositories.bzl", "python_register_toolchains")

python_register_toolchains(
    name = "rules_ros2_python",
    python_version = "3.8.15",
)

load("@rules_python//python:pip.bzl", "pip_parse")
load("@rules_ros2_python//:defs.bzl", python_interpreter_target = "interpreter")

pip_parse(
    name = "rules_ros2_pip_deps",
    annotations = PIP_ANNOTATIONS,
    python_interpreter_target = python_interpreter_target,
    requirements_lock = "@com_github_mvukov_rules_ros2//:requirements_lock.txt",
)

load(
    "@rules_ros2_pip_deps//:requirements.bzl",
    install_rules_ros2_pip_deps = "install_deps",
)

install_rules_ros2_pip_deps()
```

.bazelrc

```bazel
# .bazelrc
build --incompatible_default_to_explicit_init_py
build --cxxopt=-std=c++17
build --sandbox_default_allow_network=false
build --incompatible_strict_action_env
build --heap_dump_on_oom
build --noexperimental_check_output_files

build:aarch64 --incompatible_enable_cc_toolchain_resolution
build:aarch64 --platforms=@aspect_gcc_toolchain//platforms:aarch64_linux
```

Build command:

bazel build --config=aarch64 @ros2_rmw_cyclonedds//:rmw_cyclonedds

Actual result:

ERROR: /home/henrik/.cache/bazel/_bazel_henrik/30c47f5a3d285140f3fa83b1f6b1c678/external/ros2_rmw_cyclonedds/BUILD.bazel:6:16: Linking external/ros2_rmw_cyclonedds/librmw_cyclonedds.so failed: (Exit 1): gcc failed: error executing command (from target @ros2_rmw_cyclonedds//:rmw_cyclonedds) external/gcc_toolchain_aarch64/bin/gcc -shared -o bazel-out/k8-fastbuild/bin/external/ros2_rmw_cyclonedds/librmw_cyclonedds.so ... (remaining 42 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
/home/henrik/.cache/bazel/_bazel_henrik/30c47f5a3d285140f3fa83b1f6b1c678/external/gcc_toolchain_aarch64_files/bin/aarch64-linux-ld: /tmp/librmw_cyclonedds.so.p2msOx.ltrans3.ltrans.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `ddsi_sertype_ops_builtintopic' which may bind externally can not be used when making a shared object; recompile with -fPIC
/tmp/librmw_cyclonedds.so.p2msOx.ltrans3.ltrans.o: in function `new_sertype_builtintopic':
<artificial>:(.text+0x7470): dangerous relocation: unsupported relocation
/home/henrik/.cache/bazel/_bazel_henrik/30c47f5a3d285140f3fa83b1f6b1c678/external/gcc_toolchain_aarch64_files/bin/aarch64-linux-ld: /tmp/librmw_cyclonedds.so.p2msOx.ltrans3.ltrans.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `ddsi_sertopic_serdata_ops_wrap' which may bind externally can not be used when making a shared object; recompile with -fPIC
/tmp/librmw_cyclonedds.so.p2msOx.ltrans3.ltrans.o: in function `ddsi_sertype_from_sertopic':
<artificial>:(.text+0xa9c4): dangerous relocation: unsupported relocation
...

Expected result:

INFO: Analyzed target @ros2_rmw_cyclonedds//:rmw_cyclonedds (12 packages loaded, 703 targets configured).
INFO: Found 1 target...
Target @ros2_rmw_cyclonedds//:rmw_cyclonedds up-to-date:
  bazel-bin/external/ros2_rmw_cyclonedds/librmw_cyclonedds.so
INFO: Elapsed time: 14.028s, Critical Path: 11.99s
INFO: 13 processes: 1 internal, 12 processwrapper-sandbox.
INFO: Build completed successfully, 13 total actions

Workaround:

I tried following the error message by adding -fPIC to copts and CMAKE_C_FLAGS, but that did not work.

Patching repositories/cyclonedds.BUILD.bazel back to commit c4a1c00a06692b8553a29e2bc6b869dc611eef45 causes the shared library to compile correctly.

mvukov commented 1 year ago

Well, PIC is explicitly enabled in https://github.com/mvukov/rules_ros2/blob/main/repositories/cyclonedds.BUILD.bazel#L74. It's worth investigating which flags CMake actually uses, e.g. in bazel-bin/external/cyclonedds/cyclonedds_foreign_cc/ you should see the CMake log(s).

The reason I moved from native Bazel targets to rules_foreign_cc is somewhat easier maintenance; this happened once we added iceoryx support. This is not set in stone and we might revert that.

mvukov commented 1 year ago

You can also investigate which C/C++ flags gcc-toolchain uses -- this might not be in line with the stock cc toolchain.

ahans commented 1 year ago

This sounds like a problem I would enjoy investigating! 😉 Maybe I'll find some time in the next couple of days... But no promises!

ahans commented 1 year ago

The issue seems to be the -flto flag that the CMake-based build adds in Release mode. Change the CMAKE_BUILD_TYPE to Debug or RelWithDebInfo and it links successfully (for RelWithDebInfo they explicitly don't enable INTERPROCEDURAL_OPTIMIZATION). It looks like -flto and -fPIC don't play well together with the aarch64 gcc that gcc_toolchain uses. With the old build it works because there we use the same flags for everything and don't have that CMake/Bazel CC toolchain mismatch.

Not sure how to fix this. Could be a gcc bug. Adding a patch for CycloneDDS so that -flto is never set would probably be a good workaround. Alternatively, change the CMAKE_BUILD_TYPE to RelWithDebInfo. Not sure if that could lead to larger binaries. I think we always do a final Bazel-managed linking step, so that would strip symbols in any case?
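A minimal sketch of the RelWithDebInfo alternative mentioned above, assuming the CycloneDDS build goes through the rules_foreign_cc `cmake` rule. The target name, the `:all_srcs` label, and the output library name are hypothetical placeholders, not the actual contents of repositories/cyclonedds.BUILD.bazel:

```bazel
# BUILD.bazel sketch; all target and label names here are illustrative.
load("@rules_foreign_cc//foreign_cc:defs.bzl", "cmake")

cmake(
    name = "cyclonedds_relwithdebinfo",  # hypothetical target name
    # Assumed filegroup that collects the CycloneDDS source tree.
    lib_source = ":all_srcs",
    cache_entries = {
        # Per the comment above, CycloneDDS does not enable
        # INTERPROCEDURAL_OPTIMIZATION for this build type, so CMake
        # should not add -flto to the compile and link lines.
        "CMAKE_BUILD_TYPE": "RelWithDebInfo",
    },
    # Assumed output name; adjust to whatever the real build produces.
    out_shared_libs = ["libddsc.so"],
)
```

Whether this yields larger binaries than a Release build is the open question raised in the comment; the sketch only shows where the build type would be set.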

ahans commented 1 year ago

This seems to be related: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85801

henriksod commented 1 year ago

I cloned https://github.com/eclipse-cyclonedds/cyclonedds/tree/0.9.1 and ran grep -rnw . -e "flto":

./ports/solaris2.6/config.mk:50:  # OPT = -O3 -DNDEBUG -flto
./src/core/CMakeLists.txt:36:  #-flto and debugging info do not seem to go well together

Doesn't seem like 0.9.1 release has -flto enabled?

ahans commented 1 year ago

> I cloned https://github.com/eclipse-cyclonedds/cyclonedds/tree/0.9.1 and ran grep -rnw . -e "flto":
>
> ./ports/solaris2.6/config.mk:50:  # OPT = -O3 -DNDEBUG -flto
> ./src/core/CMakeLists.txt:36:  #-flto and debugging info do not seem to go well together
>
> Doesn't seem like 0.9.1 release has -flto enabled?

The -flto flag is not set manually, but by CMake. They have the INTERPROCEDURAL_OPTIMIZATION target property for this, which in the 0.9.1 release is set here.

You can run bazel build -s --sandbox_debug to get more info about what exactly rules_foreign_cc does and investigate the scripts generated after the build. Then just put VERBOSE=1 somewhere and the CMake build will print the full compiler invocations as well. Then you'll see that -flto is added in a Release build.
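One possible place to put VERBOSE=1, sketched under the assumption that the build uses the rules_foreign_cc `cmake` rule and its `env` attribute. The target name, `:all_srcs` label, and output library name are hypothetical:

```bazel
# BUILD.bazel sketch; all target and label names here are illustrative.
load("@rules_foreign_cc//foreign_cc:defs.bzl", "cmake")

cmake(
    name = "cyclonedds_verbose",  # hypothetical target name
    # Assumed filegroup that collects the CycloneDDS source tree.
    lib_source = ":all_srcs",
    # Exported into the environment of the generated build script.
    # CMake 3.14+ and Makefile-based builds read VERBOSE and print
    # full compiler invocations, which should reveal whether -flto is set.
    env = {"VERBOSE": "1"},
    # Assumed output name; adjust to whatever the real build produces.
    out_shared_libs = ["libddsc.so"],
)
```

The resulting compiler lines should then show up in the CMake log under the target's output directory in bazel-bin.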

thomasegriffith commented 10 months ago

Looking to cross-compile for aarch64 soon... what is the preferred workaround?

Are there any repercussions with "patching repositories/cyclonedds.BUILD.bazel back to commit c4a1c00a06692b8553a29e2bc6b869dc611eef45" (workaround in the OP)??

mvukov commented 10 months ago

> Are there any repercussions with "patching repositories/cyclonedds.BUILD.bazel back to commit c4a1c00a06692b8553a29e2bc6b869dc611eef45" (workaround in the OP)??

This is a possibility, but AFAIR that doesn't include support for iceoryx (shared memory backend). It would be nice to eventually work out or update a true Bazel build for cyclonedds that also conditionally supports iceoryx (which can be built by Bazel).

BTW @mikebauer, I saw that you made this patch: https://github.com/resim-ai/open-core/blob/main/resim/third_party/ros2/flto.patch. Was that maybe related to cross-compilation?

In the end, I think I'll add an FAQ/troubleshooting section to the readme as @ahans suggested and probably add and reference the cyclonedds patch from https://github.com/mvukov/rules_ros2/pull/151. I agree with @ahans that this issue is only remotely related to this repo.

thomasegriffith commented 10 months ago

Thanks!