Closed henningkayser closed 3 months ago
I had a look into the open issues in MTC's CI - based on rolling docker images. I observed strange segfaults during shutdown of binaries. It turned out, that two ABI-incompatible versions of liboctomap
are linked into moveit_core's libs (version 1.10.0 from ROS and 1.9.7 (via libfcl) from system). My suggestion to cleanly fix that issue, is to drop the ROS release of octomap and rely on the system version only (see https://github.com/ros/rosdistro/pull/40273#issuecomment-2132260940).
If that proposal is not accepted, at least geometric_shapes
and moveit_core
should build against the system's octomap. To this end, we need to introduce a new rosdep
key for liboctomap-dev
.
To test my hypothesis that octomap was the culprit, I manually created a rolling-source
image where I removed ros-rolling-octomap
after running rosdep install
. I also pushed that image to dockerhub, but this will be overwritten by the next scheduled build of the docker images. So, this needs short-term action!
Tests confirm that the segfaults are fixed: the number of failing tests reduced from 18 to 4 (actually only 2). These two MTC tests that remain failing are both related to another shutdown issue:
cannot publish data, at ./src/rmw_publish.cpp:66 during '__function__'
Fail in delete datareader, at ./src/rmw_service.cpp:89 during '__function__'
Interestingly, Humble does not exhibit that issue. I'm afraid that's a regression in Rolling's rclcpp. No idea, where this originates from. And I need to turn to different tasks next... So, this stays unresolved for now. Any help is welcome ;-)
Once this succeeds, please merge with REBASE so that we keep the commit history. There are commits like dac7638 that should be reverted later on.
Build all succeed now, we still get a bunch of segfaults in the unit test. In particular there is a double free in PILZ when testing under noble
[unittest_trajectory_generator_ptp-1] [ OK ] TrajectoryGeneratorPTPTest.testJointGoalNoStartVel (9 ms)
[unittest_trajectory_generator_ptp-1] [----------] 13 tests from TrajectoryGeneratorPTPTest (140 ms total)
[unittest_trajectory_generator_ptp-1]
[unittest_trajectory_generator_ptp-1] [----------] Global test environment tear-down
[unittest_trajectory_generator_ptp-1] [==========] 13 tests from 1 test suite ran. (140 ms total)
[unittest_trajectory_generator_ptp-1] [ PASSED ] 13 tests.
[unittest_trajectory_generator_ptp-1] double free or corruption (fasttop)
Error: ROR] [unittest_trajectory_generator_ptp-1]: process has died [pid 35681, exit code -11, cmd '/home/runner/work/moveit2/moveit2/.work/target_ws/install/pilz_industrial_motion_planner/lib/pilz_industrial_motion_planner/unittest_trajectory_generator_ptp --ros-args -r __node:=unittest_trajectory_generator_ptp --params-file /tmp/launch_params_pvabl51l --params-file /tmp/launch_params_sw5qgu_0'].
https://github.com/moveit/moveit2/actions/runs/9402842379/job/25897891774?pr=2851#step:11:11618
there is a double free in PILZ when testing under noble
I have fixed that in MoveIt1 and just pushed the corresponding port.
Unfortunately, you force-pushed and thus overwrote my commit for the Pilz double-free...
Unfortunately, you force-pushed and thus overwrote my commit for the Pilz double-free...
Yeah, I just noticed this too late... Thank you, this helps a lot!
I don't think 48f60c4 had anything to do with the double free, unfortunately. The same issue has been fixed a while ago in #1229 already. Oddly, the tests produce segfaults (in CI and also locally) with the compile guard, and since there is no active distro that still provides kdl <1.4.0 I'll remove this switch. All pilz tests pass locally for me without any issues on clang+jazzy with both of the two latest commits. However, the main issue left is that CI will still install the ros-octomap package, regardless if it's provided in the docker image or not. I don't think there is a way around a proper version guard in the cmake setup for geometric_shapes and moveit
well, at least jazzy is happy now. tutorials will require fixing of ros2_kortex #2864 and the rolling jobs fail for spurious clang warnings and codecov mismatches. So far moveit_msgs and geometric_shapes are already released, which also includes the version fix for octomap #2862.
rolling with clang-tidy should be happy now as well (there were several new fixes). The only broken job remaining is the coverage one. Note that I already reverted several temporary commits. Please cleanup the commit history before merging.
rolling with clang-tidy should be happy now as well (there were several new fixes). The only broken job remaining is the coverage one. Note that I already reverted several temporary commits. Please cleanup the commit history before merging.
Thank you! I was planning to do a fresh cherry-pick of all the things that we keep, and squash it into concise blocks in a fresh branch.
I'm switching off ccov for now (https://github.com/moveit/moveit2/issues/2866) and will reopen this PR with a new branch to unblock CI and releases.
building jazzy docker images. CI: https://github.com/moveit/moveit2/actions/runs/9226839321/job/25387491876 Source: https://github.com/moveit/moveit2/actions/runs/9226839321/job/25387715948 Release: https://github.com/moveit/moveit2/actions/runs/9226839321/job/25387492335 all succeeded!