moveit / moveit2

:robot: MoveIt for ROS 2
https://moveit.ai/
BSD 3-Clause "New" or "Revised" License
1.04k stars 512 forks source link

Jazzy CI and build status #2851

Closed henningkayser closed 3 months ago

henningkayser commented 4 months ago

building jazzy docker images. CI: https://github.com/moveit/moveit2/actions/runs/9226839321/job/25387491876 Source: https://github.com/moveit/moveit2/actions/runs/9226839321/job/25387715948 Release: https://github.com/moveit/moveit2/actions/runs/9226839321/job/25387492335 all succeeded!

rhaschke commented 4 months ago

I had a look into the open issues in MTC's CI - based on rolling docker images. I observed strange segfaults during shutdown of binaries. It turned out, that two ABI-incompatible versions of liboctomap are linked into moveit_core's libs (version 1.10.0 from ROS and 1.9.7 (via libfcl) from system). My suggestion to cleanly fix that issue, is to drop the ROS release of octomap and rely on the system version only (see https://github.com/ros/rosdistro/pull/40273#issuecomment-2132260940).

If that proposal is not accepted, at least geometric_shapes and moveit_core should build against the system's octomap. To this end, we need to introduce a new rosdep key for liboctomap-dev.

To test my hypothesis that octomap was the culprit, I manually created a rolling-source image where I removed ros-rolling-octomap after running rosdep install. I also pushed that image to dockerhub, but this will be overwritten by the next scheduled build of the docker images. So, this needs short-term action!

Tests confirm that the segfaults are fixed: the number of failing tests reduced from 18 to 4 (actually only 2). These two MTC tests that remain failing are both related to another shutdown issue:

cannot publish data, at ./src/rmw_publish.cpp:66 during '__function__'
Fail in delete datareader, at ./src/rmw_service.cpp:89 during '__function__'

Interestingly, Humble does not exhibit that issue. I'm afraid that's a regression in Rolling's rclcpp. No idea, where this originates from. And I need to turn to different tasks next... So, this stays unresolved for now. Any help is welcome ;-)

henningkayser commented 3 months ago

Once this succeeds, please merge with REBASE so that we keep the commit history. There are commits like dac7638 that should be reverted later on.

henningkayser commented 3 months ago

Build all succeed now, we still get a bunch of segfaults in the unit test. In particular there is a double free in PILZ when testing under noble

    [unittest_trajectory_generator_ptp-1] [       OK ] TrajectoryGeneratorPTPTest.testJointGoalNoStartVel (9 ms)
    [unittest_trajectory_generator_ptp-1] [----------] 13 tests from TrajectoryGeneratorPTPTest (140 ms total)
    [unittest_trajectory_generator_ptp-1] 
    [unittest_trajectory_generator_ptp-1] [----------] Global test environment tear-down
    [unittest_trajectory_generator_ptp-1] [==========] 13 tests from 1 test suite ran. (140 ms total)
    [unittest_trajectory_generator_ptp-1] [  PASSED  ] 13 tests.
    [unittest_trajectory_generator_ptp-1] double free or corruption (fasttop)
Error: ROR] [unittest_trajectory_generator_ptp-1]: process has died [pid 35681, exit code -11, cmd '/home/runner/work/moveit2/moveit2/.work/target_ws/install/pilz_industrial_motion_planner/lib/pilz_industrial_motion_planner/unittest_trajectory_generator_ptp --ros-args -r __node:=unittest_trajectory_generator_ptp --params-file /tmp/launch_params_pvabl51l --params-file /tmp/launch_params_sw5qgu_0'].

https://github.com/moveit/moveit2/actions/runs/9402842379/job/25897891774?pr=2851#step:11:11618

rhaschke commented 3 months ago

there is a double free in PILZ when testing under noble

I have fixed that in MoveIt1 and just pushed the corresponding port.

rhaschke commented 3 months ago

Unfortunately, you force-pushed and thus overwrote my commit for the Pilz double-free...

henningkayser commented 3 months ago

Unfortunately, you force-pushed and thus overwrote my commit for the Pilz double-free...

Yeah, I just noticed this too late... Thank you, this helps a lot!

henningkayser commented 3 months ago

I don't think 48f60c4 had anything to do with the double free, unfortunately. The same issue has been fixed a while ago in #1229 already. Oddly, the tests produce segfaults (in CI and also locally) with the compile guard, and since there is no active distro that still provides kdl <1.4.0 I'll remove this switch. All pilz tests pass locally for me without any issues on clang+jazzy with both of the two latest commits. However, the main issue left is that CI will still install the ros-octomap package, regardless if it's provided in the docker image or not. I don't think there is a way around a proper version guard in the cmake setup for geometric_shapes and moveit

henningkayser commented 3 months ago

well, at least jazzy is happy now. tutorials will require fixing of ros2_kortex #2864 and the rolling jobs fail for spurious clang warnings and codecov mismatches. So far moveit_msgs and geometric_shapes are already released, which also includes the version fix for octomap #2862.

rhaschke commented 3 months ago

rolling with clang-tidy should be happy now as well (there were several new fixes). The only broken job remaining is the coverage one. Note that I already reverted several temporary commits. Please cleanup the commit history before merging.

henningkayser commented 3 months ago

rolling with clang-tidy should be happy now as well (there were several new fixes). The only broken job remaining is the coverage one. Note that I already reverted several temporary commits. Please cleanup the commit history before merging.

Thank you! I was planning to do a fresh cherry-pick of all the things that we keep, and squash it into concise blocks in a fresh branch.

henningkayser commented 3 months ago

I'm switching off ccov for now (https://github.com/moveit/moveit2/issues/2866) and will reopen this PR with a new branch to unblock CI and releases.