ros2 / rclcpp

rclcpp (ROS Client Library for C++)
Apache License 2.0
536 stars 417 forks source link

Revert "call shutdown in LifecycleNode dtor to avoid leaving the devi… #2522

Closed clalancette closed 4 months ago

clalancette commented 4 months ago

…ce in un… (#2450)"

This reverts commit 04ea0bb00293387791522590b7347a2282cda290.

Ever since this commit was merged, tests have been (quietly) failing in CI. You can see this in any of the CI jobs since April 7th, for instance in https://ci.ros2.org/view/nightly/job/nightly_linux_release/3068/:

test 7
      Start  7: test_lifecycle_publisher

7: Test command: /usr/bin/python3 "-u" "/home/jenkins-agent/workspace/nightly_linux_release/ws/install/ament_cmake_test/share/ament_cmake_test/cmake/run_test.py" "/home/jenkins-agent/workspace/nightly_linux_release/ws/build/rclcpp_lifecycle/test_results/rclcpp_lifecycle/test_lifecycle_publisher.gtest.xml" "--package-name" "rclcpp_lifecycle" "--output-file" "/home/jenkins-agent/workspace/nightly_linux_release/ws/build/rclcpp_lifecycle/ament_cmake_gtest/test_lifecycle_publisher.txt" "--command" "/home/jenkins-agent/workspace/nightly_linux_release/ws/build/rclcpp_lifecycle/test_lifecycle_publisher" "--gtest_output=xml:/home/jenkins-agent/workspace/nightly_linux_release/ws/build/rclcpp_lifecycle/test_results/rclcpp_lifecycle/test_lifecycle_publisher.gtest.xml"
7: Working Directory: /home/jenkins-agent/workspace/nightly_linux_release/ws/build/rclcpp_lifecycle
7: Test timeout computed to be: 60
7: -- run_test.py: invoking following command in '/home/jenkins-agent/workspace/nightly_linux_release/ws/build/rclcpp_lifecycle':
7:  - /home/jenkins-agent/workspace/nightly_linux_release/ws/build/rclcpp_lifecycle/test_lifecycle_publisher --gtest_output=xml:/home/jenkins-agent/workspace/nightly_linux_release/ws/build/rclcpp_lifecycle/test_results/rclcpp_lifecycle/test_lifecycle_publisher.gtest.xml
7: Running main() from /home/jenkins-agent/workspace/nightly_linux_release/ws/install/gtest_vendor/src/gtest_vendor/src/gtest_main.cc
7: [==========] Running 4 tests from 1 test suite.
7: [----------] Global test environment set-up.
7: [----------] 4 tests from PerTimerType/TestLifecyclePublisher
7: [ RUN      ] PerTimerType/TestLifecyclePublisher.publish_managed_by_node/wall_timer
7: [WARN] [1715154999.943244122] [LifecyclePublisher]: Trying to publish message on the topic '/topic', but the publisher is not activated
7: [ERROR] [1715154999.943921662] [node]: Unable to start transition 6 from current state shuttingdown: Could not publish transition: publisher's context is invalid, at /home/jenkins-agent/workspace/nightly_linux_release/ws/src/ros2/rcl/rcl/src/rcl/publisher.c:423, at /home/jenkins-agent/workspace/nightly_linux_release/ws/src/ros2/rcl/rcl_lifecycle/src/rcl_lifecycle.c:368
7: [WARN] [1715154999.943982879] [rclcpp_lifecycle]: Shutdown error in destruction of LifecycleNode: final state(inactive)
7: [       OK ] PerTimerType/TestLifecyclePublisher.publish_managed_by_node/wall_timer (35 ms)
7: [ RUN      ] PerTimerType/TestLifecyclePublisher.publish_managed_by_node/generic_timer
7: [WARN] [1715154999.955733453] [LifecyclePublisher]: Trying to publish message on the topic '/topic', but the publisher is not activated
7: [ERROR] [1715154999.956197503] [node]: Unable to start transition 6 from current state shuttingdown: Could not publish transition: publisher's context is invalid, at /home/jenkins-agent/workspace/nightly_linux_release/ws/src/ros2/rcl/rcl/src/rcl/publisher.c:423, at /home/jenkins-agent/workspace/nightly_linux_release/ws/src/ros2/rcl/rcl_lifecycle/src/rcl_lifecycle.c:368
7: [WARN] [1715154999.956255396] [rclcpp_lifecycle]: Shutdown error in destruction of LifecycleNode: final state(inactive)
7: [       OK ] PerTimerType/TestLifecyclePublisher.publish_managed_by_node/generic_timer (11 ms)
7: [ RUN      ] PerTimerType/TestLifecyclePublisher.publish/wall_timer
7: [WARN] [1715154999.965462012] [LifecyclePublisher]: Trying to publish message on the topic '/topic', but the publisher is not activated
7: [ERROR] [1715154999.967268333] [node]: Unable to start transition 5 from current state shuttingdown: Could not publish transition: publisher's context is invalid, at /home/jenkins-agent/workspace/nightly_linux_release/ws/src/ros2/rcl/rcl/src/rcl/publisher.c:423, at /home/jenkins-agent/workspace/nightly_linux_release/ws/src/ros2/rcl/rcl_lifecycle/src/rcl_lifecycle.c:368
7: [WARN] [1715154999.967322508] [rclcpp_lifecycle]: Shutdown error in destruction of LifecycleNode: final state(unconfigured)
7: [       OK ] PerTimerType/TestLifecyclePublisher.publish/wall_timer (12 ms)
7: [ RUN      ] PerTimerType/TestLifecyclePublisher.publish/generic_timer
7: [WARN] [1715154999.977546208] [LifecyclePublisher]: Trying to publish message on the topic '/topic', but the publisher is not activated
7: [ERROR] [1715154999.977927756] [node]: Unable to start transition 5 from current state shuttingdown: Could not publish transition: publisher's context is invalid, at /home/jenkins-agent/workspace/nightly_linux_release/ws/src/ros2/rcl/rcl/src/rcl/publisher.c:423, at /home/jenkins-agent/workspace/nightly_linux_release/ws/src/ros2/rcl/rcl_lifecycle/src/rcl_lifecycle.c:368
7: [WARN] [1715154999.977936306] [rclcpp_lifecycle]: Shutdown error in destruction of LifecycleNode: final state(unconfigured)
7: [       OK ] PerTimerType/TestLifecyclePublisher.publish/generic_timer (7 ms)
7: [----------] 4 tests from PerTimerType/TestLifecyclePublisher (67 ms total)
7: 
7: [----------] Global test environment tear-down
7: [==========] 4 tests from 1 test suite ran. (67 ms total)
7: [  PASSED  ] 4 tests.

What that should look like (and does after this revert) is:

test 7
    Start 7: test_lifecycle_publisher

7: Test command: /usr/bin/python3 "-u" "/home/ubuntu/rolling_ws/install/ament_cmake_test/share/ament_cmake_test/cmake/run_test.py" "/home/ubuntu/rolling_ws/build/rclcpp_lifecycle/test_results/rclcpp_lifecycle/test_lifecycle_publisher.gtest.xml" "--package-name" "rclcpp_lifecycle" "--output-file" "/home/ubuntu/rolling_ws/build/rclcpp_lifecycle/ament_cmake_gtest/test_lifecycle_publisher.txt" "--command" "/home/ubuntu/rolling_ws/build/rclcpp_lifecycle/test_lifecycle_publisher" "--gtest_output=xml:/home/ubuntu/rolling_ws/build/rclcpp_lifecycle/test_results/rclcpp_lifecycle/test_lifecycle_publisher.gtest.xml"
7: Working Directory: /home/ubuntu/rolling_ws/build/rclcpp_lifecycle
7: Test timeout computed to be: 60
7: -- run_test.py: invoking following command in '/home/ubuntu/rolling_ws/build/rclcpp_lifecycle':
7:  - /home/ubuntu/rolling_ws/build/rclcpp_lifecycle/test_lifecycle_publisher --gtest_output=xml:/home/ubuntu/rolling_ws/build/rclcpp_lifecycle/test_results/rclcpp_lifecycle/test_lifecycle_publisher.gtest.xml
7: Running main() from /home/ubuntu/rolling_ws/install/gtest_vendor/src/gtest_vendor/src/gtest_main.cc
7: [==========] Running 4 tests from 1 test suite.
7: [----------] Global test environment set-up.
7: [----------] 4 tests from PerTimerType/TestLifecyclePublisher
7: [ RUN      ] PerTimerType/TestLifecyclePublisher.publish_managed_by_node/wall_timer
7: [WARN] [1715204443.848895590] [LifecyclePublisher]: Trying to publish message on the topic '/topic', but the publisher is not activated
7: [       OK ] PerTimerType/TestLifecyclePublisher.publish_managed_by_node/wall_timer (60 ms)
7: [ RUN      ] PerTimerType/TestLifecyclePublisher.publish_managed_by_node/generic_timer
7: [WARN] [1715204443.876967893] [LifecyclePublisher]: Trying to publish message on the topic '/topic', but the publisher is not activated
7: [       OK ] PerTimerType/TestLifecyclePublisher.publish_managed_by_node/generic_timer (24 ms)
7: [ RUN      ] PerTimerType/TestLifecyclePublisher.publish/wall_timer
7: [WARN] [1715204443.900725954] [LifecyclePublisher]: Trying to publish message on the topic '/topic', but the publisher is not activated
7: [       OK ] PerTimerType/TestLifecyclePublisher.publish/wall_timer (23 ms)
7: [ RUN      ] PerTimerType/TestLifecyclePublisher.publish/generic_timer
7: [WARN] [1715204443.923513955] [LifecyclePublisher]: Trying to publish message on the topic '/topic', but the publisher is not activated
7: [       OK ] PerTimerType/TestLifecyclePublisher.publish/generic_timer (23 ms)
7: [----------] 4 tests from PerTimerType/TestLifecyclePublisher (130 ms total)
7: 
7: [----------] Global test environment tear-down
7: [==========] 4 tests from 1 test suite ran. (131 ms total)
7: [  PASSED  ] 4 tests.

I'm not sure of this, but I think the problem is that when this is being destroyed, you can't assume that all of the lower-level things have not yet been destroyed, so you shouldn't actually transition. If that is something we expect to be able to do, then we probably need to take some weak or shared pointers to the infrastructure we need so it isn't destroyed.

Regardless, we don't have that in place right now, so we should revert this change. @fujitatomoya @alsora @Barry-Xu-2018 FYI (we should probably revert this from jazzy, iron and humble as well).

clalancette commented 4 months ago

CI:

fujitatomoya commented 4 months ago

@clalancette sorry for making trouble, lets revert this for now.

actually https://github.com/ros2/rclcpp/issues/2520 found the same error, i think we need to check if that is not in transition state when we call shutdown on dtor.

anyway this is unlikely to break ABI, so reverting would be the best thing to do.

clalancette commented 4 months ago

New CI after https://github.com/ros2/ros2/pull/1560:

clalancette commented 4 months ago

Going ahead with this one, then backporting to jazzy.

clalancette commented 4 months ago

@Mergifyio backport jazzy

mergify[bot] commented 4 months ago

backport jazzy

✅ Backports have been created

* [#2524 Revert "call shutdown in LifecycleNode dtor to avoid leaving the devi… (backport #2522)](https://github.com/ros2/rclcpp/pull/2524) has been created for branch `jazzy`