Ryanf55 opened 11 months ago
We see this in Nav2's CI periodically as well, as reported on Slack (@clalancette). It is pretty concerning to us that we again have new RMW errors appearing for really basic tests. This is new in the last few months.
@SteveMacenski @Ryanf55
Has this issue been there for a long time, or did it just start failing recently? If you happen to know which commit causes this error behavior on Nav2, that would be really helpful.
[ERROR] [1702060366.623059261] [fibonacci_server_node]: Error in shutdown of get_type_description service: Fail in delete datawriter
This could be related to Type Description Support, but I'm not sure.
CC: @iuhilnehc-ynos @Barry-Xu-2018
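To track how often this shows up in CI logs, note that the error line follows ROS 2's default console output format, `[{severity}] [{time}] [{name}]: {message}`, where the time is seconds since the Unix epoch. The following is a minimal Python sketch for pulling these entries out of a log, assuming that default format (a custom `RCUTILS_CONSOLE_OUTPUT_FORMAT` would need a different regex). The helper names are hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumes the default ROS 2 console format: "[{severity}] [{time}] [{name}]: {message}"
LOG_LINE = re.compile(
    r"^\[(?P<severity>[A-Z]+)\]\s+\[(?P<stamp>\d+\.\d+)\]\s+"
    r"\[(?P<logger>[^\]]+)\]:\s+(?P<message>.*)$"
)

# Substring taken from the shutdown error reported in this thread.
DATAWRITER_ERROR = "Fail in delete datawriter"


def parse_line(line):
    """Return a dict for one console log line, or None if it does not match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    # The stamp is epoch seconds with a nanosecond fraction; float keeps ~microseconds.
    entry["utc"] = datetime.fromtimestamp(float(entry["stamp"]), tz=timezone.utc)
    return entry


def find_datawriter_errors(lines):
    """Yield parsed ERROR entries whose message mentions the delete-datawriter failure."""
    for line in lines:
        entry = parse_line(line)
        if entry and entry["severity"] == "ERROR" and DATAWRITER_ERROR in entry["message"]:
            yield entry
```

Feeding it a whole test log (for example `open(path).readlines()`) gives one entry per occurrence, with the node name and a UTC timestamp that can be compared across CI runs.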
It’s not deterministic in our CI, so I can’t pinpoint the exact commit. It’s been happening over the last few months, though.
I'll try to set up a source build of rmw to run a bisect on when I get home in a few days.
Sorry, I haven't had time to do this yet. It might be easier for me to just debug this with gdb because a bisect involves recompiling ros2, then nav2, which takes a while each iteration (158 packages using --packages-up-to).
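Because the failure is non-deterministic, a single passing run at a bisect step does not prove that commit is good. If each run fails independently with probability p, the chance that n runs all pass is (1 - p)^n. That gives a rough count of repetitions per step. The following is a minimal sketch that assumes p has been estimated from the CI failure rate; the function name is hypothetical.

```python
import math


def runs_needed(p_fail, confidence):
    """Smallest n such that P(at least one failure in n runs) >= confidence.

    Assumes runs are independent and each fails with probability p_fail.
    """
    if not 0 < p_fail < 1 or not 0 < confidence < 1:
        raise ValueError("p_fail and confidence must be in (0, 1)")
    # P(all n runs pass) = (1 - p_fail) ** n; require it <= 1 - confidence.
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_fail))
```

For example, a test that fails about 10% of the time needs 29 clean runs before a commit can be called good with 95% confidence. This is why repeated local runs (or a debugger, as suggested above) can be cheaper than a full rebuild-per-step bisect.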
Here's the bisect report:
/home/ryan/Dev/ros2_rolling/src/ros2/rmw_dds_common/rmw_dds_common/include/rmw_dds_common/qos.hpp:242:3: error: ‘rosidl_type_hash_t’ has not been declared
If I want to bisect this further, I'll need to start doing the bisect across multiple repos, since they are coupled together by the public API additions. Are there any tools to roll back state across multiple repos, say with vcs? Or do I need to manually figure out which hashes of each repo correspond to each other?
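One possible workaround is to bisect over time instead of per-repo hashes. This assumes the coupled repos were developed roughly in lockstep on their tracked branches. Pick a cutoff time, then check out the newest commit at or before it in every repo. For a single repo, `git rev-list -1 --before=<date> <branch>` returns that commit (by committer date), and vcstool's `vcs export --exact` can record the resulting hashes into a `.repos` file. The following sketch shows only the selection logic on already-collected histories. The function names and the sample data in the test are hypothetical.

```python
from bisect import bisect_right


def commit_at_or_before(history, cutoff):
    """Return the hash of the newest commit with timestamp <= cutoff, or None.

    history: list of (unix_timestamp, commit_hash) sorted by timestamp ascending.
    """
    stamps = [ts for ts, _ in history]
    i = bisect_right(stamps, cutoff)
    return history[i - 1][1] if i else None


def snapshot(repos, cutoff):
    """Pick one commit per repo so all repos reflect the same point in time.

    repos: dict mapping repo name -> sorted (timestamp, hash) history.
    """
    return {name: commit_at_or_before(history, cutoff) for name, history in repos.items()}
```

Bisecting then means binary-searching over the cutoff time rather than over one repo's hashes. Committer dates are only an approximation of when a change landed, so a snapshot can still occasionally fail to build, like the `rosidl_type_hash_t` error above.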
@SteveMacenski Does Nav2 CI give you any way to look at regressions per test, to see the first date a test started failing? I know CDash has regression reporting capabilities.
From the CircleCI blog, it looks like you can look at test failure frequency, but not over time. https://circleci.com/blog/how-to-output-junit-tests-through-circleci-2-0-for-expanded-insights/
Unless CircleCI provides that, we don't have any unique tools.
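If the CI keeps a JUnit XML report per build (the format the linked CircleCI post describes uploading), the first date each test started failing can be recovered offline. The following is a minimal sketch assuming the common `<testcase>` / `<failure>` / `<error>` element layout. The function names are hypothetical, and the dates in the test are made up for illustration.

```python
import xml.etree.ElementTree as ET


def failed_tests(junit_xml):
    """Return the set of 'classname.name' ids that failed or errored in one report."""
    root = ET.fromstring(junit_xml)
    failed = set()
    # iter() finds <testcase> elements whether the root is <testsuites> or <testsuite>.
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            failed.add(f"{case.get('classname', '')}.{case.get('name', '')}")
    return failed


def first_failure_dates(runs):
    """Map each test id to the earliest date it failed.

    runs: iterable of (iso_date_string, junit_xml) pairs, e.g. one per CI build.
    ISO-8601 date strings compare chronologically, so '<' finds the earliest.
    """
    first = {}
    for date, xml_text in runs:
        for test_id in failed_tests(xml_text):
            if test_id not in first or date < first[test_id]:
                first[test_id] = date
    return first
```

For a flaky test, the earliest failure only bounds when the problem appeared. It narrows the range of commits to bisect but does not identify one.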
I encountered a very similar error (https://github.com/mavlink/mavros/actions/runs/8129217925/job/22234646274): the same code works on Humble but fails on Iron.
Bug report
Required Info:
Steps to reproduce issue
Follow the Nav2 build-from-source instructions, with the repo on the latest rolling hash 113564965f54009686d521902ff3fcc9d101c5b5. If you want to build everything from source, you need to check out the ROS Rolling sources and the Nav2 sources, and clone this in the workspace (ros2 branch).
Expected behavior
Tests pass
Actual behavior
RMW has an internal error failing to delete the datawriter in test_actions. I already talked to the Nav2 maintainers (Steve), and he said to file the issue here. He sees it in CI, but since I can reproduce it locally, he said to file it ASAP.
Additional information
CPU: AMD® Ryzen 9 7950X 16-core processor × 32
GPU: NVIDIA GeForce RTX 3070/PCIe/SSE2 / NVIDIA Corporation GA104 [GeForce RTX 3070]