Closed ycheng517 closed 1 year ago
Have you tried doing a completely clean build? I have experienced similar issues. We saw these issues when debugging differences between cycleone_dds
and fastrtps
. We weren't able to track down a root cause, but our issue (incorrectly populated bools) was resolved by doing a clean build.
I've tried doing a clean build and it doesn't seem to make a difference. For now we're working around this issue by putting the boolean field in front of the float field in Inner.msg. I think this issue happens when mixing floats with ints as well.
I was able to reproduce everything here. Details:
Using ROS2 Iron on Ubuntu 22.04. Versions of FastRTPS involved:
ii ros-iron-fastrtps 2.10.2-1jammy.20230821.084711 amd64 *eprosima Fast DDS* (formerly Fast RTPS) is a C++ implementation of the DDS (Data Distribution Service) standard of the OMG (Object Management Group).
ii ros-iron-fastrtps-cmake-module 3.0.1-1jammy.20230713.195955 amd64 Provide CMake module to find eProsima FastRTPS.
ii ros-iron-rmw-fastrtps-cpp 7.1.1-2jammy.20230821.091204 amd64 Implement the ROS middleware interface using eProsima FastRTPS static code generation in C++.
ii ros-iron-rmw-fastrtps-shared-cpp 7.1.1-2jammy.20230821.090849 amd64 Code shared on static and dynamic type support of rmw_fastrtps_cpp.
ii ros-iron-rosidl-dynamic-typesupport-fastrtps 0.0.2-2jammy.20230821.090716 amd64 FastDDS serialization support implementation for use with C/C++.
ii ros-iron-rosidl-typesupport-fastrtps-c 3.0.1-1jammy.20230713.202119 amd64 Generate the C interfaces for eProsima FastRTPS.
ii ros-iron-rosidl-typesupport-fastrtps-cpp 3.0.1-1jammy.20230713.201943 amd64 Generate the C++ interfaces for eProsima FastRTPS.
We also tried the following changes:
None of these changed the erroneous behavior. As observed by @ycheng517 , Cyclone did not exhibit the issue.
This is extremely concerning.
I'm also able to observe this behavior, when the publisher and subscriber are in the same process. I was able to reproduce on both Iron (Fast-DDS 2.10.x) and Rolling (Fast-DDS 2.11.x). I'll note that Humble (which uses Fast-DDS 2.6.6) does not show the issue.
- Splitting the pub and sub up into different nodes
Were the different nodes within the same process? In my testing, if I used different nodes in different processes then I did not see the issue (in particular, I used ros2 topic echo /topic
, and it showed true
for all fields). This suggests that maybe the problem is in the shared memory implementation in Fast-DDS, but that's just a guess at this point.
@MiguelCompany @EduPonz Can you take a look?
@clalancette my testing of separate nodes was done using separate processes. I did not test between two separate machines though, so I may be seeing an issue with the shared memory implementation, as you have suggested. I can try and disable shmem and test again to see if that solves it.
@clalancette my testing of separate nodes was done using separate processes. I did not test between two separate machines though
Hm, I did not test on separate machines either. It was enough to just have separate processes on the same machine for me. I wonder where that difference comes from. Can you test with ros2 topic echo /topic
as well?
Just tested, and as you have noted, 'ros2 topic echo' produces the correct results. I also tried mixing trues and falses in the fields, and incrementing the double value, and the results held - echoing produced correct results, and the standalone subscriber produced incorrect results (inner values were correct, outer bools were all false regardless of input).
Hi all.
This seems quite similar to an issue we had on the code generator for Fast DDS, which was solved here. Further explanation of what is happening can be found here and here.
In short, the generated code for the message is wrongly considered plain. I have just opened https://github.com/ros2/rosidl_typesupport_fastrtps/pull/108, which shall address this issue in the same manner we have done it on Fast DDS Gen.
@ycheng517 @spiderkeys @clalancette Could you check on your side?
@ycheng517 @spiderkeys @clalancette Could you check on your side?
@MiguelCompany This seems to fix the problem for me using rmw_fastrtps_cpp
, thanks for that.
However, I started to put together a regression test for this, and I'm finding this is still failing with rmw_fastrtps_dynamic_cpp
. Am I right to assume we need a similar patch to https://github.com/ros2/rmw_fastrtps/tree/rolling/rmw_fastrtps_dynamic_cpp to fix that?
Am I right to assume we need a similar patch to https://github.com/ros2/rmw_fastrtps/tree/rolling/rmw_fastrtps_dynamic_cpp to fix that?
Most probably. I'll see what I can do.
I think I solved it in https://github.com/ros2/rmw_fastrtps/pull/716 , but please take a look and let me know.
Also see https://github.com/ros2/system_tests/pull/530, which is an integration test for this problem.
@clalancette @MiguelCompany
Can we do backport all these PRs to Iron and Humble? (i think we should)
@fujitatomoya I do think so!
Can we do backport all these PRs to Iron and Humble? (i think we should)
Yes, we should do it. I was going to let it sit in Rolling for a while before doing that, though.
@clalancette okay i will do that in next week.
@clalancette okay i will do that in next week.
Thank you so much to @spiderkeys, @MiguelCompany, @clalancette and @fujitatomoya for fixing this issue.
What is the status on getting this backported into Iron? We have encountered this bug in our production code and would greatly benefit from not needing to build these packages from source!
@asetapen thank you for the ping, i will be on it.
@asetapen @ycheng517 backport to humble and iron has been completed. i will go ahead to close this one.
@clalancette thanks for taking care of the backports and CI.
Bug report
Required Info:
ros-iron-fastrtps 2.10.2-1jammy.20230908.160451 amd64 ros-iron-fastrtps-cmake-module 3.0.1-1jammy.20230908.160543 amd64 ros-iron-rmw-fastrtps-cpp 7.1.1-2jammy.20230908.165835 amd64 ros-iron-rmw-fastrtps-dynamic-cpp 7.1.1-2jammy.20230908.165838 amd64 ros-iron-rmw-fastrtps-shared-cpp 7.1.1-2jammy.20230908.165115 amd64 ros-iron-rosidl-dynamic-typesupport-fastrtps 0.0.2-2jammy.20230908.162712 amd64 ros-iron-rosidl-typesupport-fastrtps-c 3.0.1-1jammy.20230908.162548 amd64 ros-iron-rosidl-typesupport-fastrtps-cpp 3.0.1-1jammy.20230908.162358 amd64
Steps to reproduce issue
The bug can be reproduced by having a rclcpp node publish and subscribe to a custom message that has some bools as well as a nested message that has floats and bools
Outer.msg
Inner.msg
Simple node that publishes and subscribes to the same topic that uses Outer.msg.
Expected behavior
Received message should be the same as the published message.
Actual behavior
Received message have wrong values for boolean fields. This is the output that I get:
Additional information
The bug seems to only happen with fastrtps. With
RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
things work fine.