ros2 / rmw_fastrtps

Implementation of the ROS Middleware (rmw) Interface using eProsima's Fast RTPS.
Apache License 2.0

message data is wrong when nested message contains floats and booleans #715

Closed · ycheng517 closed this issue 1 year ago

ycheng517 commented 1 year ago

Bug report

Required Info:

ros-iron-fastrtps 2.10.2-1jammy.20230908.160451 amd64
ros-iron-fastrtps-cmake-module 3.0.1-1jammy.20230908.160543 amd64
ros-iron-rmw-fastrtps-cpp 7.1.1-2jammy.20230908.165835 amd64
ros-iron-rmw-fastrtps-dynamic-cpp 7.1.1-2jammy.20230908.165838 amd64
ros-iron-rmw-fastrtps-shared-cpp 7.1.1-2jammy.20230908.165115 amd64
ros-iron-rosidl-dynamic-typesupport-fastrtps 0.0.2-2jammy.20230908.162712 amd64
ros-iron-rosidl-typesupport-fastrtps-c 3.0.1-1jammy.20230908.162548 amd64
ros-iron-rosidl-typesupport-fastrtps-cpp 3.0.1-1jammy.20230908.162358 amd64

Steps to reproduce issue

The bug can be reproduced by having an rclcpp node publish and subscribe to a custom message that has some bools, as well as a nested message that has floats and bools.

Outer.msg

nested_msg_pub_sub/Inner inner
bool bool_field_1
bool bool_field_2
bool bool_field_3

Inner.msg

float64 float_field
bool bool_field

Below is a simple node that publishes and subscribes to the same topic using Outer.msg (a build sketch follows the code):

#include "rclcpp/rclcpp.hpp"
#include "nested_msg_pub_sub/msg/outer.hpp"

using namespace std::chrono_literals;

class MinimalPublisher : public rclcpp::Node
{
public:
  MinimalPublisher()
  : Node("minimal_publisher"), count_(0)
  {
    publisher_ = this->create_publisher<nested_msg_pub_sub::msg::Outer>("topic", 10);
    subscription_ = this->create_subscription<nested_msg_pub_sub::msg::Outer>(
      "topic", 10, std::bind(&MinimalPublisher::topic_callback, this, std::placeholders::_1));
    timer_ = this->create_wall_timer(
      500ms, std::bind(&MinimalPublisher::timer_callback, this));
  }

private:
  void timer_callback()
  {
    auto msg = nested_msg_pub_sub::msg::Outer();
    msg.inner.float_field = 1.23456789;
    msg.inner.bool_field = true;
    msg.bool_field_1 = true;
    msg.bool_field_2 = true;
    msg.bool_field_3 = true;
    std::string msg_str = nested_msg_pub_sub::msg::to_yaml(msg);
    RCLCPP_INFO(this->get_logger(), "Publishing: '%s'", msg_str.c_str());
    publisher_->publish(msg);
  }

  void topic_callback(const nested_msg_pub_sub::msg::Outer & msg) const
  {
    std::string msg_str = nested_msg_pub_sub::msg::to_yaml(msg);
    RCLCPP_INFO(this->get_logger(), "Received: '%s'", msg_str.c_str());
  }

  rclcpp::Subscription<nested_msg_pub_sub::msg::Outer>::SharedPtr subscription_; 
  rclcpp::TimerBase::SharedPtr timer_;
  rclcpp::Publisher<nested_msg_pub_sub::msg::Outer>::SharedPtr publisher_;
  size_t count_;
};

int main(int argc, char * argv[])
{
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<MinimalPublisher>());
  rclcpp::shutdown();
  return 0;
}
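
For anyone reproducing this: the two custom messages have to be generated in the package before the node will build. A minimal sketch of the relevant CMakeLists.txt lines, assuming the messages and the node live in the same nested_msg_pub_sub package and the executable is called pub_sub_node (both names are assumptions, not from the original report):

find_package(rclcpp REQUIRED)
find_package(rosidl_default_generators REQUIRED)

# Generate C++ typesupport for the two custom messages
rosidl_generate_interfaces(${PROJECT_NAME}
  "msg/Inner.msg"
  "msg/Outer.msg"
)

add_executable(pub_sub_node src/pub_sub_node.cpp)
ament_target_dependencies(pub_sub_node rclcpp)

# Link the node against the typesupport generated in this same package
rosidl_get_typesupport_target(cpp_typesupport_target ${PROJECT_NAME} rosidl_typesupport_cpp)
target_link_libraries(pub_sub_node "${cpp_typesupport_target}")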

Expected behavior

The received message should be the same as the published message.

Actual behavior

The received message has wrong values for its boolean fields. This is the output that I get:

[minimal_publisher] [1696005743.235449725] [INFO] [timer_callback():32]: Publishing: 'inner:
  float_field: 1.23457
  bool_field: true
bool_field_1: true
bool_field_2: true
bool_field_3: true
'
[minimal_publisher] [1696005743.235686748] [INFO] [topic_callback():39]: Received: 'inner:
  float_field: 1.23457
  bool_field: true
bool_field_1: false
bool_field_2: false
bool_field_3: false
'

Additional information

The bug seems to only happen with fastrtps. With RMW_IMPLEMENTATION=rmw_cyclonedds_cpp things work fine.
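
For reference, switching RMW implementations only requires setting an environment variable before launching; a usage sketch (package and executable names are assumed, matching the example above):

RMW_IMPLEMENTATION=rmw_fastrtps_cpp ros2 run nested_msg_pub_sub pub_sub_node    # outer bools received as false
RMW_IMPLEMENTATION=rmw_cyclonedds_cpp ros2 run nested_msg_pub_sub pub_sub_node  # message received intact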

laughlinbarker commented 1 year ago

Have you tried doing a completely clean build? I have experienced similar issues. We saw them when debugging differences between cyclonedds and fastrtps. We weren't able to track down a root cause, but our issue (incorrectly populated bools) was resolved by doing a clean build.

ycheng517 commented 1 year ago

I've tried doing a clean build and it doesn't seem to make a difference. For now we're working around the issue by putting the boolean field in front of the float field in Inner.msg, as sketched below. I think this issue happens when mixing floats with ints as well.
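
A sketch of the reordered Inner.msg used as the workaround; swapping the field order is the only change:

bool bool_field
float64 float_field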

spiderkeys commented 1 year ago

I was able to reproduce everything here. Details:

Using ROS2 Iron on Ubuntu 22.04. Versions of FastRTPS involved:

ii  ros-iron-fastrtps                               2.10.2-1jammy.20230821.084711               amd64        *eprosima Fast DDS* (formerly Fast RTPS) is a C++ implementation of the DDS (Data Distribution Service) standard of the OMG (Object Management Group).
ii  ros-iron-fastrtps-cmake-module                  3.0.1-1jammy.20230713.195955                amd64        Provide CMake module to find eProsima FastRTPS.
ii  ros-iron-rmw-fastrtps-cpp                       7.1.1-2jammy.20230821.091204                amd64        Implement the ROS middleware interface using eProsima FastRTPS static code generation in C++.
ii  ros-iron-rmw-fastrtps-shared-cpp                7.1.1-2jammy.20230821.090849                amd64        Code shared on static and dynamic type support of rmw_fastrtps_cpp.
ii  ros-iron-rosidl-dynamic-typesupport-fastrtps    0.0.2-2jammy.20230821.090716                amd64        FastDDS serialization support implementation for use with C/C++.
ii  ros-iron-rosidl-typesupport-fastrtps-c          3.0.1-1jammy.20230713.202119                amd64        Generate the C interfaces for eProsima FastRTPS.
ii  ros-iron-rosidl-typesupport-fastrtps-cpp        3.0.1-1jammy.20230713.201943                amd64        Generate the C++ interfaces for eProsima FastRTPS.

We also tried several changes, including splitting the pub and sub up into different nodes. None of these changed the erroneous behavior. As observed by @ycheng517, Cyclone did not exhibit the issue.

This is extremely concerning.

clalancette commented 1 year ago

I'm also able to observe this behavior when the publisher and subscriber are in the same process. I was able to reproduce it on both Iron (Fast-DDS 2.10.x) and Rolling (Fast-DDS 2.11.x). I'll note that Humble (which uses Fast-DDS 2.6.6) does not show the issue.

  • Splitting the pub and sub up into different nodes

Were the different nodes within the same process? In my testing, if I used different nodes in different processes then I did not see the issue (in particular, I used ros2 topic echo /topic, and it showed true for all fields). This suggests that maybe the problem is in the shared memory implementation in Fast-DDS, but that's just a guess at this point.

@MiguelCompany @EduPonz Can you take a look?

spiderkeys commented 1 year ago

@clalancette my testing of separate nodes was done using separate processes. I did not test between two separate machines though, so I may be seeing an issue with the shared memory implementation, as you have suggested. I can try disabling shmem and testing again to see if that solves it; a sketch of one way to do that is below.
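
For reference, one way to rule out SHM is to point Fast DDS at an XML profile that forces UDP-only transports. A sketch based on the eProsima XML profile format (the profile and transport names here are arbitrary):

<?xml version="1.0" encoding="UTF-8"?>
<profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
  <transport_descriptors>
    <transport_descriptor>
      <transport_id>udp_only</transport_id>
      <type>UDPv4</type>
    </transport_descriptor>
  </transport_descriptors>
  <participant profile_name="udp_only_participant" is_default_profile="true">
    <rtps>
      <userTransports>
        <transport_id>udp_only</transport_id>
      </userTransports>
      <useBuiltinTransports>false</useBuiltinTransports>
    </rtps>
  </participant>
</profiles>

Then export FASTRTPS_DEFAULT_PROFILES_FILE=/path/to/udp_only.xml before running the nodes.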

clalancette commented 1 year ago

@clalancette my testing of separate nodes was done using separate processes. I did not test between two separate machines though

Hm, I did not test on separate machines either. It was enough to just have separate processes on the same machine for me. I wonder where that difference comes from. Can you test with ros2 topic echo /topic as well?

spiderkeys commented 1 year ago

Just tested, and as you noted, ros2 topic echo produces the correct results. I also tried mixing trues and falses in the fields and incrementing the double value, and the results held: echoing produced correct results, while the standalone subscriber produced incorrect ones (inner values were correct; outer bools were all false regardless of input).

MiguelCompany commented 1 year ago

Hi all.

This seems quite similar to an issue we had in the code generator for Fast DDS, which was solved here. Further explanation of what is happening can be found here and here.

In short, the generated code for the message is wrongly considered plain. I have just opened https://github.com/ros2/rosidl_typesupport_fastrtps/pull/108, which should address this issue in the same manner as we did in Fast DDS Gen.
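
To illustrate why the plain evaluation matters (a hedged sketch, not the actual generated code): a type is considered plain when its in-memory layout matches its serialized CDR layout, which lets the middleware copy it with a single memcpy. With a float64 followed by bools, alignment padding makes the in-memory layout wider than the serialized one, so the memcpy shortcut looks for the outer bools at the wrong offsets:

#include <cstddef>
#include <iostream>

// Hypothetical C++ layout mirroring Inner.msg and Outer.msg
struct Inner
{
  double float_field;  // 8 bytes
  bool bool_field;     // 1 byte, followed by 7 bytes of padding
};

struct Outer
{
  Inner inner;         // 16 bytes in memory, but only 9 bytes on the wire
  bool bool_field_1;
  bool bool_field_2;
  bool bool_field_3;
};

int main()
{
  // Serialized CDR size of Outer is 12 bytes (8 + 1 + 1 + 1 + 1), but in
  // memory it is typically 24 because of Inner's alignment padding.
  std::cout << "sizeof(Inner) = " << sizeof(Inner) << "\n";  // typically 16
  std::cout << "sizeof(Outer) = " << sizeof(Outer) << "\n";  // typically 24
  // The outer bools live after Inner's padding; code that assumes the
  // serialized layout would look for them at offsets 9..11 instead.
  std::cout << "offsetof(Outer, bool_field_1) = "
            << offsetof(Outer, bool_field_1) << "\n";        // typically 16
  return 0;
}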

@ycheng517 @spiderkeys @clalancette Could you check on your side?

clalancette commented 1 year ago

@ycheng517 @spiderkeys @clalancette Could you check on your side?

@MiguelCompany This seems to fix the problem for me using rmw_fastrtps_cpp, thanks for that.

However, I started to put together a regression test for this, and I'm finding this is still failing with rmw_fastrtps_dynamic_cpp. Am I right to assume we need a similar patch to https://github.com/ros2/rmw_fastrtps/tree/rolling/rmw_fastrtps_dynamic_cpp to fix that?

MiguelCompany commented 1 year ago

Am I right to assume we need a similar patch to https://github.com/ros2/rmw_fastrtps/tree/rolling/rmw_fastrtps_dynamic_cpp to fix that?

Most probably. I'll see what I can do.

clalancette commented 1 year ago

I think I solved it in https://github.com/ros2/rmw_fastrtps/pull/716, but please take a look and let me know.

clalancette commented 1 year ago

Also see https://github.com/ros2/system_tests/pull/530, which is an integration test for this problem.

fujitatomoya commented 1 year ago

@clalancette @MiguelCompany

Can we backport all these PRs to Iron and Humble? (I think we should)

MiguelCompany commented 1 year ago

@fujitatomoya I do think so!

clalancette commented 1 year ago

Can we backport all these PRs to Iron and Humble? (I think we should)

Yes, we should do it. I was going to let it sit in Rolling for a while before doing that, though.

fujitatomoya commented 1 year ago

@clalancette Okay, I will do that next week.

asetapen commented 1 year ago

@clalancette Okay, I will do that next week.

Thank you so much to @spiderkeys, @MiguelCompany, @clalancette and @fujitatomoya for fixing this issue.

What is the status on getting this backported into Iron? We have encountered this bug in our production code and would greatly benefit from not needing to build these packages from source!

fujitatomoya commented 1 year ago

@asetapen Thank you for the ping, I will be on it.

fujitatomoya commented 1 year ago

@asetapen @ycheng517 The backports to Humble and Iron have been completed. I will go ahead and close this one.

@clalancette Thanks for taking care of the backports and CI.