ros2 / rmw_fastrtps

Implementation of the ROS Middleware (rmw) Interface using eProsima's Fast RTPS.
Apache License 2.0
157 stars 117 forks source link

Fix matched event issues #683

Closed MiguelCompany closed 1 year ago

MiguelCompany commented 1 year ago

An alternative for #682 that should fix #681

Barry-Xu-2018 commented 1 year ago

There is a problem on take_event.
e.g.

https://github.com/eProsima/Fast-DDS/blob/master/src/cpp/fastdds/publisher/DataWriterImpl.cpp#L1190-L1208

In get_publication_matched_status(), current_count_change and total_count_change will be set as 0.

eturnCode_t DataWriterImpl::get_publication_matched_status(
        PublicationMatchedStatus& status)
{
    if (writer_ == nullptr)
    {
        return ReturnCode_t::RETCODE_NOT_ENABLED;
    }

    {
        std::unique_lock<RecursiveTimedMutex> lock(writer_->getMutex());

        status = publication_matched_status_;
        publication_matched_status_.current_count_change = 0;
        publication_matched_status_.total_count_change = 0;
    }

    user_datawriter_->get_statuscondition().get_impl()->set_status(StatusMask::publication_matched(), false);
    return ReturnCode_t::RETCODE_OK;
}

While calling take_event after matched event happen, it will call get_publication_matched_status() to get status. At this time, current_count_change and total_count_change are already set as 0.

https://github.com/eProsima/rmw_fastrtps/blob/5918930fe50e6754ddb97619a55eae1b1eb2af80/rmw_fastrtps_shared_cpp/src/custom_publisher_info.cpp#L170-L192

So I think if change may be as below (Only avoid triggering event)

    matched_status_.total_count = total_count;
    matched_status_.total_count_change += total_count_change;
    matched_status_.current_count = current_count;
    matched_status_.current_count_change += current_count_change;

    matched_changes_ = true;
  if (on_new_event_cb_[RMW_EVENT_PUBLICATION_MATCHED]) {
    trigger_event(RMW_EVENT_PUBLICATION_MATCHED);
  }
MiguelCompany commented 1 year ago

@Barry-Xu-2018 I think you are right!

Updated the code in 0c60ec2856ce41a039fafec95319c71e5be4a79a

fujitatomoya commented 1 year ago

CI:

fujitatomoya commented 1 year ago

CC: @iuhilnehc-ynos

fujitatomoya commented 1 year ago

@MiguelCompany can you address cpplint warning?

Barry-Xu-2018 commented 1 year ago

I tried to run matched demo and find a new problem. It seems repeated event was received. I will investigate it.

ros2 run demo_nodes_cpp matched_event_detect 
[INFO] [1680752983.277148039] [multi_sub_node]: Create a new subscription.
[INFO] [1680752983.278800649] [matched_event_detect_node]: First subscription is connected.
[INFO] [1680752983.278846840] [multi_sub_node]: Create a new subscription.
[INFO] [1680752983.280026632] [matched_event_detect_node]: The changed number of connected subscription is 1 and current number of connected subscription is 2.
[INFO] [1680752983.280062182] [multi_sub_node]: Destroy a subscription.
[INFO] [1680752983.281545152] [matched_event_detect_node]: The changed number of connected subscription is -1 and current number of connected subscription is 1.
[INFO] [1680752983.281580592] [multi_sub_node]: Destroy a subscription.
[INFO] [1680752983.282452240] [matched_event_detect_node]: Last subscription is disconnected.
[INFO] [1680752983.282486114] [multi_pub_node]: Create a new publisher.
[INFO] [1680752983.283396538] [multi_pub_node]: Create a new publisher.
[INFO] [1680752983.284131755] [matched_event_detect_node]: First publisher is connected.
[INFO] [1680752983.284165433] [multi_pub_node]: Destroy a publisher.
[INFO] [1680752983.285238736] [matched_event_detect_node]: The changed number of connected publisher is -1 and current number of connected publisher is 1.
[INFO] [1680752983.285273466] [multi_pub_node]: Destroy a publisher.
[INFO] [1680752983.286016443] [matched_event_detect_node]: Last publisher is disconnected.
Barry-Xu-2018 commented 1 year ago

You can use this code to reproduce the problem.
https://github.com/Barry-Xu-2018/bug_reproduce/blob/master/rclcpp_bug/src/rmw_fastrtps_681/rmw_fastrtps_681.cpp

While creating or destroying a subscription, publisher get 2 notification of matched event.

[INFO] [1680787176.004999904] [multi_sub_node]: Create a new subscription.
[INFO] [1680787176.007356863] [matched_event_detect_node]: Matched Event: 1, 1, 1, 1
[INFO] [1680787176.007778429] [matched_event_detect_node]: Matched Event: 1, 0, 1, 0
[INFO] [1680787176.009060380] [multi_sub_node]: Destroy a subscription.
[INFO] [1680787176.010856996] [matched_event_detect_node]: Matched Event: 1, 0, 0, -1
[INFO] [1680787176.011048782] [matched_event_detect_node]: Matched Event: 1, 0, 0, 0

Repeated notification is from

https://github.com/eProsima/rmw_fastrtps/blob/0c60ec2856ce41a039fafec95319c71e5be4a79a/rmw_fastrtps_shared_cpp/src/rmw_wait.cpp#L194-L199

And repeated notification isn't from DataWriterImpl::InnerDataWriterListener::onWriterMatched().
I didn't find the root cause.

fujitatomoya commented 1 year ago

@MiguelCompany still it still generates the 2 events, is this expected behavior? or we should add if *_change is zero, it should be ignored?

MiguelCompany commented 1 year ago

@Barry-Xu-2018 @fujitatomoya I think I finally got it right. Please check on your side.

fujitatomoya commented 1 year ago

CI:

Barry-Xu-2018 commented 1 year ago

@MiguelCompany After testing, the repeat notification also be found.

In __rmw_init_event, you change code as the below.

  bool is_matched_event =
    (RMW_EVENT_PUBLICATION_MATCHED == event_type) ||
    (RMW_EVENT_SUBSCRIPTION_MATCHED == event_type);
  if (!is_matched_event) {
    eprosima::fastdds::dds::StatusMask status_mask =
      event->get_listener()->get_statuscondition().get_enabled_statuses();
    status_mask |= rmw_fastrtps_shared_cpp::internal::rmw_event_to_dds_statusmask(event_type);
    event->get_listener()->get_statuscondition().set_enabled_statuses(status_mask);
  }

According to my understanding, you don't want to get status notification for matched event.
But I found the below code also can run (active is set).

https://github.com/ros2/rmw_fastrtps/blob/master/rmw_fastrtps_shared_cpp/src/rmw_wait.cpp#L191-L199

Barry-Xu-2018 commented 1 year ago

@MiguelCompany

Status mask already enable matched.
e.g. https://github.com/ros2/rmw_fastrtps/blob/851ced88fe7979af847ac0b231e293f9f23a13ae/rmw_fastrtps_cpp/src/publisher.cpp#L268-L272

Maybe we cannot disable it.
So maybe we have to add check at __wait_set(). Status check ignore matched event or matched event only depend on trigger value.

https://github.com/ros2/rmw_fastrtps/blob/851ced88fe7979af847ac0b231e293f9f23a13ae/rmw_fastrtps_shared_cpp/src/rmw_wait.cpp#L191-L205

Barry-Xu-2018 commented 1 year ago

Friendly ping @MiguelCompany

Barry-Xu-2018 commented 1 year ago

@MiguelCompany

According to the release schedule of Iron

Mon. April 10, 2023 - Alpha + RMW freeze Preliminary testing and stabilization of ROS Base 1 packages, and API and feature freeze for RMW provider packages.

I am not sure whether this problem should be fixed today.
Maybe we can use this temporary method to resolve it (https://github.com/ros2/rmw_fastrtps/pull/683#issuecomment-1499865369). If have better way, we can update codes late.

MiguelCompany commented 1 year ago

@Barry-Xu-2018 @fujitatomoya One last try before using the workaround in https://github.com/ros2/rmw_fastrtps/pull/683#issuecomment-1499865369.

I rebased and changed the way the information of the event is taken, so the status on the entity is always reset.

Barry-Xu-2018 commented 1 year ago

@MiguelCompany Thank you. Your change fixed it after testing.
I think you can merge this fix.

iuhilnehc-ynos commented 1 year ago

re-run CI based on https://github.com/ros2/rmw_fastrtps/pull/683#issuecomment-1499540362

iuhilnehc-ynos commented 1 year ago

rerun CI with my repos which use the latest ros2 with current PR.

CI:

MiguelCompany commented 1 year ago

@fujitatomoya @clalancette This bugfix is ready