ros2 / rmw_fastrtps

Implementation of the ROS Middleware (rmw) Interface using eProsima's Fast RTPS.
Apache License 2.0
157 stars 117 forks source link

Data race on entities listeners #632

Closed mauropasse closed 2 years ago

mauropasse commented 2 years ago

I have identified a data race, produced by the lack of mutual exclusion on entities listeners.

Free listener info:

__rmw_destroy_service() {
    auto info = static_cast<CustomServiceInfo *>(service->data); // [1]
    ...
    delete info; // [2]
}

Then, the same (but could be other) thread on service destruction references memory after it has been freed (thus seg-faults): Update: No other thread could access freed memory, since it would need a living object, in which case the destructor would never have been called (and memory not freed).

// Starts from ~ServiceBase(), which calls clear_on_new_request_callback(), which calls the following:

__rmw_service_set_on_new_request_callback() {
    auto custom_service_info = static_cast<CustomServiceInfo *>(rmw_service->data); // [3]
}

1: https://github.com/ros2/rmw_fastrtps/blob/rolling/rmw_fastrtps_shared_cpp/src/rmw_service.cpp#L58 2: https://github.com/ros2/rmw_fastrtps/blob/rolling/rmw_fastrtps_shared_cpp/src/rmw_service.cpp#L132 3: https://github.com/ros2/rmw_fastrtps/blob/rolling/rmw_fastrtps_shared_cpp/src/rmw_service.cpp#L170

Question: Is CustomServiceInfo * supposed to be thread safe? Maybe is not possible, and this safety has to be addressed in top layers using it.

I'd like to address this issue in the less invasive way, but I'd like suggestions of the correct way to:

For more info, the issue starts here: https://github.com/ros2/rclcpp/blob/3d69031d2a614cf02dc10ea5db153471ea32b1f2/rclcpp/src/rclcpp/service.cpp#L35-L38

In detail:

ServiceBase::~ServiceBase()
{
  // Here, implicitly, the deleter of service_handle_ is called, the one freeing the memory
  // ~service_handle_  (*)

  // Then, the following is called, which tries to access the freed mem:
  clear_on_new_request_callback();
  // Maybe removing directly this call, fixes everything.
}

(*) https://github.com/ros2/rclcpp/blob/3d69031d2a614cf02dc10ea5db153471ea32b1f2/rclcpp/include/rclcpp/service.hpp#L316-L327

So, if we could make sure than in ~ServiceBase() we call clear_on_new_request_callback() before the deleter of service_handle_ we'd be safe. Or the other approach, remove clear_on_new_request_callback() call in destructor, which I'm not sure why is there in 1st place, since it might not be necessary.

mauropasse commented 2 years ago

Closing this after closer examination. rclcpp service destructor shouldn't clear the callbacks (neither the rclcpp nor the rmw_..._cpp). It's useless anyway clearing the callbacks since they will be destroyed right away, and no data race could happen since memory is freed in destructor, which means there are no more owners of the service (same applies for all entities). I'll create a PR in rclcpp to remove all of these clearings. Sorry about the noise.