ros2 / rmw_cyclonedds

ROS 2 RMW layer for Eclipse Cyclone DDS
Apache License 2.0
108 stars 89 forks source link

Potential dangling pointer in event data #430

Open nnmm opened 1 year ago

nnmm commented 1 year ago

In Apex.AI's in-house rmw implementation, we recently discovered an issue that might be affecting rmw_cyclonedds as well.

When the node is spinning in an executor, the executor, via the memory strategy and callback group, obtains a shared ptr to the subscriptions, events, etc. that are part of the node. So, these things will never be destroyed while they're in the executor's wait set. However, publishers are not kept alive in the same way, so they can disappear at any time. And the event may reference the publisher at the rmw layer: the rmw_event->data pointer is a CddsPublisher * which is exclusively owned by the publisher, i.e. destroyed when the publisher is destroyed.

So if I'm not mistaken, there is a potential dangling pointer when the event accesses its data pointer after the publisher has disappeared, and freed it.

Sorry for not providing a reproducer, as I am not an actual user of rmw_cyclonedds it would take considerable effort to set one up. I'm just opening this issue to give notice that you might be affected by the same issue. I think a minimal reproducer would

  1. Create a publisher
  2. Spin the node in an executor (thread 1)
  3. Destroy the publisher (thread 2)
  4. Receive an event while still spinning in thread 1 (could be an incompatible QOS event)

We observed it first in the dynamic_bridge executable from ros1_bridge.