ros2 / rmw_cyclonedds

ROS 2 RMW layer for Eclipse Cyclone DDS
Apache License 2.0

Shared memory chunks are lost when the subscriber callback is delayed #364

sumanth-nirmal closed 2 years ago

sumanth-nirmal commented 2 years ago

Bug report

Required Info:

Steps to reproduce issue

  1. Set up a pub/sub example with all the required SHM constraints
  2. Make the subscriber callback significantly time-consuming
  3. After some iterations, the shared memory chunks pile up, eventually leading to an "out of chunks" error.

Actual behavior

When the subscriber callback is expensive and takes significant time to complete, shared memory chunks are lost: they are never returned to the middleware.

Expected behavior

Shared memory chunks should not be lost when subscriber callbacks take significant time to execute; the chunks should be freed properly.

Additional information

The probable cause is as follows. When shared memory is enabled, a subscription is executed in three steps: the loaned message is taken, the callback is executed with the taken loaned message, and once the callback completes the loaned message is returned to the middleware. If the callback is expensive and takes significant time, further samples arrive in the reader history cache while it runs, and depending on the history depth some of those samples are dropped from the cache without ever being processed by the executor. Ideally, when a sample is removed from the reader history cache, the corresponding loan should also be returned to the middleware, but serdata_rmw_free does not handle this correctly.
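The suspected mechanism can be illustrated with a small standalone model. This is only a sketch, not rmw_cyclonedds or Cyclone DDS code: `ChunkPool`, `publish_until_exhausted`, the pool size and the history depth are all hypothetical, and the subscriber is assumed to be stuck in its callback for the whole run, so it never takes anything from the cache.

```python
from collections import deque


class ChunkPool:
    """Hypothetical fixed-size pool standing in for the shared memory chunks."""

    def __init__(self, size):
        self.available = size

    def allocate(self):
        if self.available == 0:
            raise MemoryError("out of chunks")
        self.available -= 1


def publish_until_exhausted(pool_size, history_depth, samples):
    """Publish while the subscriber is busy in its callback.

    Returns how many samples were published before the pool ran dry,
    or `samples` if it never did.
    """
    pool = ChunkPool(pool_size)
    history = deque()  # models a KEEP_LAST reader history cache
    for published in range(samples):
        try:
            pool.allocate()  # each new sample occupies one chunk
        except MemoryError:
            return published
        history.append(published)
        if len(history) > history_depth:
            # The oldest sample is dropped from the cache, but its chunk
            # is NOT handed back to the pool: this models the reported leak.
            history.popleft()
    return samples


exhausted_after = publish_until_exhausted(pool_size=8, history_depth=2, samples=100)
print(exhausted_after)
```

In this model the cache never holds more than two samples, yet the pool is empty after as many publications as it has chunks, because every evicted sample keeps its chunk.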

In essence, serdata_rmw_free does not return the loan to the middleware if there is an outstanding loan.
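The expected behavior, where freeing a sample also hands back any chunk it still holds, could look roughly like the following. Again this is a hedged model rather than the actual fix: `ChunkPool`, `Sample`, `holds_chunk` and `HISTORY_DEPTH` are hypothetical names, and `Sample.free` only plays the role that serdata_rmw_free has in the real code.

```python
from collections import deque


class ChunkPool:
    """Hypothetical fixed-size pool standing in for the shared memory chunks."""

    def __init__(self, size):
        self.size = size
        self.available = size

    def allocate(self):
        if self.available == 0:
            raise MemoryError("out of chunks")
        self.available -= 1

    def release(self):
        assert self.available < self.size, "more chunks released than allocated"
        self.available += 1


class Sample:
    """Hypothetical sample that owns one chunk until it is freed."""

    def __init__(self, pool):
        pool.allocate()
        self.pool = pool
        self.holds_chunk = True

    def free(self):
        # Return the chunk exactly once, whether the sample was consumed
        # by a callback or evicted from the history cache unprocessed.
        if self.holds_chunk:
            self.pool.release()
            self.holds_chunk = False


HISTORY_DEPTH = 2
pool = ChunkPool(size=8)
history = deque()  # models a KEEP_LAST reader history cache

for _ in range(100):
    history.append(Sample(pool))
    if len(history) > HISTORY_DEPTH:
        history.popleft().free()  # eviction now releases the chunk

print(pool.available)
```

With the release in the free path, only the samples still in the cache hold chunks, so the pool stays usable however long the callback takes. The `holds_chunk` flag makes a second call to `free` harmless.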