ros2 / rmw_cyclonedds

ROS 2 RMW layer for Eclipse Cyclone DDS
Apache License 2.0
108 stars 89 forks source link

Fix freeing of loaned chunks #365

Closed sumanth-nirmal closed 2 years ago

sumanth-nirmal commented 2 years ago

Fixes #364

sumanth-nirmal commented 2 years ago

@eboasson @MatthiasKillat Can you please review the changes?

Currently, I am depending on iceoryx directly to free the chunk in serdata_free under the assumption that we can remove this dependency once we update RMW to use the latest Cyclone tag. Please let me know if it doesn't make sense.

sumanth-nirmal commented 2 years ago

@eboasson Thanks for the comment, I did a code walkthrough for memory audit in the shared memory case and fixed a couple of memory leaks. I tried to summarize below the data path with allocations to identify if we missed anything

Publishing

rmw_publish_serialized_message
  1. Construct the serdata from the serialized message (serdata allocation)
  2. With dds_data_allocator we loan the chunk from iceoryx (Chunk allocation)
  3. Deserialize the sample to the loaned chunk (this additional deserialization will be fixed when we update rmw to use the latest Cyclone)
  4. Update the serdata with the loaned chunk
  5. Hand over the serdata to the dds_writecdr()
    1. The loaned chunk is handover to the iceoryx (Chunk free)
    2. The chunk pointer is reset to null in serdata
    3. The serdata ref count is decremented
      1. Since the ref count will be 0, the serdata will be deleted in serdata_rmw_free (serdata free)
rmw_publish_loaned_message
  1. Construct the serdata with the placement new operator (serdata allocation)
  2. Update the serdata with the loaned chunk (the chunk is loaned with rmw_borrow_loaned_message) (Chunk allocation)
  3. Hand over the serdata to the dds_writecdr()
    1. The loaned chunk is handover to the iceoryx (Chunk free)
    2. The chunk pointer is reset to null in serdata
    3. The serdata ref count is decremented
      1. Since the ref count will be 0, the serdata will be deleted in serdata_rmw_free (serdata free)

Subscription

rmw_take_serialized_message/with_info
  1. Get serdata with dds_takecdr() (serdata allocation)
  2. The serdata has valid iox_chunk (updated in serdata_rmw_from_iox) (Chunk allocation)
  3. Serialize the sample from the loaned chunk (this additional serialization will be fixed when we update rmw to use the latest Cyclone)
  4. The serdata ref count is decremented (the ref count will be 0 after this)
    1. The iox_chunk will be released to the iox_subscriber in serdata_rmw_free (Chunk free)
    2. The serdata will be freed in serdata_rmw_free (serdata free)
rmw_take_loaned_message/with_info
  1. Get serdata with dds_takecdr() (serdata allocation)
  2. If the serdata has valid iox_chunk (updated in serdata_rmw_from_iox) (Chunk allocation)
    1. Give the reference to the chunk to the user
    2. Initialize the data_alloc with this subscription
    3. The chunk will be freed in rmw_return_loaned_message_from_subscription() (Chunk free)
    4. The iox_chunk pointer is reset to null and serdata ref count is decremented (the ref count will be 0 after this)
      1. The serdata will be deleted in serdata_rmw_free (serdata free) \ (the loaned chunk will not be released here as the iox_chunk pointer is null)
  3. If the serdata doesn't have the chunk (meaning SHM is disabled on the publisher but enabled on subscription)
    1. Allocate memory for the sample on the heap (Sample allocation)
    2. deserialize the sample to this allocated memory
    3. Give the reference to this allocated memory to the user
    4. The allocated memory will be freed in rmw_return_loaned_message_from_subscription() (Sample free)
    5. The serdata ref count is decremented (the ref count will be 0 after this)
      1. The serdata will be deleted in serdata_rmw_free (serdata free)

I think with this MR, all the previously unhandled memory leaks with SHM should have been fixed (unless I missed anything)!

sumanth-nirmal commented 2 years ago

@eboasson I see that linters fail in the CI jobs here https://github.com/ros2/rmw_cyclonedds/pull/365#pullrequestreview-871457535, but then realized that these are already fixed in this PR https://github.com/ros2/rmw_cyclonedds/pull/363. I have now rebased this branch onto the latest master and the CI should hopefully be good now. Can you please run the CI again?

eboasson commented 2 years ago

Hi @sumanth-nirmal, I've re-run the full CI:

The Windows failure is "tf2_ros_py.pytest.missing_result" not producing a test result, which I'm pretty sure is unrelated. So I think it it is good to go.

eboasson commented 2 years ago

@clalancette I notice that GitHub says your review was requested and I don't to go ahead and merge it if there's a possibility that you're reviewing it at the same time. As far as I am concerned, this is good to go. Please let me know (or simply press "merge" as a way of letting me know 🙂)