Possible bug: executor_entities_collection.hpp::update_entities

ros2 / rclcpp

rclcpp (ROS Client Library for C++)

Apache License 2.0

558 stars 420 forks source link

Possible bug: executor_entities_collection.hpp::update_entities #2632

Open jmachowinski opened 1 month ago

jmachowinski commented 1 month ago

Bug report

https://github.com/ros2/rclcpp/blob/a78d0cbd33b8fe0b4db25c04f7e10017bfca6061/rclcpp/include/rclcpp/executors/executor_entities_collection.hpp#L75-L82

I am confused about the

if (entity) {

in line 78. Would this not lead to the on_removed callback not being called, in race cases, even though the element was removed ? @wjwwood @mjcarroll Is this intended, or a bug ?

Obviously, it's kind of useless, to call the callback with a nullptr, but the behavior is still strange to me. I ran into this issue, as internal structures of my code were not cleaned up, due to the callback not being called.

alsora commented 1 month ago

I'm not sure I understand the problem here.

Currently all executors require that the entity passed to on_removed is not a nullptr, because they need to call a method on it.

jmachowinski commented 1 month ago

The problem is, that the documentation of the method / class does not match the behavior.

It states, one will get a on_removed callback, whenever 'update_to' does not contain the entry. This is not the case, if the weak_ptr became invalid in between.

I noticed this, as I tried to use this class and depended on the behavior, that on_removed is called, to clean up other internals of my executor.

This class is used extensively in the executor. Therefore I was wondering, if there is a place somewhere in the executor, were it also depends on 'on_removed' being called, causing 'rare' bugs ?

alsora commented 1 month ago

Ok, thanks for the explanation. This class was meant to be used only by the executor, where (at least currently) the problem that you mention doesn't apply. Note that the same situation happens with the on_added function (it's called only on valid entities despite what the docs say).

So essentially we have two options:

fix the documentation to state that the functions are called only on valid entities.
move the check about whether an entity is valid outside the entities collector and into the functions provided by the executor.

I don't have a strong opinion, although i probably prefer the first. The first option is more suitable if we want to reduce code complexity in the executor, while the second option is more suitable if we want to make this class more generic and reusable.

In any case, we should do something. If someone feels strongly about one option or the other, I'm ok with it.