micro-ROS / micro-ROS-Agent

ROS 2 package using Micro XRCE-DDS Agent.
Apache License 2.0
97 stars 51 forks source link

Synchronise predicate #160

Closed cmraaron closed 1 year ago

cmraaron commented 1 year ago

closes a small window to miss publishing a change in the graph

pablogs9 commented 1 year ago

Could you explain in depth these changes?

cmraaron commented 1 year ago

yeah no problem. The first one, "ensure synchronised access to our predicate", puts the setting of graph_changed_ so that its only ever modified with the lock held. Without this, possibly the publish_micoros_graph thread swaps out after its released the lock, but before assigning graph_changed_ = false. Another thread runs during this time calling set_on_change_callback setting graph_change_ = true and notifying cv_ while no one is waiting on it. publish_microros_graph thread swaps back in and sets graph_changed_ = false. Now when that thread comes around to wait again, it will not run immediately despite being notified after the last notification that caused a loop.

I suppose there is no issue here because no work (publishing the graph) is done until after graph_changed_ = false and the set_on_change_callback is not executed until after graph is modified. So the more likely scenario is that we could publish the same graph twice in a row. However, there are no synchronisation primitives enforcing this. And graph_changed_ is not atomic, so we have a var being assigned by different threads thats not always done with a locked mutex. So, that is that change.

The other commit "release lock before signalling", is just following guidelines. Its not strictly necessary, and there has been much discussion about the "best practices". But I believe this is the current advice. The idea here is that if the cv is notified while the lock is still held, the signalled thread may may swap in but it cannot proceed because it cannot obtain the lock. So the change is to release the lock before notifying. If the cv spuriously allows the thread to execute (which can happen) after we release the lock, but before we notify, its ok as the predicate has been set and will allow the work to continue. We will proceed to then notify a cv which no one is waiting on, which is ok because the task is already executing. Or we will notify when the work is already completed, which is also ok because now graph_changed_ = false has been performed under a held mutex.

Acuadros95 commented 1 year ago

@mergify backport main galactic foxy

mergify[bot] commented 1 year ago

backport main galactic foxy

✅ Backports have been created

* [#165 Synchronise predicate (backport #160)](https://github.com/micro-ROS/micro-ROS-Agent/pull/165) has been created for branch `main` * [#166 Synchronise predicate (backport #160)](https://github.com/micro-ROS/micro-ROS-Agent/pull/166) has been created for branch `galactic` * [#167 Synchronise predicate (backport #160)](https://github.com/micro-ROS/micro-ROS-Agent/pull/167) has been created for branch `foxy`