Closed cmraaron closed 1 year ago
Could you explain in depth these changes?
yeah no problem.
The first one, "ensure synchronised access to our predicate", puts the setting of graph_changed_
so that its only ever modified with the lock held. Without this, possibly the publish_micoros_graph
thread swaps out after its released the lock, but before assigning graph_changed_ = false
. Another thread runs during this time calling set_on_change_callback
setting graph_change_ = true
and notifying cv_
while no one is wait
ing on it. publish_microros_graph
thread swaps back in and sets graph_changed_ = false
. Now when that thread comes around to wait again, it will not run immediately despite being notified after the last notification that caused a loop.
I suppose there is no issue here because no work (publishing the graph) is done until after graph_changed_ = false
and the set_on_change_callback
is not executed until after graph is modified. So the more likely scenario is that we could publish the same graph twice in a row. However, there are no synchronisation primitives enforcing this. And graph_changed_
is not atomic, so we have a var being assigned by different threads thats not always done with a locked mutex. So, that is that change.
The other commit "release lock before signalling", is just following guidelines. Its not strictly necessary, and there has been much discussion about the "best practices". But I believe this is the current advice. The idea here is that if the cv
is notified while the lock is still held, the signalled thread may may swap in but it cannot proceed because it cannot obtain the lock. So the change is to release the lock before notifying. If the cv
spuriously allows the thread to execute (which can happen) after we release the lock, but before we notify, its ok as the predicate has been set and will allow the work to continue. We will proceed to then notify a cv
which no one is waiting on, which is ok because the task is already executing. Or we will notify when the work is already completed, which is also ok because now graph_changed_ = false
has been performed under a held mutex.
@mergify backport main galactic foxy
backport main galactic foxy
closes a small window to miss publishing a change in the graph