In the current code, there is a race condition in rmw_wait. This happens because of the following sequence:
Without taking a lock, we check if any of the entities in the wait_set are ready, and if not attach the condition_variable to it.
Then we take the lock, and go to sleep on the condition_variable.
However, doing step 1 takes time, and we check in a particular order: guard_conditions, events, subscriptions, services, clients. The race happens because a subscription may come in after we've checked subscriptions in the above list (while we are checking services and clients). In that case, we'll unnecessarily go to sleep on the condition_variable, even though something is ready. This increases the latency.
To solve the issue, we still do all of the checking and attaching unlocked. However, we update the "notify" method of each entity so that it takes the condition_variable lock, and in addition to kicking the condition_variable it sets a boolean in the wait_set structure to "true". Then, in rmw_wait(), after we have finished attaching, we take the lock, and set a predicate on the condition_variable so that it will quit if wait_set_data->triggered is true. Thus, if one of the entities became ready after we checked, we'll notice it and not go to sleep.
This also allows us to deal with "spurious" wakeups, where the condition_variable was woken up even though nothing in this wait_set became ready.
In the current code, there is a race condition in rmw_wait. This happens because of the following sequence:
However, doing step 1 takes time, and we check in a particular order: guard_conditions, events, subscriptions, services, clients. The race happens because a subscription may come in after we've checked subscriptions in the above list (while we are checking services and clients). In that case, we'll unnecessarily go to sleep on the condition_variable, even though something is ready. This increases the latency.
To solve the issue, we still do all of the checking and attaching unlocked. However, we update the "notify" method of each entity so that it takes the condition_variable lock, and in addition to kicking the condition_variable it sets a boolean in the wait_set structure to "true". Then, in rmw_wait(), after we have finished attaching, we take the lock, and set a predicate on the condition_variable so that it will quit if wait_set_data->triggered is true. Thus, if one of the entities became ready after we checked, we'll notice it and not go to sleep.
This also allows us to deal with "spurious" wakeups, where the condition_variable was woken up even though nothing in this wait_set became ready.