Closed Depetrol closed 1 month ago
Could you explain how the race condition occurs, i.e. what is the sequence of events causing it. Also what exactly is this change doing?
Well done on tracking it down BTW!
This was a subtle bug that took quite some time to pinpoint. I suspected that a lock was missing somewhere in the process of schedule - replace token - apply token to action
process, but adding critical sections did not seem to solve the bug.
The origin of this concurrency bug for trigger->tmplt->token->value
is in py_schedule
. py_schedule
calls _lf_initialize_token_with_value(trigger->tmplt, value, 1)
, which calls _lf_get_token(trigger->tmplt)
. In _lf_get_token(trigger->tmplt)
, the trigger->tmplt->token
is replaced with a new token. Later, this trigger->tmplt->token
is retrieved in convert_C_action_to_py
and causes the bug.
Consider the following sequence of events: _lf_get_token
in sets the trigger->tmplt->token->value
to 100, another _lf_get_token
in sets the trigger->tmplt->token->value
to 101, convert_C_action_to_py
gets the trigger->tmplt->token->value
which is 101, causing the concurrency bug. However, adding a critical section for _lf_get_token
will not solve the bug because the bug is in a larger scope.
The solution was to use action->token->value
instead --- the action->token->value
has been correctly maintained without this concurrency bug.
Also huge thanks to @edwardalee and @erlingrj for helping me look into this!
Not sure why the windows tests are failing -- this seems unrelated to the change in this PR since tests are also failing in the main branch.
reactor-c
requires that all tests pass before merging, so we might have to wait for https://github.com/lf-lang/lingua-franca/pull/2424 to me merged.
There is another concurrency bug that is triggered probabilistically with the newly introduced test ConcurrentAction.lf
. Running the test 20 times could trigger the bug quite consistently.
Since the newly introduced test in https://github.com/lf-lang/lingua-franca/pull/2423 has been marked as failing in https://github.com/lf-lang/lingua-franca/pull/2422, the checks are passing and automatic merge was triggered. I'll start another PR for the other concurrency bug.
This PR fixes the unintended action override when two actions are triggered at about the same time in issue #489. The bug was in
convert_C_action_to_py
, the token value in the trigger is suspectable to concurrency overrides. The solution was to use the token value in the action. The companion PR in lingua-franca is https://github.com/lf-lang/lingua-franca/pull/2423