lf-lang / reactor-c

A reactor runtime written in C
Other
12 stars 24 forks source link

Fix Unintended Action Override #490

Closed Depetrol closed 1 month ago

Depetrol commented 1 month ago

This PR fixes the unintended action override when two actions are triggered at about the same time in issue #489. The bug was in convert_C_action_to_py, the token value in the trigger is suspectable to concurrency overrides. The solution was to use the token value in the action. The companion PR in lingua-franca is https://github.com/lf-lang/lingua-franca/pull/2423

erlingrj commented 1 month ago

Could you explain how the race condition occurs, i.e. what is the sequence of events causing it. Also what exactly is this change doing?

erlingrj commented 1 month ago

Well done on tracking it down BTW!

Depetrol commented 1 month ago

This was a subtle bug that took quite some time to pinpoint. I suspected that a lock was missing somewhere in the process of schedule - replace token - apply token to action process, but adding critical sections did not seem to solve the bug.

The origin of this concurrency bug for trigger->tmplt->token->value is in py_schedule. py_schedule calls _lf_initialize_token_with_value(trigger->tmplt, value, 1), which calls _lf_get_token(trigger->tmplt). In _lf_get_token(trigger->tmplt), the trigger->tmplt->token is replaced with a new token. Later, this trigger->tmplt->token is retrieved in convert_C_action_to_py and causes the bug.

Consider the following sequence of events: _lf_get_token in sets the trigger->tmplt->token->value to 100, another _lf_get_token in sets the trigger->tmplt->token->value to 101, convert_C_action_to_py gets the trigger->tmplt->token->value which is 101, causing the concurrency bug. However, adding a critical section for _lf_get_token will not solve the bug because the bug is in a larger scope.

The solution was to use action->token->value instead --- the action->token->value has been correctly maintained without this concurrency bug.

Depetrol commented 1 month ago

Also huge thanks to @edwardalee and @erlingrj for helping me look into this!

Depetrol commented 1 month ago

Not sure why the windows tests are failing -- this seems unrelated to the change in this PR since tests are also failing in the main branch.

Depetrol commented 1 month ago

reactor-c requires that all tests pass before merging, so we might have to wait for https://github.com/lf-lang/lingua-franca/pull/2424 to me merged.

Depetrol commented 1 month ago

There is another concurrency bug that is triggered probabilistically with the newly introduced test ConcurrentAction.lf. Running the test 20 times could trigger the bug quite consistently.

Depetrol commented 1 month ago

Since the newly introduced test in https://github.com/lf-lang/lingua-franca/pull/2423 has been marked as failing in https://github.com/lf-lang/lingua-franca/pull/2422, the checks are passing and automatic merge was triggered. I'll start another PR for the other concurrency bug.