lf-lang / reactor-rs

Reactor runtime implementation in Rust
MIT License

Problem with tag assignment for asynchronous events #33

Open oowekyala opened 1 year ago

oowekyala commented 1 year ago

The C++ runtime had a bug that can, in theory, also occur in the Rust runtime. I wasn't able to reproduce it with an unmodified runtime, because it depends on thread interleaving.

Possible faulty execution

C++ fix

The C++ runtime has a global event queue and a global mutex protecting it. The fix is to read the time and push the event within the same critical section.

Rust

In Rust the event queue is split:

We can assume Sender/Receiver communicate atomically.

Possible solutions for the Rust runtime

Global mutex

We could reproduce the C++ solution by introducing a mutex to guard the receiver and sender. This would, however, defeat part of the purpose of using channels, which is that the async sender thread doesn't need to block when sending something.
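To make the trade-off concrete, here is a minimal sketch of the mutex-based approach. The names `EventQueue`, `schedule_async` and `run_senders` are hypothetical and are not part of reactor-rs, and a tag is simplified to the time elapsed since startup (no microstep). The clock is read while the lock is held, so the order in which tags are assigned matches the order in which events are pushed.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical tag: time elapsed since startup (microsteps omitted).
type Tag = Duration;

/// Hypothetical stand-in for the event queue, guarded by one mutex.
#[derive(Default)]
struct EventQueue {
    /// Tags in the order in which they were pushed.
    pushed: Vec<Tag>,
}

/// Called from an asynchronous thread. The clock is read and the event
/// is pushed inside the same critical section.
fn schedule_async(queue: &Mutex<EventQueue>, start: Instant) -> Tag {
    let mut q = queue.lock().unwrap();
    let tag = start.elapsed(); // read the time while holding the lock
    q.pushed.push(tag); // push before releasing the lock
    tag
}

/// Spawns several sender threads and returns the tags in push order.
fn run_senders(n_threads: usize, per_thread: usize) -> Vec<Tag> {
    let start = Instant::now();
    let queue = Arc::new(Mutex::new(EventQueue::default()));
    let handles: Vec<_> = (0..n_threads)
        .map(|_| {
            let queue = Arc::clone(&queue);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    schedule_async(&queue, start);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let q = queue.lock().unwrap();
    q.pushed.clone()
}
```

The cost is visible in `schedule_async`: every sender blocks on `lock()`, which is exactly what the channel-based design avoids.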

Let the scheduler assign tags

Another solution would be to let the scheduler thread assign tags to asynchronous events. There are several possible problems with this:

Mixed solution

We could use the asynchronously assigned tag as long as it is greater than the latest processed tag. If it isn't, then we're in the problematic situation described above. Then, we can do something else:

None of these looks especially appealing in the general case; maybe the strategy should be selectable.

lhstrh commented 1 year ago

I think reassigning the tag of the new event is the only reasonable option. We should treat it as a transaction: if the race occurs and the tag of the new event is wrong, we roll back, get a new tag, and attempt the insertion again.
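A rough sketch of that transactional idea, for illustration only. `Scheduler`, `try_insert` and `schedule_with_retry` are hypothetical names rather than the reactor-rs API, a tag is simplified to a `u64` timestamp, and the clock is passed in as a closure so the retry path can be exercised deterministically. The loop assumes the clock eventually yields a tag greater than the latest processed one.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical tag: nanoseconds since startup (microsteps omitted).
type Tag = u64;

/// Hypothetical scheduler state, not the actual reactor-rs types.
struct Scheduler {
    /// Latest tag that has already been processed.
    latest_processed: Tag,
    /// Pending events, smallest tag first.
    queue: BinaryHeap<Reverse<Tag>>,
}

impl Scheduler {
    /// Inserts the event only if its tag is strictly greater than the
    /// latest processed tag. On failure nothing is inserted, which plays
    /// the role of the rollback.
    fn try_insert(&mut self, tag: Tag) -> Result<(), Tag> {
        if tag > self.latest_processed {
            self.queue.push(Reverse(tag));
            Ok(())
        } else {
            Err(tag)
        }
    }
}

/// Reads a tag and attempts the insertion, retrying with a fresh tag
/// whenever the previous one turned out to be stale.
/// Returns the tag that was finally used and the number of attempts.
fn schedule_with_retry(
    scheduler: &mut Scheduler,
    mut read_clock: impl FnMut() -> Tag,
) -> (Tag, u32) {
    let mut attempts = 0;
    loop {
        attempts += 1;
        let tag = read_clock();
        if scheduler.try_insert(tag).is_ok() {
            return (tag, attempts);
        }
        // Stale tag: nothing was inserted, so just read the clock again.
    }
}
```

For example, with `latest_processed` at 10 and clock readings of 5 and then 12, the first attempt is rejected and the second inserts the event with tag 12. This single-threaded sketch leaves out how the check and the insertion are made atomic with respect to the scheduler thread.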

lhstrh commented 1 year ago

Note that whatever tag is obtained for the scheduled physical action is uncertain, anyway.

oowekyala commented 1 year ago

Ok, I'll implement this.

For the record, I could not reproduce the bug without adding a thread::sleep in the middle of the critical section, in the code of the runtime (not of the LF program). I suspect this bug is mostly theoretical...

lhstrh commented 1 year ago

These kinds of bugs are load-dependent and might surface only rarely, yet I wouldn't call them theoretical, because that wrongly suggests they cannot really happen in deployment.