project-oak / oak

Meaningful control of data in distributed systems.
Apache License 2.0
1.32k stars 113 forks source link

Potential inconsistency in Rust Oak Runtime thread handling #760

Closed tiziano88 closed 4 years ago

tiziano88 commented 4 years ago

When calling Runtime::wait_on_channel from a Node

https://github.com/project-oak/oak/blob/70da1899692a7ceb9eb0875fb3ee001645adb04e/oak/server/rust/oak_runtime/src/runtime/mod.rs#L160

the "current thread" is looked up using thread::current

https://github.com/project-oak/oak/blob/70da1899692a7ceb9eb0875fb3ee001645adb04e/oak/server/rust/oak_runtime/src/runtime/mod.rs#L164

and eventually parked, if no message is readily available on the specified channel(s)

https://github.com/project-oak/oak/blob/70da1899692a7ceb9eb0875fb3ee001645adb04e/oak/server/rust/oak_runtime/src/runtime/mod.rs#L206

Then, when calling Runtime::stop, the thread corresponding to the Node instance is unparked

https://github.com/project-oak/oak/blob/70da1899692a7ceb9eb0875fb3ee001645adb04e/oak/server/rust/oak_runtime/src/runtime/mod.rs#L117-L120

I think the assumption here is that this would be the same single thread that is associated with the Node instance, which is the case for the current implementation.

The problem arises if (or rather, when) we introduce Nodes that may spawn additional threads, since it may happen that the call to Runtime::wait_on_channel does not come from the initial thread, and therefore unparking the initial thread would not actually have the desired effect.

I think the correct (future-proof) thing to do would be for Runtime::wait_on_channel to keep track of the actual waiting thread itself, perhaps in a separate field (perhaps of type HashMap<ThreadId, JoinHandle>), which should also be able to deal with the fact that multiple threads from the same node may be waiting at the same time.

@blaxill WDYT?

blaxill commented 4 years ago

Your approach in #761 looks good, alternatively we could add references to all spawned threads to runtime::Node

tiziano88 commented 4 years ago

In #759 I am refactoring nodes to not expose the JoinHandle, and instead impl a trait that has a start and stop method, and they keep track of threads (if any) internally, so I think keeping the spawned threads around would be more complicated then; we could add another method to that trait to return all the threads, but it seems less elegant. Also the list of waiting threads is already maintaned, so we may as well use it directly (as #761 does).