Open mratsim opened 4 years ago
This RFC is not very action-able so far.
What features does DrNim need in order to support multi-threading?
This RFC covers 2 parts:
DrNim can be expanded to support 1. As mentioned, this would be very helpful for embedded devices, protocol design, mission-critical systems to prove the absence of flaws in the design.
For that DrNim would need to implement some AST -> Z3 translation similar to https://github.com/bmoth-mc/bmoth/tree/develop/src/main/java/de/bmoth/backend/z3 and described in the paper https://journal.ub.tu-berlin.de/eceasst/article/view/1074, Writing a Model Checker in 80 Days: Reusable Libraries and Custom Implementation from Petrash et al
The AST ideally is similar to Synthesis DSL and you can feed it:
This way you can ask Z3 if your state machine doesn't have a corner case with an event that is not handled.
The level beyond is then composing state machines to implement a protocol.
Then we need to find (non-trivial but not too complex) examples to showcase.
This RFC outlines a potential roadmap to have tooling that significantly address a "new" class of bugs, logic bugs, i.e. bugs in the program design.
New as in the only general purpose language that addresses it as far as I know is Ada/Sparks. And others are formal verification language. I am aware of Eiffel and the design-by-contract paradigm, you will find assume, invariant, precondition, postcondition and assert as useful.
This is a followup to my comment on the safety modes RFC: https://github.com/nim-lang/RFCs/issues/180#issuecomment-572689141
Motivating examples
You are designing software for one of the following scenarios:
Overview
The usual software bug detection tools:
They do not address bugs in software design that come from an unlikely sequence of events that trigger an unhandled or undefined scenario and potentially catastrophic failures.
Caveat emptor
This is not a replacement for runtime checks, testing, fuzzing. It ensures that your design (aka blueprint) is correct but your implementation might be buggy (for example you forgot to initialize a variable before use).
Correct-by-construction mission-critical system
In Nim, this would involve a branch of formal verification called model checking. This works by exhaustively checking all the possible state of your program (say no train in tunnel, train approaching, trains approaching, train out of the tunnel, rain, train stuck in tunnel, red light failure, ...) and asserting that the unwanted scenarios cannot happen according to the state transitions of your system and your explicit assumptions (the red light cannot fail).
This requires at minimum:
Paper
This paper, which comes with an implementation, uses:
This was created by students in a trimester so should be quite accessible.
Note that they mentioned that Z3 wasn't ideal due to not mapping perfectly to the B language. In our case, I think it's an excellent first step as:
In Nim
if
andgoto
and it's fast.await
that participare in work-stealing while waiting)Correct-by-construction thread synchronization
This is something I critically need in Weave / Project Picasso (#160). I did use model checking via TLA+ to debug and remove deadlocks in my backoff algorithm (that put idle threads to sleep) and ultimately uncover a wakeup bug in Glibc condition variables (https://github.com/mratsim/weave/issues/56). However, that requires learning a new language (which is worth it if implementing a lot of distributed protocols) and does not address implementation bugs due to misunderstanding the C11/C++11 memory model for atomics synchronization.
Examples
Verifying the implementation of a concurrent channel.
In Nim
The goal is to verify that the implemented Nim code does not lead to an inconsistent state by unanticipated interleaving of thread execution. The way to do that is by overloading
createThread
, and synchronization primitives likeAtomic
,Lock
andCondVar
.To model check a concurrent data structure instead of creating a thread,
createThread
will create two states, and then you create a graph of all possible threads (and resulting states) interleaving. When encountering a synchronization primtives, the model checker might have 1 state make progress (or not) if the memory model allows it (locked or not, acquire/release, ...).This create a (potentially huge graph) of thread interleaving and what happens at each synchronization and exhaustively explore all possible state proving whether or not there is a synchronization bug (via asserts in the code).
References:
And a lot more at https://github.com/mratsim/weave/issues/18 but most only model sequentially consistent execution (i.e. synchronization via locks and condition variables) but not C11/C++11 relaxed memory model and synchroniation via acquire/release atomics).
Note: it's probably possible to use Z3 here as well.