Open janpaulpl opened 1 year ago
OK, sounds good! One thing I wasn't quite able to glean here is exactly what the input/output formats are. You briefly mention:
Implementing the static analysis requires writing a module in the Harmony compiler's backend, which serves as a buffer between the compiler and the model checker. The static analysis algorithm works on the Kripke-like structure that's outputted by the compiler and uses a JSON of metadata with source-level information for modifying the structure.
So is the transformation happening on this (in-memory) data structure?
OK, sounds good! One thing I wasn't quite able to glean here is exactly what the input/output formats are. You briefly mention:
Implementing the static analysis requires writing a module in the Harmony compiler's backend, which serves as a buffer between the compiler and the model checker. The static analysis algorithm works on the Kripke-like structure that's outputted by the compiler and uses a JSON of metadata with source-level information for modifying the structure.
So is the transformation happening on this (in-memory) data structure?
Yeah, sorry if it was a bit obfuscated. I mean to say that the manipulation will work in a similar way to how our current programs work on the Bril IR in JSON form. I have to wrap the information from the compiler and put it in a JSON, and then modify that JSON, and feed that into the model checker. The transformation is this intermediary JSON representation of the AST looking structure.
Got it! And this JSON representation will be your own invention? (That is, the toolchain doesn't already have a built-in way to serialize its program representation to disk?)
Got it! And this JSON representation will be your own invention? (That is, the toolchain doesn't already have a built-in way to serialize its program representation to disk?)
Yes! There isn't a nicer way I can interact with the information without the actual structures that are being manipulated internally.
Sounds good!
What will you do?
We'll be optimizing the model checker, and consequentially the compiler, for the Harmony Concurrent Programming Language. In particular, Harmony does not do much in terms of compiler optimizations (although it does evaluate constant folding for simple expressions in the compiler rather than the model checker.) Due to so much heavy lifting from the model checker, state space explosion is unfortunately relatively common in Harmony.
By using domain-specific static analysis information from concurrent programs (such as field accesses), we can create a bridge between the compiler's initial output and the model checker's Kripke structure instance to provide beneficial static information which can reduce the state space, i.e. minimize the Kripke structure. This will allow for the verification of more sophisticated Harmony programs, without the need for better hardware. Particularly beneficial for Harmony, as its main use case is pedagogical concurrent programming, which can help students explore larger programs with their limited local resources.
How will you do it?
To begin with, there are a couple of resources we have to familiarize ourselves with. Thankfully, all resources regarding the Harmony model checker and compiler are open-source and well-documented. This will help us understand Harmony's architecture and locations in which we can carry static metadata from the front end to the back end (and eventually the model checker) without incurring additional compilation overhead.
For understanding how the domain-specific model checker works, the Handbook of Model Checking has a great chapter on Model Checking for concurrent programs (and it's very concise with just about ~20 relevant pages). An obvious question still lies, what static information can we make use of in concurrent programs? Thankfully, yet again, this research question has been answered in a 2014 paper on Model checking of concurrent programs with static analysis of field accesses.
The main algorithm statically detects for each point in the code of each thread the set of fields that may be accessed after that point. A more sophisticated result is called immutable field analysis (IFA), in which they eliminate additional unnecessary thread scheduling choices during the state space traversal.
Implementing IFA in Harmony has two checkpoints:
This means that we can scope a first pass of the analysis to just work on Harmony's thread interleaving and ignore choose expressions and interrupts (this is a closer analysis to that presented in the paper). Implementing the static analysis requires writing a module in the Harmony compiler's backend, which serves as a buffer between the compiler and the model checker. The static analysis algorithm works on the Kripke-like structure that's outputted by the compiler and uses a JSON of metadata with source-level information for modifying the structure.
These modifications include annotating the objects for which there is no reference that an object exists in the heap yet, then only the thread that created the object can access it, and mapping them onto the threads in the final structure.
What are some possible challenges?
This project requires a lot of reading and core implementation. We might not be able to provide any meaningful optimizations with regard to sound implementation of the immutable field analysis in Harmony. The core work will therefore focus on the compiler, and correctly annotating the JSON with the objects-threads mapping for which redundancies are found.
How will you empirically measure success? Ideally, a full implementation would allow for running a set of benchmarks in the Harmony repository while measuring their timeout rate using the following my local hardware (laptop) and UGC Linux.
However, to make sure we have something of value, the first checkpoint will require testing for soundness (not optimization necessarily.) The annotated JSON files can be compared with atomic files as the model checker would normally work, and make use of the existing and working framework for smaller programs that succeed without this optimization (using smaller programs with a maximum of $2^3$ threads).
Team members: @jpramos-me for now.