Open nikomatsakis opened 6 years ago
I've been debating about this. I think my ideal would be that we take in MIR and generate facts, but that would require reproducing a lot of the compiler. Obviously the simplest format is the one we already have: just give me the raw facts. So the question is, which facts should we "derive" from a nicer input and which can the user manually specify?
It seems to me that by far the most tedious to derive are cfg_edge
and region_live_at
. So pehaps we should focus on those, and have the user manually specify the borrow_region
, outlives
, and invalidates
themselves. We can then (over time) transition to the rest.
I envision something like this. First, a program consists of a universal_regions
declaration and a set of basic blocks:
universal_regions { 'a, 'b, 'c }
block B0 {
statement0; // series of `;`-separated statements, defined below
statement1;
goto B1, B2, B3; // create edges between basic blocks
}
This is kind of a grammar like this (using a LALRPOP-like format):
Input = UniversalRegions BlockDefn*;
UniversalRegions = "universal_regions" "{" Comma<Region> "}";
BlockDefn = "block" Block "{" Statement* Goto? "}"
Goto = "goto" Comma<Block> ";"
Each statement would be a comma-separate list of facts that occur on that line, or possibly a use
. So something like:
use('a), outlives('a: 'b), borrow_region_at('b, 'c)
which might fit this grammar:
Statement = { Fact, Use };
Fact = {
"outlives" "(" Region ":" Region ")",
"borrow_region_at" "(" Region "," Loan ")",
"invalidates" "(" Loan ")",
};
Use = "use" "(" Region ")"
And of course we need the base strings:
Block = r"B\w+"; // or whatever
Region = r"'\w+"; // ...
Loan = r"L\w+"; // ...
We would then convert these into tuples of AllFacts
and run the computation.
Right now, the "input" to this program is a bunch of facts you could never generate by hand. We should add some sort of front-end more like chalk, that takes a human-writable format -- perhaps similar to MIR? And generates facts from it. This would allow us to do more unit testing.
It'd probably be good then to factor out the "core algorithm" into a separate crate, so that it can eventually be shared by rustc and this crate.
I'd be open to suggestions for what the input format should look like =)