gtrepta opened 9 months ago
I had a look at the duplicate pairs of `generateSortPredicateSyntax` and `generateSortProjections` in the pipeline, seeing how many stages I could remove between them.

For reference, here's how the stages are set up currently:

And here's after I moved what I could:
```java
.andThen(constantFolding)
.andThen(propagateMacroToRules)
.andThen(guardOrs)
.andThen(resolveFreshConfigConstants)
.andThen(generateSortPredicateSyntax)
.andThen(generateSortProjections)
.andThen(expandMacros)                                     //
.andThen(AddImplicitComputationCell::transformDefinition) // Only these three remain
.andThen(resolveFreshConstants)                            //
.andThen(generateSortPredicateSyntax)
.andThen(generateSortProjections)
.andThen(checkSimplificationRules)
```
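Each stage here is a function from `Definition` to `Definition`, and `andThen` chains them left to right. As a minimal sketch of the composition mechanics, using plain `java.util.function.Function` with a `String` standing in for the real `Definition` type (the pass names below are placeholders, not the real passes):

```java
import java.util.function.Function;

public class AndThenDemo {
    // Compose two stand-in passes with andThen: the left one runs first.
    static String run(String def) {
        Function<String, String> expandMacros = d -> d + "+macros";
        Function<String, String> numberSentences = d -> d + "+numbered";
        Function<String, String> pipeline = expandMacros.andThen(numberSentences);
        return pipeline.apply(def);
    }

    public static void main(String[] args) {
        System.out.println(run("def")); // prints def+macros+numbered
    }
}
```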
The regression tests still pass with this setup. The definitions in the profiling tests also kompile without issue, and identical `definition.kore` files get generated as on `develop`. So it feels pretty safe to make a PR, but I'll try some more testing downstream. The symbolic backends aren't exercised very much here.

I've compared the `definition.kore` files generated on evm-semantics and wasm-semantics and they're identical. I also compared all of the definitions in the regression tests; they differ, but only in the ordering and numbering of some sentences. I feel confident enough to open a PR for this, and leave the decision on whether to merge it up to review.
Here's the script I've been using for `git bisect run` to build and run tests:
```sh
#!/usr/bin/bash
set -exuo pipefail

PWD="$(pwd)"
K_BIN="${PWD}/k-distribution/bin"
LOG_DIR="${PWD}/bisect-log"
COMMIT=$(git rev-parse --short HEAD)

trap "${K_BIN}/stop-kserver || true" INT TERM EXIT

mkdir -p "${LOG_DIR}"

${K_BIN}/stop-kserver || true # make sure no kserver is running

mvn package -DskipTests -Dproject.build.type=Debug -T1C
${K_BIN}/kserver &> /dev/null &

cd k-distribution/tests/regression-new
make clean
make -j8 -O |& tee "${LOG_DIR}/${COMMIT}.out"
```
We should carry on this line of investigation; for example, we could check which passes are idempotent, i.e. whether `f(f(x)) == f(x)` holds for a pass `f`.
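Reading `f(f(x)) == f(x)` operationally: run a pass twice on the same input and check that the second application changes nothing. A sketch of such a check on toy passes (both passes here are invented purely for illustration):

```java
import java.util.function.Function;

public class IdempotenceCheck {
    // A pass is idempotent if applying it a second time is a no-op.
    static <T> boolean isIdempotent(Function<T, T> pass, T input) {
        T once = pass.apply(input);
        return pass.apply(once).equals(once);
    }

    public static void main(String[] args) {
        // Collapsing runs of repeated characters is idempotent...
        Function<String, String> dedupe = s -> s.replaceAll("(.)\\1+", "$1");
        // ...doubling the input is not.
        Function<String, String> doubler = s -> s + s;
        System.out.println(isIdempotent(dedupe, "aabbcc"));  // true
        System.out.println(isIdempotent(doubler, "ab"));     // false
    }
}
```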
Here's another experiment I'm working on. The idea is to take the stage at the end of the pipeline and pull it as far as possible toward the front. Here's a snippet that replaces the end of the `steps` method in `KoreBackend`:
```java
List<Function1<Definition, Definition>> defTransformerList = new ArrayList<>();
defTransformerList.add(resolveComm);
defTransformerList.add(resolveIO);
defTransformerList.add(resolveFun);
defTransformerList.add(resolveFunctionWithConfig);
defTransformerList.add(resolveStrict);
defTransformerList.add(resolveAnonVars);
defTransformerList.add(d -> new ResolveContexts().resolve(d));
defTransformerList.add(numberSentences);
defTransformerList.add(resolveHeatCoolAttribute);
defTransformerList.add(resolveSemanticCasts);
defTransformerList.add(subsortKItem);
defTransformerList.add(constantFolding);
defTransformerList.add(propagateMacroToRules);
defTransformerList.add(guardOrs);
defTransformerList.add(resolveFreshConfigConstants);
defTransformerList.add(generateSortPredicateSyntax);
defTransformerList.add(generateSortProjections);
defTransformerList.add(expandMacros);
defTransformerList.add(AddImplicitComputationCell::transformDefinition);
defTransformerList.add(resolveFreshConstants);
defTransformerList.add(generateSortPredicateSyntax);
defTransformerList.add(generateSortProjections);
defTransformerList.add(checkSimplificationRules);
defTransformerList.add(subsortKItem);
defTransformerList.add(d -> new Strategy().addStrategyCellToRulesTransformer(d).apply(d));
defTransformerList.add(
    d -> Strategy.addStrategyRuleToMainModule(d.mainModule().name()).apply(d));
defTransformerList.add(d -> ConcretizeCells.transformDefinition(d));
defTransformerList.add(genCoverage);
defTransformerList.add(Kompile::addSemanticsModule);
defTransformerList.add(resolveConfigVar);
defTransformerList.add(addCoolLikeAtt);
defTransformerList.add(markExtraConcreteRules);
defTransformerList.add(removeAnywhereRules);
defTransformerList.add(generateSortPredicateRules);
defTransformerList.add(numberSentences);
return def -> {
    int pos = defTransformerList.size() - 2;
    Definition result = null;
    while (pos > 0) {
        // Move the stage under test one position closer to the front...
        Function1<Definition, Definition> toMove = defTransformerList.remove(pos--);
        defTransformerList.add(pos, toMove);
        // ...then re-compose and re-run the whole pipeline. A KEMException
        // thrown here shows how far forward the stage can safely move.
        result = defTransformerList.stream().reduce(x -> x, (a, f) -> a.andThen(f)).apply(def);
    }
    return result;
};
}
```
The stages are held in a Java list, composed together with a `reduce` operation, and then run on the input definition. This is done repeatedly, moving the stage in question up by one position in the list each time.
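The same move-and-recompile loop can be sketched on a toy pipeline of plain `Function`s. Everything here is a simplification of the real experiment: the passes are invented, and "breaking" is modelled as producing a different output from the original ordering rather than throwing an exception:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class MoveStageUp {
    // Compose a list of passes into one function, applied left to right.
    static <T> Function<T, T> compose(List<Function<T, T>> passes) {
        return passes.stream().reduce(Function.identity(), Function::andThen);
    }

    // Move the pass at index `from` one slot toward the front at a time,
    // re-running the whole pipeline after each move. Returns the earliest
    // position at which the pipeline still produces the expected output.
    static int earliestWorkingPosition(List<Function<String, String>> passes,
                                       int from, String input, String expected) {
        int pos = from;
        while (pos > 0) {
            Function<String, String> toMove = passes.remove(pos--);
            passes.add(pos, toMove);
            if (!compose(passes).apply(input).equals(expected)) {
                passes.add(pos + 1, passes.remove(pos)); // undo the breaking move
                return pos + 1;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        List<Function<String, String>> order = List.of(
            s -> s + "a",
            s -> s + "b",
            s -> s.toUpperCase());
        List<Function<String, String>> passes = new ArrayList<>(order);
        // Original order maps "x" to "XAB"; the uppercasing pass cannot be
        // moved in front of either append, so it stays at position 2.
        System.out.println(earliestWorkingPosition(passes, 2, "x", "XAB")); // prints 2
    }
}
```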
I have more stages to check, but I've found so far that `removeAnywhereRules` and `markExtraConcreteRules` can be pulled all the way to the front of the pipeline without breaking it. I think these are only supposed to operate on user-supplied rules, so it makes sense that they work fine before any transformations that add new rules.
`addCoolLikeAtt` can be pulled to the front, but I'm less sure that this would be correct. `cool-like` is meant to be added to rules that have the k cell holding a k sequence with a variable at the top. It's an attribute that the llvm-backend matching tree generation uses to apply a special priority that optimizes the tree. Aside from cooling rules that use the `cool` attribute, I'm not sure what other kinds of generated rules can appear that would be changed by this pass.
One thing to note is that this only tests for breakages in the pipeline during compilation; further testing is needed in the regression tests and downstream for any changes that we might want to make.
This is very cool!
> I have more stages to check, but I've found so far that `removeAnywhereRules` and `markExtraConcreteRules` have been able to be pulled all the way to the front of the pipeline without breaking it. I think these are only supposed to operate on user supplied rules, so it makes sense that they work fine before any transformations that add new rules.
Perhaps open a PR that moves these to the front of the list explicitly, and adds a comment noting that they have no dependencies on later passes and are designed to apply directly to user code. If there's any low-hanging fruit here in terms of code cleanup, refactoring, etc., then it might be a good time to do so.
> `addCoolLikeAtt` can be pulled to the front, but I'm less sure that this would be correct. `cool-like` is meant to be added to rules that have the k cell, holding a k sequence with a variable at the top. It's an attribute that the llvm-backend matching tree generation uses to apply a special priority that optimizes the tree. Aside from cooling rules that use the `cool` attribute, I'm not sure what other kinds of generated rules can appear that would be changed by this pass.
Indeed, as we said in the K meeting, we don't necessarily have an a priori assumption that it will be correct to move everything around. If there's something with non-obvious dependencies like this one, then it's perfectly fine to leave it out and come back to refactoring it in the future if need be.
It would be nice if we could do these experiments without having to rebuild K. That would require making it so that the compiler passes we choose to include can be set on the CLI instead of being hardcoded directly in this file.
Would it be possible to attach a string identifier to each compiler pass, and then build up this list of passes dynamically instead of statically? That would also allow for much easier downstream testing of moving the compiler passes around and figuring out their dependencies. Additionally, we could then also have tests that only run a part of the compiler pipeline (perhaps loading the JSON KAST, running a compiler pipeline step, then dumping the resulting JSON KAST), which would be a much more tailored test than the end-to-end integration tests we currently have.
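A rough sketch of what such a name-to-pass registry could look like. The registry contents and `pipelineFrom` helper are entirely hypothetical, and a `String` stands in for `Definition`; a real version would register the existing `Function1<Definition, Definition>` passes:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class PassRegistry {
    // Hypothetical registry: string identifier -> compiler pass.
    static final Map<String, Function<String, String>> PASSES = new LinkedHashMap<>();
    static {
        PASSES.put("expandMacros", d -> d + ";macros-expanded");
        PASSES.put("numberSentences", d -> d + ";numbered");
    }

    // Build a pipeline from a comma-separated spec, e.g. taken from a CLI flag.
    static Function<String, String> pipelineFrom(String spec) {
        Function<String, String> pipeline = Function.identity();
        for (String name : spec.split(",")) {
            Function<String, String> pass = PASSES.get(name.trim());
            if (pass == null) throw new IllegalArgumentException("unknown pass: " + name);
            pipeline = pipeline.andThen(pass);
        }
        return pipeline;
    }

    public static void main(String[] args) {
        // Reordering or dropping passes now only requires a different spec string.
        System.out.println(pipelineFrom("expandMacros,numberSentences").apply("def"));
    }
}
```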
A couple of things I noticed when looking through the code that builds the compilation pipeline:

- `ModuleTransformer` etc. breaks compositionality.

I think the LLVM optimisation pipeline infrastructure is a pretty good place to look for design inspiration. Some features there that are worth looking at:

- A `FunctionPass` very directly becomes a `ModulePass` etc. This isn't the case for our current design: the `.andThen(...).andThen(...)` approach.

Some more information from investigation with @gtrepta:
The overall kompile pipeline (in `Kompile.java`, method `Kompile.run(...)`) has 3 major steps:

1. `parseDefinition`: parses, and runs 2 very simple passes.
2. `checkDefinition`: checks a bunch of structural things about the definition.
3. `pipeline.apply(...)`: runs `KoreBackend.steps()`, the bulk of the compiler pipeline. There is one check in this pipeline, which is `checkSimplificationRules` (which checks that all `simplification` rules are functional). This is because this check must be run after macro expansion.

We could pull applying sort synonyms and implicit IO imports out of `parseDefinition` into its own explicit step, and then call that step directly. Then we are breaking the entire kompiler pipeline into major "epochs", which could be made more fine-grained over time (as opposed to the current approach, where we were thinking of starting with the pipeline as fine-grained as the individual steps and understanding the steps).
Proposed rough sketch https://github.com/runtimeverification/k/blob/08477b587642c382e3ff11d7983431bfa3e8c124/kernel/src/main/java/org/kframework/kompile/Kompile.java#L212:
```java
definition = parseDefinition(...);                        // Read the definition in
definition = preCheckPasses(...);                         // Apply sort synonyms and implicit IO module import
checkDefinition(definition);
definition = everythingUpToExpandMacros(...);
definition = expandMacros(...);
definition = everythingUpToCheckSimplificationRules(...);
checkSimplificationRule(definition);
definition = remainderOfKoreBackendSteps(...);
```
Then we slowly break each chunk of the kore steps out into other atomic units that must be run together.

This is just an idea of how to attack the problem from the other direction. Basically, here, we know that the "dependency/use" chain is a straight line, not a DAG, because of how the compiler is currently structured.
At the very least, we could pull out the `preCheckPasses(...)` and the `checkSimplificationRule(...)` pass, for more consistency.
As suggested by @ehildenb, I'll also work on recording all the attribute read-write dependencies between passes.
This will help to clean up the attribute usages in general and give us a subset of the overall pass dependency DAG.
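Once each pass's attribute reads and writes are recorded, "writer must run before reader" edges fall out mechanically. A sketch of that derivation; the `Pass` record and read/write sets are illustrative (only the `UNIQUE_ID` and `macro` attributes and the pass names reflect behaviour noted in this thread):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class AttributeDeps {
    // Illustrative descriptor: which attributes a pass reads and writes.
    record Pass(String name, Set<String> reads, Set<String> writes) {}

    // Derive "A must run before B" edges: B reads an attribute that A writes.
    static List<String> edges(List<Pass> passes) {
        List<String> result = new ArrayList<>();
        for (Pass writer : passes)
            for (Pass reader : passes)
                if (writer != reader)
                    for (String attr : writer.writes())
                        if (reader.reads().contains(attr))
                            result.add(writer.name() + " -> " + reader.name() + " (" + attr + ")");
        return result;
    }

    public static void main(String[] args) {
        List<Pass> passes = List.of(
            new Pass("numberSentences", Set.of(), Set.of("UNIQUE_ID")),
            new Pass("genCoverage", Set.of("UNIQUE_ID"), Set.of()),
            new Pass("propagateMacroToRules", Set.of(), Set.of("macro")),
            new Pass("expandMacros", Set.of("macro"), Set.of()));
        edges(passes).forEach(System.out::println);
    }
}
```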
`ApplySynonyms` appears to be tightly bound to the inner parsing logic, and can't be moved any later than `resolveNonConfigBubbles`, so it's probably best to just leave it where it is in `ParserUtils.loadModules` for now. We could decide later if we want to separate it out somehow within the parser library as its own step when we do cleanups there.
Next steps, after discussing with @ehildenb:

- This will allow faster experimentation with different orderings of the transformations to find dependencies, and will make it easier to do these sorts of experiments on downstream semantics.
- This, along with the previous option, allows us to generate isolated tests for each compiler pass with input/output JSONs. We can also get the input/output of each individual step in the pipeline and see the actual changes it's making, or whether it's making any changes at all, depending on what's in the definition.
- This could also be used to help integrate pyk kompile with the Java frontend, as pyk can do the outer parsing and then pass that to kompile, where inner parsing and pipeline transformations can be done.
Here's a spreadsheet of every regression test, each pipeline stage, and whether that stage made a change to the definition (1) or didn't make any change (0).
https://docs.google.com/spreadsheets/d/1Jq8XYdNpzTv9XPXQZx1cOTRKBNWsq1fmz3HH4Ipv0Cs/edit?usp=sharing
@ehildenb Here's another spreadsheet. Each column represents a stage in the kore backend pipeline that was dropped from compilation, and each row is a test in the regression suite with its test result after the stage was dropped. Each cell says whether the test passed (PASS), failed (FAIL), or encountered an error during compilation (KOMPILE). Note that `resolveHeatCoolAttribute` was skipped for this experiment because it produced interpreters that would never terminate, and that was too hard to deal with.
https://docs.google.com/spreadsheets/d/13jDHr49_EnIOSnhlKzbvpm8FXN620itrZBywmLuKRJs/edit?usp=sharing
Nice! So from that second spreadsheet, any fully-green columns indicate that the compiler pass perhaps isn't being exercised properly by the regression test suite? Even if the internal data structures get changed, we still aren't observing any actual behavioural differences at the level of the test suite. It would be interesting to see if we can write a test for each green column that breaks if the pass is removed; doing so would also be a good way of understanding more concretely at the K source level what the passes affect.
That's my interpretation as well. I see:

- `subsortKItem1`
- `guardOrs`
- `generateSortProjections2`
- `checkSimplificationRules`
- `subsortKItem2`
- `addStrategyCellToRules`
- `addStrategyRuleToMainModule`
- `genCoverage`
- `addCoolLikeAtt`
- `removeAnywhereRules`
- `numberSentences2`

`addStrategyCellToRules` and `addStrategyRuleToMainModule`, at least, are planned for removal.
I've looked at `guardOrs`, introduced by #340. Referencing the first spreadsheet, the only tests that this stage does anything on are `or-haskell`, `issue-1789-rhsOr/llvm`, and `or-llvm`.

It wraps `#Or` terms outside of a rewrite in an `#as` pattern, which is supposed to make an unambiguous substitution for them. This was added five years ago, so either it's no longer necessary or other changes have made these tests miss the breaking case. @dwightguth do you know?
`generateSortProjections2` looks like it's only creating productions and rules of projections for the generated top and counter cells.
@dwightguth doesn't see why `guardOrs` isn't broken; @gtrepta to investigate in more detail here to see what's going on.

- `#Or` pattern on the RHS when KORE is generated?

`subsortKItem1` and `subsortKItem2` don't seem to have any effect on the kore backend pipeline. Omitting them does, however, cause an error when a later call to `AddSortInjection` attempts to compute the least upper bound for a collection of sorts. Keeping a `subsortKItem` pass at the end of the pipeline keeps everything working, and all of the regression tests pass.
This brings up something I'm noticing about invariants. AFAIK every sort being a subsort of KItem is supposed to be an invariant, but we're enforcing this by explicitly creating the syntax declarations. This creates the phenomenon of the invariant being broken by a stage, staying broken for some intermediate stages, and then being fixed by another stage. This seems less than ideal from a maintainability standpoint: one would want to rely on every invariant they know about a K definition, but instead they need to keep track of whether or not they're at a point where they can actually use that invariant. Keeping these invariants from being broken, either with assertion checks or by making some of them hold implicitly, could help out here.
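One cheap way to surface broken invariants early would be to wrap each pass with an assertion check that runs after it. A hypothetical sketch, where the `checked` helper, the toy invariant, and the `String` definition stand-in are all invented for illustration:

```java
import java.util.function.Function;
import java.util.function.Predicate;

public class InvariantGuard {
    // Wrap a pass so a stated invariant is re-checked on its output.
    static <T> Function<T, T> checked(String name, Function<T, T> pass, Predicate<T> invariant) {
        return input -> {
            T output = pass.apply(input);
            if (!invariant.test(output))
                throw new AssertionError("pass '" + name + "' broke the invariant");
            return output;
        };
    }

    public static void main(String[] args) {
        // Toy invariant: the "definition" always mentions KItem.
        Predicate<String> mentionsKItem = d -> d.contains("KItem");
        Function<String, String> goodPass = d -> d + ";extra";
        Function<String, String> badPass = d -> d.replace("KItem", "");

        System.out.println(checked("good", goodPass, mentionsKItem).apply("syntax KItem"));
        try {
            checked("bad", badPass, mentionsKItem).apply("syntax KItem");
        } catch (AssertionError e) {
            System.out.println(e.getMessage()); // the offending pass is named immediately
        }
    }
}
```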
Here's a list of dependencies between compilation passes that I've found. I will continue to update it as I find them.

- `expandMacros` -> `propagateMacroToRules`: `expandMacros` needs the `macro` attribute on rules with a macro production at the top of the LHS in order to expand that production elsewhere. `propagateMacroToRules` is meant to put the `macro` attribute on those rules, and if it doesn't, then `expandMacros` is very likely to find a macro production it can't expand and will throw an error.
- `genCoverage` -> `numberSentences`: `genCoverage` expects the `UNIQUE_ID` attribute to be on every rule in the definition and uses it for the coverage output when it instruments each rule to log itself. This attribute is added by `numberSentences`, and `genCoverage` will throw a `NoSuchElementException` if that hasn't happened. Also, we probably want `genCoverage` to happen after every generated rule that will be in the final definition is there, but it looks like that may not be the case.
- `resolveFreshConstants` -> `resolveFreshConfigConstants`: The initial value for `GeneratedCounterCell` depends on any fresh variables that are used in the initial configuration, and this is supposed to come from `resolveFreshConfigConstants`. If `resolveFreshConstants` happens first, then it's possible for a fresh constant that gets generated during execution to be equal to a fresh constant that was in the initial configuration. This case gets caught by the regression test in `issue-3520-freshConfig`.
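Once enough of these "must run before" edges are recorded, valid pipeline orderings can be derived mechanically with a topological sort. A sketch using Kahn's algorithm over the dependencies found so far (the pass names are the real ones; the helper and data layout are illustrative):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PassOrder {
    // Kahn's algorithm over "must run before" edges; returns one valid order.
    static List<String> topoSort(List<String> passes, Map<String, List<String>> before) {
        Map<String, Integer> indegree = new HashMap<>();
        passes.forEach(p -> indegree.put(p, 0));
        before.forEach((from, tos) -> tos.forEach(t -> indegree.merge(t, 1, Integer::sum)));
        Deque<String> ready = new ArrayDeque<>();
        passes.forEach(p -> { if (indegree.get(p) == 0) ready.add(p); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String p = ready.remove();
            order.add(p);
            for (String next : before.getOrDefault(p, List.of()))
                if (indegree.merge(next, -1, Integer::sum) == 0) ready.add(next);
        }
        if (order.size() != passes.size())
            throw new IllegalStateException("cycle in pass dependencies");
        return order;
    }

    public static void main(String[] args) {
        List<String> passes = List.of("propagateMacroToRules", "expandMacros",
            "numberSentences", "genCoverage", "resolveFreshConfigConstants", "resolveFreshConstants");
        // The three dependencies found so far, written as "runs before" edges.
        Map<String, List<String>> before = Map.of(
            "propagateMacroToRules", List.of("expandMacros"),
            "numberSentences", List.of("genCoverage"),
            "resolveFreshConfigConstants", List.of("resolveFreshConstants"));
        System.out.println(topoSort(passes, before));
    }
}
```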
I've added a second sheet to the spreadsheet for results from dropping pipeline steps. This one covers the regression tests that expect a failure with an error message (my earlier methodology couldn't cover these tests). If a cell says PASS, that test printed the expected error message; if it says FAIL, something else happened instead. This reveals a couple of pipeline steps that fail a test where on the previous sheet they had completely green columns:

- `checkSimplificationRules` with `checks/functionSimplification`
- `removeAnywhereRules` with `issue-2909-allow-anywhere-haskell/check`
The compiler pipeline in the frontend has grown gradually over the lifetime of the project, and the purpose of each stage and their dependencies on each other have become unclear. Some stages have little to no documentation, and are possibly unneeded or more complex than they need to be.

An investigation into the pipeline, poking at each of these stages, removing them or moving them around and seeing if any tests break and why, can shed some light on the pipeline.

One technique for doing this is git bisect. A sequence of commits on top of `develop` that each make a single, small change to the pipeline can be fed to `git bisect run` with a script that rebuilds the frontend and runs the tests. `git bisect` will then find the first commit that breaks something, giving us the changes before it that can be made safely, and a breaking change that we can investigate to help fill in missing documentation or make a refactoring.