runtimeverification / k

K Framework Tools 7.0
BSD 3-Clause "New" or "Revised" License

Automated experiments on the compiler pipeline #3973

Open gtrepta opened 9 months ago

gtrepta commented 9 months ago

The compiler pipeline in the frontend has grown gradually over the lifetime of the project, and the purpose of each stage and the stages' dependencies on one another have become unclear. Some stages have little to no documentation, and are possibly unneeded or more complex than they need to be.

Investigating the pipeline by poking at each of these stages (removing them or moving them around, seeing which tests break, and understanding why) can shed some light on it.

One technique for doing this is git bisect. A sequence of commits on top of develop, each making a single, small change to the pipeline, can be fed to git bisect run with a script that rebuilds the frontend and runs the tests. git bisect will then find the first commit that breaks something, giving us a set of safe changes before it, and a breaking change we can investigate to help fill in missing documentation or guide a refactoring.

gtrepta commented 9 months ago

I had a look at the duplicate pairs of generateSortPredicateSyntax and generateSortProjections in the pipeline, seeing how many stages I could remove between them.

For reference, here's how the stages are set up currently:

https://github.com/runtimeverification/k/blob/1e4aa2be6b49ca766e01d78eae0e418a749bb12b/kernel/src/main/java/org/kframework/backend/kore/KoreBackend.java#L248-L259

And here's after I moved what I could:

            .andThen(constantFolding)
            .andThen(propagateMacroToRules)
            .andThen(guardOrs)
            .andThen(resolveFreshConfigConstants)
            .andThen(generateSortPredicateSyntax)
            .andThen(generateSortProjections)
            .andThen(expandMacros)                                    //
            .andThen(AddImplicitComputationCell::transformDefinition) // Only these three remain
            .andThen(resolveFreshConstants)                           //
            .andThen(generateSortPredicateSyntax)
            .andThen(generateSortProjections)
            .andThen(checkSimplificationRules)

The regression tests still pass with this setup. The definitions in the profiling tests also kompile without issue, and identical definition.kore files are generated as on develop. So it feels safe enough to open a PR, but I'll try some more testing downstream first, since the symbolic backends aren't exercised much here.

gtrepta commented 9 months ago

I've compared the definition.kore files generated on evm-semantics and wasm-semantics and they're identical. I also compared all of the definitions in the regression tests; they differ, but only in the ordering and numbering of some sentences. I feel confident enough to open a PR for this and leave the decision to merge up to review.
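For reference, a minimal sketch of the kind of order-insensitive comparison this implies (the class name and helpers here are hypothetical, not part of the K codebase). It treats two files as equal if they contain the same lines in any order, which covers sentence reordering; sentence renumbering would need a slightly smarter normalization:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: compare two definition.kore files modulo line order.
public class KoreDiff {
  // Equal if both contain the same multiset of lines (ignores reordering only)
  public static boolean equalModuloOrder(List<String> a, List<String> b) {
    List<String> sa = a.stream().sorted().toList();
    List<String> sb = b.stream().sorted().toList();
    return sa.equals(sb);
  }

  public static boolean equalModuloOrder(Path a, Path b) throws IOException {
    return equalModuloOrder(Files.readAllLines(a), Files.readAllLines(b));
  }
}
```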

gtrepta commented 9 months ago

Here's the script I've been using for git bisect run to build and run tests:

#!/usr/bin/bash

set -exuo pipefail

ROOT="$(pwd)" # don't clobber the shell's special PWD variable
K_BIN="${ROOT}/k-distribution/bin"
LOG_DIR="${ROOT}/bisect-log"
COMMIT=$(git rev-parse --short HEAD)

trap "${K_BIN}/stop-kserver || true" INT TERM EXIT
mkdir -p "${LOG_DIR}"
${K_BIN}/stop-kserver || true # make sure no kserver is running

mvn package -DskipTests -Dproject.build.type=Debug -T1C

${K_BIN}/kserver &> /dev/null &

cd k-distribution/tests/regression-new

make clean
# pipefail ensures git bisect sees make's exit status, not tee's
make -j8 -O |& tee "${LOG_DIR}/${COMMIT}.out"

Baltoli commented 9 months ago

We should carry on this line of investigation; we could do the following:

gtrepta commented 9 months ago

Here's another experiment I'm working on. The idea is to take the stage at the end of the pipeline and pull it as far as possible to the front. Here's a snippet that replaces the end of the steps method in KoreBackend:

    List<Function1<Definition, Definition>> defTransformerList = new ArrayList<>();
    defTransformerList.add(resolveComm);
    defTransformerList.add(resolveIO);
    defTransformerList.add(resolveFun);
    defTransformerList.add(resolveFunctionWithConfig);
    defTransformerList.add(resolveStrict);
    defTransformerList.add(resolveAnonVars);
    defTransformerList.add(d -> new ResolveContexts().resolve(d));
    defTransformerList.add(numberSentences);
    defTransformerList.add(resolveHeatCoolAttribute);
    defTransformerList.add(resolveSemanticCasts);
    defTransformerList.add(subsortKItem);
    defTransformerList.add(constantFolding);
    defTransformerList.add(propagateMacroToRules);
    defTransformerList.add(guardOrs);
    defTransformerList.add(resolveFreshConfigConstants);
    defTransformerList.add(generateSortPredicateSyntax);
    defTransformerList.add(generateSortProjections);
    defTransformerList.add(expandMacros);
    defTransformerList.add(AddImplicitComputationCell::transformDefinition);
    defTransformerList.add(resolveFreshConstants);
    defTransformerList.add(generateSortPredicateSyntax);
    defTransformerList.add(generateSortProjections);
    defTransformerList.add(checkSimplificationRules);
    defTransformerList.add(subsortKItem);
    defTransformerList.add(d -> new Strategy().addStrategyCellToRulesTransformer(d).apply(d));
    defTransformerList.add(
        d -> Strategy.addStrategyRuleToMainModule(d.mainModule().name()).apply(d));
    defTransformerList.add(d -> ConcretizeCells.transformDefinition(d));
    defTransformerList.add(genCoverage);
    defTransformerList.add(Kompile::addSemanticsModule);
    defTransformerList.add(resolveConfigVar);
    defTransformerList.add(addCoolLikeAtt);
    defTransformerList.add(markExtraConcreteRules);
    defTransformerList.add(removeAnywhereRules);
    defTransformerList.add(generateSortPredicateRules);
    defTransformerList.add(numberSentences);

    return def -> {
      // Start with the stage under investigation (second from the end) and
      // repeatedly move it one position earlier, recompiling after each move.
      int pos = defTransformerList.size() - 2;

      Definition result = null;

      while (pos > 0) {
        Function1<Definition, Definition> toMove = defTransformerList.remove(pos--);
        defTransformerList.add(pos, toMove);

        try {
          // Compose the reordered stages and run them on the input definition
          result = defTransformerList.stream().reduce(x -> x, (a, f) -> a.andThen(f)).apply(def);
        } catch (KEMException e) {
          // A failure here means the stage cannot be moved this early
          throw e;
        }
      }

      return result;
    };
  }

The stages are held in a Java list, composed together with a reduce operation, and then run on the input definition. This is done repeatedly, moving the stage in question up one position in the list each time.

I have more stages to check, but so far I've found that removeAnywhereRules and markExtraConcreteRules can be pulled all the way to the front of the pipeline without breaking it. I think these are only supposed to operate on user-supplied rules, so it makes sense that they work fine before any transformations that add new rules.

addCoolLikeAtt can be pulled to the front, but I'm less sure that this would be correct. cool-like is meant to be added to rules that have the k cell, holding a k sequence with a variable at the top. It's an attribute that the llvm-backend matching tree generation uses to apply a special priority that optimizes the tree. Aside from cooling rules that use the cool attribute, I'm not sure what other kinds of generated rules can appear that would be changed by this pass.

One thing to note is that this only tests for breakages in the pipeline during compilation; further testing in the regression tests and downstream is needed for any changes that we might want to make.

Baltoli commented 9 months ago

This is very cool!

I have more stages to check, but so far I've found that removeAnywhereRules and markExtraConcreteRules can be pulled all the way to the front of the pipeline without breaking it. I think these are only supposed to operate on user-supplied rules, so it makes sense that they work fine before any transformations that add new rules.

Perhaps open a PR that moves these to the front of the list explicitly, and adds a comment noting that they have no dependencies on later passes and are designed to apply directly to user code. If there's any low-hanging fruit to be done here in terms of code cleanup, refactoring etc. then it might be a good time to do so.

addCoolLikeAtt can be pulled to the front, but I'm less sure that this would be correct. cool-like is meant to be added to rules that have the k cell, holding a k sequence with a variable at the top. It's an attribute that the llvm-backend matching tree generation uses to apply a special priority that optimizes the tree. Aside from cooling rules that use the cool attribute, I'm not sure what other kinds of generated rules can appear that would be changed by this pass.

Indeed - as we said in the K meeting, we don't have an a priori assumption that it will be correct to move everything around. If something has non-obvious dependencies like this one, then it's perfectly fine to leave it out and come back to refactoring it in the future if need be.

ehildenb commented 9 months ago

It would be nice if we could do these experiments without having to rebuild K. That would require making it so that the compiler passes we choose to include can be set on the CLI instead of being hardcoded in this file.

Would it be possible to attach a string identifier to each compiler pass, and then build up this list of passes dynamically instead of statically? That would also allow for much easier downstream testing of moving the compiler passes around and figuring out their dependencies. Additionally, we could then also have tests that only run a part of the compiler pipeline (perhaps loading the JSON KAST, running a compiler pipeline step, then dumping the resulting JSON Kast), which would be a much more tailored test than the end-to-end integration tests we currently have.
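A rough sketch of what such a registry could look like, using java.util.function.Function over a String stand-in for the frontend's Definition type; all names here are illustrative, not the actual K API:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch: passes registered under string identifiers, so a pipeline can be
// assembled from a CLI-supplied list of names instead of a hardcoded chain.
// String stands in for the frontend's Definition type.
public class PassRegistry {
  private final Map<String, Function<String, String>> passes = new LinkedHashMap<>();

  public void register(String name, Function<String, String> pass) {
    passes.put(name, pass);
  }

  // Compose the named passes in the order given, e.g. from a CLI flag
  public Function<String, String> compose(List<String> names) {
    return names.stream()
        .map(n -> {
          Function<String, String> p = passes.get(n);
          if (p == null) throw new IllegalArgumentException("unknown pass: " + n);
          return p;
        })
        .reduce(Function.identity(), Function::andThen);
  }
}
```

With something like this, reordering or dropping a pass becomes a change to a list of strings rather than a rebuild, and a harness could run a single named pass between a JSON load and dump.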

Baltoli commented 9 months ago

A couple of things I noticed when looking through the code that builds the compilation pipeline:

I think the LLVM optimisation pipeline infrastructure is a pretty good place to look for design inspiration. Some features there that are worth looking at:

ehildenb commented 9 months ago

Some more information from investigation with @gtrepta:

The overall kompile pipeline (in Kompile.java, method Kompile.run(...)) has 3 major steps:

ehildenb commented 9 months ago

We could pull applying sort synonyms and implicit IO imports out of parseDefinition into its own explicit step, and then call that step directly. Then we'd be breaking the entire kompile pipeline into major "epochs", which could be made more fine-grained over time (as opposed to the current approach of starting from the pipeline at the granularity of individual steps and understanding each one).

Proposed rough sketch https://github.com/runtimeverification/k/blob/08477b587642c382e3ff11d7983431bfa3e8c124/kernel/src/main/java/org/kframework/kompile/Kompile.java#L212:

definition = parseDefinition(...);  // Read the definition in
definition = preCheckPasses(...); // apply sort synonyms and implicit IO module import
checkDefinition(definition);
definition = everythingUpToExpandMacros(...);
definition = expandMacros(...);
definition = everythingUpToCheckSimplificationRules(...);
checkSimplificationRule(definition);
definition = remainderOfKoreBackendSteps(...);

Then we slowly break each chunk of the kore steps out into other atomic units that must be run together.

This is just an idea of how to attack the problem from the other direction. Basically, here, we know that the "dependency/use" chain is a straight line, not a DAG, because of how the compiler is currently structured.

At the very least, we could pull out the preCheckPasses(...), and the checkSimplificationRule(...) pass, for more consistency.

Scott-Guest commented 9 months ago

As suggested by @ehildenb, I'll also work on recording all the attribute read-write dependencies between passes.

This will help to clean up the attribute usages in general and give us a subset of the overall pass dependency DAG.
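As a sketch of how such recorded dependencies could yield DAG edges (hypothetical types and generic pass names, not the actual K attribute machinery): a pass that reads an attribute must run after any pass that writes it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch: derive pass-ordering edges from declared attribute reads/writes.
public class AttrDeps {
  public record Pass(String name, Set<String> reads, Set<String> writes) {}

  // Returns edges "A -> B" meaning A must run before B,
  // because B reads an attribute that A writes.
  public static List<String> edges(List<Pass> passes) {
    List<String> out = new ArrayList<>();
    for (Pass writer : passes) {
      for (Pass reader : passes) {
        if (writer == reader) continue;
        for (String attr : writer.writes()) {
          if (reader.reads().contains(attr)) {
            out.add(writer.name() + " -> " + reader.name());
            break;
          }
        }
      }
    }
    return out;
  }
}
```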

gtrepta commented 9 months ago

ApplySynonyms appears to be tightly bound to the inner parsing logic, and can't be moved any later than resolveNonConfigBubbles, so it's probably best to just leave it where it is in ParserUtils.loadModules for now. We could decide later if we want to separate it out somehow within the parser library as its own step when we do cleanups there.

gtrepta commented 9 months ago

Next steps, after discussing with @ehildenb:

This will allow faster experimentation with different orderings of the transformations to find dependencies, and will make it easier to do these sorts of experiments on downstream semantics.

This, along with the previous option, allows us to generate isolated tests for each compiler pass with input/output JSON files. We can also get the input and output of each individual step in the pipeline and see the actual changes it's making, or whether it makes any changes at all, depending on what's in the definition.

This could also be used to help integrate pyk kompile with the java frontend, as pyk can do the outer parsing, and then pass that to kompile where inner parsing and pipeline transformations can be done.
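A minimal sketch of the per-step tracing idea, with a String standing in for the JSON KAST a real harness would load and dump (all names here are illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch: run each pass in sequence on a definition, recording per-pass
// before/after snapshots so no-op passes show up immediately.
public class PassTrace {
  public static Map<String, Boolean> changedByPass(
      Map<String, Function<String, String>> pipeline, String def) {
    Map<String, Boolean> changed = new LinkedHashMap<>();
    for (Map.Entry<String, Function<String, String>> e : pipeline.entrySet()) {
      String next = e.getValue().apply(def);
      changed.put(e.getKey(), !next.equals(def)); // true iff the pass did anything
      def = next;
    }
    return changed;
  }
}
```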

gtrepta commented 8 months ago

Here's a spreadsheet of every regression test, each pipeline stage, and whether that stage made a change to the definition (1) or didn't make any change (0).

https://docs.google.com/spreadsheets/d/1Jq8XYdNpzTv9XPXQZx1cOTRKBNWsq1fmz3HH4Ipv0Cs/edit?usp=sharing

Baltoli commented 8 months ago

@ehildenb

gtrepta commented 8 months ago

Here's another spreadsheet. Each column represents a stage in the kore backend pipeline that was dropped from compilation, and each row is a test in the regression suite with its test result after the stage was dropped. Each cell says whether the test passed (PASS), failed (FAIL), or encountered an error during compilation (KOMPILE). Note that resolveHeatCoolAttribute was skipped for this experiment because it produced interpreters that would never terminate, and that was too hard to deal with.

https://docs.google.com/spreadsheets/d/13jDHr49_EnIOSnhlKzbvpm8FXN620itrZBywmLuKRJs/edit?usp=sharing

Baltoli commented 8 months ago

Nice! So from that second spreadsheet, any fully-green columns indicate that the compiler pass perhaps isn't being exercised properly by the regression test suite? Even if the internal data structures get changed, we still aren't observing any actual behavioural differences at the level of the test suite. It would be interesting to see if we can write a test for each green column that breaks if the pass is removed; doing so would also be a good way of understanding more concretely at the K source level what the passes affect.

ehildenb commented 8 months ago

That's my interpretation as well. I see:

Baltoli commented 8 months ago

addStrategyCellToRules, addStrategyRuleToMainModule

These, at least, are planned for removal

gtrepta commented 8 months ago

I've looked at guardOrs, introduced by #340. Referencing the first spreadsheet, the only tests that this stage does anything on are or-haskell, issue-1789-rhsOr/llvm, and or-llvm.

It wraps #Or terms outside of a rewrite in an #as pattern, which is supposed to make substitution for them unambiguous. This was added five years ago, so either it's no longer necessary or other changes have made these tests miss the breaking case. @dwightguth do you know?

gtrepta commented 8 months ago

generateSortProjections2 looks like it only creates projection productions and rules for the generated top and counter cells.

Baltoli commented 8 months ago

@dwightguth doesn't see why guardOrs isn't broken; @gtrepta to investigate in more detail here to see what's going on.

gtrepta commented 8 months ago

subsortKItem1 and subsortKItem2 don't seem to have any effect on the kore backend pipeline. Omitting them does, however, cause an error when a later call to AddSortInjection attempts to compute the least upper bound for a collection of sorts. Keeping a subsortKItem pass at the end of the pipeline keeps everything working, and all of the regression tests pass.

This brings up something I'm noticing about invariants. AFAIK every sort being a subsort of KItem is supposed to be an invariant, but we're enforcing this by explicitly creating the syntax declarations. This creates the phenomenon of an invariant being broken by one stage, staying broken through some intermediate stages, and then being restored by another stage. That seems less than ideal for maintainability: one would want to rely on every invariant they know about a K definition, but instead has to keep track of whether they're at a point in the pipeline where the invariant actually holds. Keeping these invariants from being broken, either with assertion checks or by making some of them hold implicitly, could help here.
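A sketch of the assertion-check idea: wrap each pass with a post-condition so an invariant fails loudly at the stage that breaks it, instead of surfacing later as an unrelated error elsewhere (generic types and hypothetical names, not the actual K API):

```java
import java.util.function.Function;
import java.util.function.Predicate;

// Sketch: decorate a pass with an invariant check on its output, e.g.
// "every sort is a subsort of KItem", so the first stage that breaks
// the invariant is identified immediately.
public class CheckedPass {
  public static <D> Function<D, D> withInvariant(
      String passName, Function<D, D> pass, Predicate<D> invariant) {
    return def -> {
      D result = pass.apply(def);
      if (!invariant.test(result)) {
        throw new IllegalStateException("invariant broken after " + passName);
      }
      return result;
    };
  }
}
```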

gtrepta commented 8 months ago

Here's a list of dependencies between compilation passes that I've found. I will continue to update it as I find them.

gtrepta commented 8 months ago

I've added a second sheet to the spreadsheet for results from dropping pipeline steps. This one covers the regression tests that expect a failure with an error message (my methodology earlier couldn't cover these tests). If a cell says PASS then it means that test printed out the expected error message, but if it says FAIL then something else happened instead. This reveals a couple of pipeline steps that fail a test where on the previous sheet they had completely green columns: