runtimeverification / k

K Framework Tools 7.0
BSD 3-Clause "New" or "Revised" License

Automated experiments on the compiler pipeline #3973

Open gtrepta opened 9 months ago

gtrepta commented 9 months ago

The compiler pipeline in the frontend has grown gradually over the lifetime of the project, and the purpose of each stage and the stages' dependencies on one another have become unclear. Some stages have little to no documentation, and are possibly unneeded or more complex than they need to be.

Investigating the pipeline by poking at each of these stages (removing them or moving them around, seeing which tests break, and understanding why) can shed some light on it.

One technique for doing this is git bisect. A sequence of commits on top of develop, each making a single, small change to the pipeline, can be fed to git bisect run with a script that rebuilds the frontend and runs the tests. git bisect will then find the first commit that breaks something, giving us a set of safe changes before it, and a breaking change we can investigate to help fill in missing documentation or guide a refactoring.

gtrepta commented 9 months ago

I had a look at the duplicate pairs of generateSortPredicateSyntax and generateSortProjections in the pipeline, seeing how many stages I could remove between them.

For reference, here's how the stages are set up currently:

https://github.com/runtimeverification/k/blob/1e4aa2be6b49ca766e01d78eae0e418a749bb12b/kernel/src/main/java/org/kframework/backend/kore/KoreBackend.java#L248-L259

And here's after I moved what I could:

            .andThen(constantFolding)
            .andThen(propagateMacroToRules)
            .andThen(guardOrs)
            .andThen(resolveFreshConfigConstants)
            .andThen(generateSortPredicateSyntax)
            .andThen(generateSortProjections)
            .andThen(expandMacros)                                    //
            .andThen(AddImplicitComputationCell::transformDefinition) // Only these three remain
            .andThen(resolveFreshConstants)                           //
            .andThen(generateSortPredicateSyntax)
            .andThen(generateSortProjections)
            .andThen(checkSimplificationRules)

The regression tests still pass with this setup. The definitions in the profiling tests also kompile without issue, and identical definition.kore files are generated as on develop. So it feels safe enough to open a PR, but I'll try some more testing downstream first, since the symbolic backends aren't exercised much here.

gtrepta commented 9 months ago

I've compared the definition.kore files generated on evm-semantics and wasm-semantics and they're identical. I also compared all of the definitions in the regression tests; they differ, but only in the ordering and numbering of some sentences. I feel confident enough to open a PR for this and leave the decision to merge up to review.
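For reference, a minimal sketch of the kind of order-insensitive comparison this implies (the class name and helpers here are hypothetical, not part of the K codebase). It treats two files as equal if they contain the same lines in any order, which covers sentence reordering; sentence renumbering would need a slightly smarter normalization:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: compare two definition.kore files modulo line order.
public class KoreDiff {
  // Equal if both contain the same multiset of lines (ignores reordering only)
  public static boolean equalModuloOrder(List<String> a, List<String> b) {
    List<String> sa = a.stream().sorted().toList();
    List<String> sb = b.stream().sorted().toList();
    return sa.equals(sb);
  }

  public static boolean equalModuloOrder(Path a, Path b) throws IOException {
    return equalModuloOrder(Files.readAllLines(a), Files.readAllLines(b));
  }
}
```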

gtrepta commented 9 months ago

Here's the script I've been using for git bisect run to build and run tests:

#!/usr/bin/bash

set -exuo pipefail

ROOT="$(pwd)" # don't clobber the shell's special PWD variable
K_BIN="${ROOT}/k-distribution/bin"
LOG_DIR="${ROOT}/bisect-log"
COMMIT=$(git rev-parse --short HEAD)

trap "${K_BIN}/stop-kserver || true" INT TERM EXIT
mkdir -p "${LOG_DIR}"
${K_BIN}/stop-kserver || true # make sure no kserver is running

mvn package -DskipTests -Dproject.build.type=Debug -T1C

${K_BIN}/kserver &> /dev/null &

cd k-distribution/tests/regression-new

make clean
# pipefail ensures git bisect sees make's exit status, not tee's
make -j8 -O |& tee "${LOG_DIR}/${COMMIT}.out"

Baltoli commented 9 months ago

We should carry on this line of investigation; we could do the following:

gtrepta commented 9 months ago

Here's another experiment I'm working on. The idea is to take the stage at the end of the pipeline and pull it as far as possible to the front. Here's a snippet that replaces the end of the steps method in KoreBackend:

    List<Function1<Definition, Definition>> defTransformerList = new ArrayList<>();
    defTransformerList.add(resolveComm);
    defTransformerList.add(resolveIO);
    defTransformerList.add(resolveFun);
    defTransformerList.add(resolveFunctionWithConfig);
    defTransformerList.add(resolveStrict);
    defTransformerList.add(resolveAnonVars);
    defTransformerList.add(d -> new ResolveContexts().resolve(d));
    defTransformerList.add(numberSentences);
    defTransformerList.add(resolveHeatCoolAttribute);
    defTransformerList.add(resolveSemanticCasts);
    defTransformerList.add(subsortKItem);
    defTransformerList.add(constantFolding);
    defTransformerList.add(propagateMacroToRules);
    defTransformerList.add(guardOrs);
    defTransformerList.add(resolveFreshConfigConstants);
    defTransformerList.add(generateSortPredicateSyntax);
    defTransformerList.add(generateSortProjections);
    defTransformerList.add(expandMacros);
    defTransformerList.add(AddImplicitComputationCell::transformDefinition);
    defTransformerList.add(resolveFreshConstants);
    defTransformerList.add(generateSortPredicateSyntax);
    defTransformerList.add(generateSortProjections);
    defTransformerList.add(checkSimplificationRules);
    defTransformerList.add(subsortKItem);
    defTransformerList.add(d -> new Strategy().addStrategyCellToRulesTransformer(d).apply(d));
    defTransformerList.add(
        d -> Strategy.addStrategyRuleToMainModule(d.mainModule().name()).apply(d));
    defTransformerList.add(d -> ConcretizeCells.transformDefinition(d));
    defTransformerList.add(genCoverage);
    defTransformerList.add(Kompile::addSemanticsModule);
    defTransformerList.add(resolveConfigVar);
    defTransformerList.add(addCoolLikeAtt);
    defTransformerList.add(markExtraConcreteRules);
    defTransformerList.add(removeAnywhereRules);
    defTransformerList.add(generateSortPredicateRules);
    defTransformerList.add(numberSentences);

    return def -> {
      // Start with the stage under investigation (second from the end) and
      // repeatedly move it one position earlier, recompiling after each move.
      int pos = defTransformerList.size() - 2;

      Definition result = null;

      while (pos > 0) {
        Function1<Definition, Definition> toMove = defTransformerList.remove(pos--);
        defTransformerList.add(pos, toMove);

        try {
          // Compose the reordered stages and run them on the input definition
          result = defTransformerList.stream().reduce(x -> x, (a, f) -> a.andThen(f)).apply(def);
        } catch (KEMException e) {
          // A failure here means the stage cannot be moved this early
          throw e;
        }
      }

      return result;
    };
  }

The stages are held in a Java list, composed together with a reduce operation, and then run on the input definition. This is done repeatedly, moving the stage in question up one position in the list each time.

I have more stages to check, but so far I've found that removeAnywhereRules and markExtraConcreteRules can be pulled all the way to the front of the pipeline without breaking it. I think these are only supposed to operate on user-supplied rules, so it makes sense that they work fine before any transformations that add new rules.

addCoolLikeAtt can be pulled to the front, but I'm less sure that this would be correct. cool-like is meant to be added to rules that have the k cell, holding a k sequence with a variable at the top. It's an attribute that the llvm-backend matching tree generation uses to apply a special priority that optimizes the tree. Aside from cooling rules that use the cool attribute, I'm not sure what other kinds of generated rules can appear that would be changed by this pass.

One thing to note is that this only tests for breakages in the pipeline during compilation; further testing in the regression tests and downstream is needed for any changes that we might want to make.

Baltoli commented 9 months ago

This is very cool!

I have more stages to check, but so far I've found that removeAnywhereRules and markExtraConcreteRules can be pulled all the way to the front of the pipeline without breaking it. I think these are only supposed to operate on user-supplied rules, so it makes sense that they work fine before any transformations that add new rules.

Perhaps open a PR that moves these to the front of the list explicitly, and adds a comment noting that they have no dependencies on later passes and are designed to apply directly to user code. If there's any low-hanging fruit to be done here in terms of code cleanup, refactoring etc. then it might be a good time to do so.

addCoolLikeAtt can be pulled to the front, but I'm less sure that this would be correct. cool-like is meant to be added to rules that have the k cell, holding a k sequence with a variable at the top. It's an attribute that the llvm-backend matching tree generation uses to apply a special priority that optimizes the tree. Aside from cooling rules that use the cool attribute, I'm not sure what other kinds of generated rules can appear that would be changed by this pass.

Indeed - as we said in the K meeting, we don't have an a priori assumption that it will be correct to move everything around. If something has non-obvious dependencies like this one, then it's perfectly fine to leave it out and come back to refactoring it in the future if need be.

ehildenb commented 9 months ago

It would be nice if we could do these experiments without having to rebuild K. That would require making it so that the compiler passes we choose to include can be set on the CLI instead of being hardcoded in this file.

Would it be possible to attach a string identifier to each compiler pass, and then build up this list of passes dynamically instead of statically? That would also allow for much easier downstream testing of moving the compiler passes around and figuring out their dependencies. Additionally, we could then also have tests that only run a part of the compiler pipeline (perhaps loading the JSON KAST, running a compiler pipeline step, then dumping the resulting JSON Kast), which would be a much more tailored test than the end-to-end integration tests we currently have.
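A rough sketch of what such a registry could look like, using java.util.function.Function over a String stand-in for the frontend's Definition type; all names here are illustrative, not the actual K API:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch: passes registered under string identifiers, so a pipeline can be
// assembled from a CLI-supplied list of names instead of a hardcoded chain.
// String stands in for the frontend's Definition type.
public class PassRegistry {
  private final Map<String, Function<String, String>> passes = new LinkedHashMap<>();

  public void register(String name, Function<String, String> pass) {
    passes.put(name, pass);
  }

  // Compose the named passes in the order given, e.g. from a CLI flag
  public Function<String, String> compose(List<String> names) {
    return names.stream()
        .map(n -> {
          Function<String, String> p = passes.get(n);
          if (p == null) throw new IllegalArgumentException("unknown pass: " + n);
          return p;
        })
        .reduce(Function.identity(), Function::andThen);
  }
}
```

With something like this, reordering or dropping a pass becomes a change to a list of strings rather than a rebuild, and a harness could run a single named pass between a JSON load and dump.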

Baltoli commented 9 months ago

A couple of things I noticed when looking through the code that builds the compilation pipeline:

I think the LLVM optimisation pipeline infrastructure is a pretty good place to look for design inspiration. Some features there that are worth looking at:

ehildenb commented 9 months ago

Some more information from investigation with @gtrepta:

The overall kompile pipeline (in Kompile.java, method Kompile.run(...)) has 3 major steps:

ehildenb commented 9 months ago

We could pull applying sort synonyms and implicit IO imports out of parseDefinition into its own explicit step, and then call that step directly. Then we'd be breaking the entire kompile pipeline into major "epochs", which could be made more fine-grained over time (as opposed to the current approach of starting from the pipeline at the granularity of individual steps and understanding each one).

Proposed rough sketch https://github.com/runtimeverification/k/blob/08477b587642c382e3ff11d7983431bfa3e8c124/kernel/src/main/java/org/kframework/kompile/Kompile.java#L212:

definition = parseDefinition(...);  // Read the definition in
definition = preCheckPasses(...); // apply sort synonyms and implicit IO module import
checkDefinition(definition);
definition = everythingUpToExpandMacros(...);
definition = expandMacros(...);
definition = everythingUpToCheckSimplificationRules(...);
checkSimplificationRule(definition);
definition = remainderOfKoreBackendSteps(...);

Then we slowly break each chunk of the kore steps out into other atomic units that must be run together.

This is just an idea of how to attack the problem from the other direction. Basically, here, we know that the "dependency/use" chain is a straight line, not a DAG, because of how the compiler is currently structured.

At the very least, we could pull out the preCheckPasses(...), and the checkSimplificationRule(...) pass, for more consistency.

Scott-Guest commented 9 months ago

As suggested by @ehildenb, I'll also work on recording all the attribute read-write dependencies between passes.

This will help to clean up the attribute usages in general and give us a subset of the overall pass dependency DAG.
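As a sketch of how such recorded dependencies could yield DAG edges (hypothetical types and generic pass names, not the actual K attribute machinery): a pass that reads an attribute must run after any pass that writes it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch: derive pass-ordering edges from declared attribute reads/writes.
public class AttrDeps {
  public record Pass(String name, Set<String> reads, Set<String> writes) {}

  // Returns edges "A -> B" meaning A must run before B,
  // because B reads an attribute that A writes.
  public static List<String> edges(List<Pass> passes) {
    List<String> out = new ArrayList<>();
    for (Pass writer : passes) {
      for (Pass reader : passes) {
        if (writer == reader) continue;
        for (String attr : writer.writes()) {
          if (reader.reads().contains(attr)) {
            out.add(writer.name() + " -> " + reader.name());
            break;
          }
        }
      }
    }
    return out;
  }
}
```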

gtrepta commented 9 months ago

ApplySynonyms appears to be tightly bound to the inner parsing logic, and can't be moved any later than resolveNonConfigBubbles, so it's probably best to just leave it where it is in ParserUtils.loadModules for now. We could decide later if we want to separate it out somehow within the parser library as its own step when we do cleanups there.

gtrepta commented 9 months ago

Next steps, after discussing with @ehildenb:

This will allow faster experimentation with different orderings of the transformations to find dependencies, and will make it easier to do these sorts of experiments on downstream semantics.

This, along with the previous option, allows us to generate isolated tests for each compiler pass with input/output JSON files. We can also get the input and output of each individual step in the pipeline and see the actual changes it's making, or whether it makes any changes at all, depending on what's in the definition.

This could also be used to help integrate pyk kompile with the java frontend, as pyk can do the outer parsing, and then pass that to kompile where inner parsing and pipeline transformations can be done.
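A minimal sketch of the per-step tracing idea, with a String standing in for the JSON KAST a real harness would load and dump (all names here are illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch: run each pass in sequence on a definition, recording per-pass
// before/after snapshots so no-op passes show up immediately.
public class PassTrace {
  public static Map<String, Boolean> changedByPass(
      Map<String, Function<String, String>> pipeline, String def) {
    Map<String, Boolean> changed = new LinkedHashMap<>();
    for (Map.Entry<String, Function<String, String>> e : pipeline.entrySet()) {
      String next = e.getValue().apply(def);
      changed.put(e.getKey(), !next.equals(def)); // true iff the pass did anything
      def = next;
    }
    return changed;
  }
}
```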

gtrepta commented 8 months ago

Here's a spreadsheet of every regression test, each pipeline stage, and whether that stage made a change to the definition (1) or didn't make any change (0).

https://docs.google.com/spreadsheets/d/1Jq8XYdNpzTv9XPXQZx1cOTRKBNWsq1fmz3HH4Ipv0Cs/edit?usp=sharing

Baltoli commented 8 months ago

@ehildenb

gtrepta commented 8 months ago

Here's another spreadsheet. Each column represents a stage in the kore backend pipeline that was dropped from compilation, and each row is a test in the regression suite with its test result after the stage was dropped. Each cell says whether the test passed (PASS), failed (FAIL), or encountered an error during compilation (KOMPILE). Note that resolveHeatCoolAttribute was skipped for this experiment because it produced interpreters that would never terminate, and that was too hard to deal with.

https://docs.google.com/spreadsheets/d/13jDHr49_EnIOSnhlKzbvpm8FXN620itrZBywmLuKRJs/edit?usp=sharing

Baltoli commented 8 months ago

Nice! So from that second spreadsheet, any fully-green columns indicate that the compiler pass perhaps isn't being exercised properly by the regression test suite? Even if the internal data structures get changed, we still aren't observing any actual behavioural differences at the level of the test suite. It would be interesting to see if we can write a test for each green column that breaks if the pass is removed; doing so would also be a good way of understanding more concretely at the K source level what the passes affect.

ehildenb commented 8 months ago

That's my interpretation as well. I see:

Baltoli commented 8 months ago

addStrategyCellToRules, addStrategyRuleToMainModule

These, at least, are planned for removal

gtrepta commented 8 months ago

I've looked at guardOrs, introduced by #340. Referencing the first spreadsheet, the only tests that this stage does anything on are or-haskell, issue-1789-rhsOr/llvm, and or-llvm.

It wraps #Or terms outside of a rewrite in an #as pattern, which is supposed to make substitution for them unambiguous. This was added five years ago, so either it's no longer necessary or other changes have made these tests miss the breaking case. @dwightguth do you know?

gtrepta commented 8 months ago

generateSortProjections2 looks like it only creates projection productions and rules for the generated top and counter cells.

Baltoli commented 8 months ago

@dwightguth doesn't see why guardOrs isn't broken; @gtrepta to investigate in more detail here to see what's going on.

gtrepta commented 8 months ago

subsortKItem1 and subsortKItem2 don't seem to have any effect on the kore backend pipeline. Omitting them does, however, cause an error when a later call to AddSortInjection attempts to compute the least upper bound for a collection of sorts. Keeping a subsortKItem pass at the end of the pipeline keeps everything working, and all of the regression tests pass.

This brings up something I'm noticing about invariants. AFAIK every sort being a subsort of KItem is supposed to be an invariant, but we're enforcing this by explicitly creating the syntax declarations. This creates the phenomenon of an invariant being broken by one stage, staying broken through some intermediate stages, and then being restored by another stage. That seems less than ideal for maintainability: one would want to rely on every invariant they know about a K definition, but instead has to keep track of whether they're at a point in the pipeline where the invariant actually holds. Keeping these invariants from being broken, either with assertion checks or by making some of them hold implicitly, could help here.
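A sketch of the assertion-check idea: wrap each pass with a post-condition so an invariant fails loudly at the stage that breaks it, instead of surfacing later as an unrelated error elsewhere (generic types and hypothetical names, not the actual K API):

```java
import java.util.function.Function;
import java.util.function.Predicate;

// Sketch: decorate a pass with an invariant check on its output, e.g.
// "every sort is a subsort of KItem", so the first stage that breaks
// the invariant is identified immediately.
public class CheckedPass {
  public static <D> Function<D, D> withInvariant(
      String passName, Function<D, D> pass, Predicate<D> invariant) {
    return def -> {
      D result = pass.apply(def);
      if (!invariant.test(result)) {
        throw new IllegalStateException("invariant broken after " + passName);
      }
      return result;
    };
  }
}
```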

gtrepta commented 8 months ago

Here's a list of dependencies between compilation passes that I've found. I will continue to update it as I find them.

gtrepta commented 8 months ago

I've added a second sheet to the spreadsheet for results from dropping pipeline steps. This one covers the regression tests that expect a failure with an error message (my methodology earlier couldn't cover these tests). If a cell says PASS then it means that test printed out the expected error message, but if it says FAIL then something else happened instead. This reveals a couple of pipeline steps that fail a test where on the previous sheet they had completely green columns: