Optimise decision trees

Baltoli commented 1 month ago

~~Blocked on https://github.com/runtimeverification/llvm-backend/issues/1052~~

Baltoli commented 1 month ago

From https://github.com/runtimeverification/llvm-backend/issues/1052:

I've spent quite a bit of time on this and will need to park it for now; unfortunately I've not been able to come up with a satisfactory solution.

Whatever workarounds I look at here (applying normalisation passes to the generated module, breaking the Kasmer code up into smaller chunks, etc.) all bump up against the same fundamental problem that the generated decision trees for this semantics are enormous: the dt.yaml for kasmer is over 800mb, whereas the C semantics (for twice as many lines of code!) is 35mb.

This is what produces the ultimate problem here; the aarch64 code generator ends up choking because the k_step function encoding the decision tree is so big. The solution here is likely to be to find a way to optimise the size of the decision trees in the downstream semantics; this will almost certainly help the code generator out.

We might end up identifying tricks / patterns here that are useful for the LLVM backend / semantics developers, but as the initial action is on the part of the semantics I'm going to close this for now.

Baltoli commented 1 month ago

I will post updates on this optimisation work here as I make them.

This project is made up of a few nested semantics; the size of their generated decision trees are:

WASM: 5mb
Elrond Node: 28mb
ESDT: 70mb
Elrond: 440mb
- Elrond (no rules in main file): 316mb
Mandos: 770mb
Kasmer: 800mb

These values suggest that the Elrond semantics is a reasonable place to start given how much bigger it is than the baseline of the C semantics and the Elrond Node semantics; the fact that we see the code generator start to fall over specifically on the Kasmer semantics is actually a bit of a red herring.

Elrond decision tree sizes with each of the following modules included on its own:

ASYNC-HELPERS: 28mb
KRYPTO: -
WASM-AUTO-ALLOCATE: 4mb
ELROND-NODE: 28mb
ESDT: 70mb
LIST-BYTES: -
MAP-BYTES-TO-BYTES-PRIMITIVE: -
SWITCH: 31mb

Together these weigh in at 80mb when all enabled together. Now we can try to look for the smoking gun rules in the Elrond module that cause the expensive tree generation.

Notes:

Configuration nesting has quite a big effect on decision tree size. Is the reason that the C semantics does this so well that it only has 1-2 levels of nesting? The configurations here are very deep.

gtrepta commented 1 month ago

I have a PR #264 started with just the simplest change I could make, which adds the <instrs> cell to the rules in mandos.md. This change alone cut the decision tree folder by about 300MB and the kompile time by ~3.5 minutes. Unfortunately, I haven't been able to run the tests locally to verify if the change doesn't break anything, I can't seem to get the proper version of the rust tools to get the prerequisites to compile without some error.

More changes should be able to be made with the elrond semantics for the <commands> cell. However, the changes that need to be made aren't as obvious here to me because they can depend on the <instrs> cell to either be empty, or to have #waitCommands at the top, and I'm not sure which rules are which.

I believe it's possible to add some custom compiler pipeline steps with pyk by using the --post-process option for kompile. These pipeline steps could add <commands> .K </commands> or <instrs> .K #Or #waitCommands </instrs> to the appropriate rules so they don't need to be manually added. I think it could even be possible to add the <k> cell to the configurations and rules in elrond and wasm with these transformations.

Baltoli commented 1 month ago

Closing this for now with #265; really great work @gtrepta and @bbyalcinkaya!

runtimeverification / mx-semantics

Optimise decision trees #259