rlwhitcomb / utilities

Some of my personal utility programs
MIT License
2 stars 0 forks source link

Add an optimizer step to Calc parsing to eliminate empty levels of the parse tree. #638

Open rlwhitcomb opened 6 months ago

rlwhitcomb commented 6 months ago

Often for the intermediate semantic constructs the children of the element is only one lower-level element, leading down to the actual terminal nodes with the real text. Sometimes we need the intermediate elements for (for instance) "getTreeText" to establish context, but other times they are just essentially useless constructs created during parsing that only slow down execution. So, it might make sense and save execution time to optimize out these essentially empty layers before execution time.

This is all assuming I can actually rebuild the tree using the public API of the ANTLR runtime ...

The first pretty obvious step is (maybe) to eliminate the lowest level TerminalNodeImpl nodes (which I think currently are the only place the actual text is stored, but which we don't need for evaluation, except for the text), so if the next higher levels can contain the text, but without any children, that should make things a lot faster.

rlwhitcomb commented 6 months ago

Of course, it may be necessary to build a completely new tree of our own classes to do this; that is, only keep the necessary nodes for context in interpretation, eliminating the lowest-level terminal nodes, and eliminating the intermediate parser nodes that don't help in our final evaluation. That would, of course, have to completely rewrite CalcObjectVisitor to do this job, and make a new class that interprets our final output tree, keeping the relevant calculations, of course.

That would open us up to all kinds of other possible optimizations, like constant expression evaluation, common subtree elimination, inlining of small functions, etc., most of which are beyond my tiny comprehension of compiler optimization techniques .... LOL

rlwhitcomb commented 6 months ago

Okay, let me see if I can enumerate the constant values, which are candidates for constant expression evaluation:

rlwhitcomb commented 6 months ago

Here is a sample of the current parse tree that seems like it could be simplified (a lot) with this algorithm:

define n(x) = { if x > 5 "big" else "small" }
prog: definen(x)={ifx>5"big"else"small"}<EOF>
   stmtOrExpr: definen(x)={ifx>5"big"else"small"}
      defineStmt: definen(x)={ifx>5"big"else"small"}
         text: define
         id: n
            text: n
         formalParamList: (x)
            text: (
            formalParam: x
               id: x
                  text: x
            text: )
         text: =
         stmtBlock: {ifx>5"big"else"small"}
            text: {
            stmtOrExpr: ifx>5"big"else"small"
               ifStmt: ifx>5"big"else"small"
                  text: if
                  compareExpr: x>5
                     varExpr: x
                        idVar: x
                           id: x
                              text: x
                     text: >
                     valueExpr: 5
                        numberValue: 5
                           text: 5
                  stmtBlock: "big"
                     stmtOrExpr: "big"
                        exprStmt: "big"
                           valueExpr: "big"
                              stringValue: "big"
                                 text: "big"
                  text: else
                  stmtBlock: "small"
                     stmtOrExpr: "small"
                        exprStmt: "small"
                           valueExpr: "small"
                              stringValue: "small"
                                 text: "small"
            text: }
   text: <EOF>

Notice how all the "text" nodes could (at least) be eliminated if we could add the text to the node above.