Add an optimizer step to Calc parsing to eliminate empty levels of the parse tree.

rlwhitcomb commented 6 months ago

Often for the intermediate semantic constructs the children of the element is only one lower-level element, leading down to the actual terminal nodes with the real text. Sometimes we need the intermediate elements for (for instance) "getTreeText" to establish context, but other times they are just essentially useless constructs created during parsing that only slow down execution. So, it might make sense and save execution time to optimize out these essentially empty layers before execution time.

This is all assuming I can actually rebuild the tree using the public API of the ANTLR runtime ...

The first pretty obvious step is (maybe) to eliminate the lowest level TerminalNodeImpl nodes (which I think currently are the only place the actual text is stored, but which we don't need for evaluation, except for the text), so if the next higher levels can contain the text, but without any children, that should make things a lot faster.

rlwhitcomb commented 6 months ago

Of course, it may be necessary to build a completely new tree of our own classes to do this; that is, only keep the necessary nodes for context in interpretation, eliminating the lowest-level terminal nodes, and eliminating the intermediate parser nodes that don't help in our final evaluation. That would, of course, have to completely rewrite CalcObjectVisitor to do this job, and make a new class that interprets our final output tree, keeping the relevant calculations, of course.

That would open us up to all kinds of other possible optimizations, like constant expression evaluation, common subtree elimination, inlining of small functions, etc., most of which are beyond my tiny comprehension of compiler optimization techniques .... LOL

rlwhitcomb commented 6 months ago

Okay, let me see if I can enumerate the constant values, which are candidates for constant expression evaluation:

Numeric constants of all kinds (including Roman numerals, and our built-in constants like pi, e, i, etc.)
BUT, take care on "pi" and "e" especially that their values depend on the precision, so they are NOT EXACTLY constant for all time, but could be within the context of a certain precision (so we might have to keep track of that at a global level while optimizing, since the precision could be set by a variable). We would have to keep track of pi(36) (for instance) as in "pi to 36 digit precision" as the constant value.
Regular string constants (that is NOT interpolated strings)
Interpolated strings with no variable references or embedded expressions
Any other ConstantValue, PredefinedValue types not covered above (like info)
The command-line inputs ($*, $0, etc.)
Maps, Sets, of Lists of all constant values or constant expressions
Which includes the empty objects There may be others ....

rlwhitcomb commented 6 months ago

Here is a sample of the current parse tree that seems like it could be simplified (a lot) with this algorithm:

define n(x) = { if x > 5 "big" else "small" }
prog: definen(x)={ifx>5"big"else"small"}<EOF>
   stmtOrExpr: definen(x)={ifx>5"big"else"small"}
      defineStmt: definen(x)={ifx>5"big"else"small"}
         text: define
         id: n
            text: n
         formalParamList: (x)
            text: (
            formalParam: x
               id: x
                  text: x
            text: )
         text: =
         stmtBlock: {ifx>5"big"else"small"}
            text: {
            stmtOrExpr: ifx>5"big"else"small"
               ifStmt: ifx>5"big"else"small"
                  text: if
                  compareExpr: x>5
                     varExpr: x
                        idVar: x
                           id: x
                              text: x
                     text: >
                     valueExpr: 5
                        numberValue: 5
                           text: 5
                  stmtBlock: "big"
                     stmtOrExpr: "big"
                        exprStmt: "big"
                           valueExpr: "big"
                              stringValue: "big"
                                 text: "big"
                  text: else
                  stmtBlock: "small"
                     stmtOrExpr: "small"
                        exprStmt: "small"
                           valueExpr: "small"
                              stringValue: "small"
                                 text: "small"
            text: }
   text: <EOF>

Notice how all the "text" nodes could (at least) be eliminated if we could add the text to the node above.

rlwhitcomb / utilities

Add an optimizer step to Calc parsing to eliminate empty levels of the parse tree. #638