zhouqingqing / qpmodel

A Relational Optimizer and Executor
MIT License
64 stars 18 forks source link

expression normalization #206

Closed pkommoju closed 3 years ago

pkommoju commented 3 years ago

Semantic Normalization and Transformation changes to produce a canonical form of the statement and eliminate unnecessary operation.

The module is called Normalizer and it does the following transformations: 1) Constant move. Bring all possible constants together so that later transformations can simplify or even remove some of the constants. Ex. CONST + expr => expr + CONST

2) Constant folding. Replace expressions involving constants with the value of that part of the expression. This includes some aggregates and functions too. Ex. CONST op CONST => the value after performing the operation.

3) The Arithmetic Simplification. Slightly specialized for of constant folding. Ex. expr + 0, expr - 0, expr * 1, expr / 1 are replaced by expr.

4) Comparison Simplification. Like constant folding but or comparison. It covers more than expr > const, expr = expr being replaced by TRUE or FALSE. Some of these transformations are not done yet.

5) Logical operation simplification, including logical operations involving expression know to be TRUE or FALSE. The biggest thing not done involves NOT. NOT (col > 10) should be replaced by col <= 10.

6) IN list simplification, DISTINCT elimination, redundant CAST elimination, CASE simplification are not done.

All classes derived from Expr may override Normalize() method to customize normalization. Normalizer runs after Binder. There is a new method Expr.BindAndNormalize() which calls bind, and then normalizer.

Not all transformations are attempted, some should not be done at all ( const - col is not the same as col - const, and transforming division is avoided because of he chance of loss of precision).

All classes affected by Normalizer are redefined as partial and their Normalize() methods put into Normalizer.cs.

TODO 1) CASE: case handling is incorrect, so that needs to be fixed first. 2) logical simplification is not thorough. For instance, where c1 <> c1 should be reduced to WHERE FLSE, and, where (a1 = a2) AND (a3 = a3) should be eliminated. 3) CAST elimination. 4) DISTINCT elimination: min(distinct 10) is simply 10 but some optimizer changes may also be required for this to happen.

zhouqingqing commented 3 years ago

please take a look at coverage/coveralls - we see -0.6% down. Once source is from Normalizer.cs - please see if you can add more test coverage there.