tonybaloney / perflint

Python Linter for performance anti patterns
MIT License
659 stars 10 forks source link

Performance gains via reordering of mathematical expressions #50

Open mqyhlkahu opened 7 months ago

mqyhlkahu commented 7 months ago

Optimisation Opportunity

The generated bytecode from the expression 1 + a + 1 is inefficient:

              2 LOAD_CONST               0 (1)
              4 LOAD_NAME                0 (a)
              6 BINARY_OP                0 (+)
             10 LOAD_CONST               0 (1)
             12 BINARY_OP                0 (+)

It would be more efficient if the expression was manually const-folded or reordered so that Python all the constants are first, allowing Python to const-fold it. For example, the expression 1 + 1 + a compiles to:

              2 LOAD_CONST               0 (2)
              4 LOAD_NAME                0 (a)
              6 BINARY_OP                0 (+)

As you can see, the 1 + 1 has been const-folded, meaning that one less addition has to be performed.

A Caveat

A Python implementation cannot perform an optimisation like this, because a could have a .__add__ method that is not associative and commutative, meaning that changes like these would result in different behaviour.

However, a linter could freely suggest optimisations of this nature, because they're just suggestions. The user would have to be properly warned that the optimisation should only be applied to ints, however.

What Specifically can be Optimised

Expressions containing commutative and associative operations on ints can be reordered so that constants factors are first. This cannot be done for floats, because floating point operations are neither associative nor commutative (this is why modern compilers don't optimise floating point math by default). However, you could add an option to lint floating point operations in this way as well (e.g., gcc only performs optimisations like this when -fassociative-math is enabled).

Realistically, this would be rather difficult to implement, and the performance gains would not be very large compared to some other possibilities. But I figured I'd post the issue anyway.