Open AkshayWarrier opened 9 months ago
Great catch! Thanks for figuring out the problem.
There is a design issue to figure out: in Fortran these operators do not short circuit. In C and Python they do. There are also proposals to Fortran both in terms of new operators or compiler options or even changes to the standard to make them short circuit.
In terms of performance, it would be nice to not close door to possible optimizations.
In terms of the bug above, I am guessing the ASR->LLVM evaluates all part of the expression and then uses "and" on the result. To short-circuit it, it should first evaluate the first, and only do the second if it is true.
Probably the easiest for now is to set a flag in ASR for LogicalBinOp
to designate if they should short-circuit, something like:
--- a/src/libasr/ASR.asdl
+++ b/src/libasr/ASR.asdl
@@ -277,7 +277,7 @@ expr
| LogicalConstant(bool value, ttype type)
| LogicalNot(expr arg, ttype type, expr? value)
| LogicalCompare(expr left, cmpop op, expr right, ttype type, expr? value)
- | LogicalBinOp(expr left, logicalbinop op, expr right, ttype type, expr? value)
+ | LogicalBinOp(expr left, logicalbinop op, expr right, bool short_circuit, ttype type, expr? value)
| ListConstant(expr* args, ttype type)
| ListLen(expr arg, ttype type, expr? value)
Then in the backend it would generate the proper code.
When would we not want to short-ciruict? I think never. I think we however want to have the freedom to rearrange the expression sometimes for optimizations. There is also an issue of emitting Fortran code from ASR, if we short-circuit by default, we can't just use .and.
in Fortran directly.
This page has a lot of background on this topic: https://en.wikipedia.org/wiki/Short-circuit_evaluation, it seems we most likely require both, so the above diff is probably the best.
Conceptually it looks like the following will cover all cases:
--- a/src/libasr/ASR.asdl
+++ b/src/libasr/ASR.asdl
@@ -277,7 +277,7 @@ expr
| LogicalConstant(bool value, ttype type)
| LogicalNot(expr arg, ttype type, expr? value)
| LogicalCompare(expr left, cmpop op, expr right, ttype type, expr? value)
- | LogicalBinOp(expr left, logicalbinop op, expr right, ttype type, expr? value)
+ | LogicalBinOp(expr left, logicalbinop op, expr right, logicalevaltype eval_type, ttype type, expr? value)
| ListConstant(expr* args, ttype type)
| ListLen(expr arg, ttype type, expr? value)
@@ -435,6 +435,8 @@ binop = Add | Sub | Mul | Div | Pow | BitAnd | BitOr | BitXor | BitLShift | BitR
logicalbinop = And | Or | Xor | NEqv | Eqv
+logicalevaltype = EvalShortCircuit | EvalEager | EvalUnspecified
+
cmpop = Eq | NotEq | Lt | LtE | Gt | GtE
integerboz = Binary | Hex | Octal
Here the Python and C frontends will use EvalShortCircuit
, the Fortran frontend will use EvalUnspecified
. The EvalEager
is an evaluation strategy that we currently use (we evaluate all arguments first, then apply the boolean operators), and our optimizer or the frontend can choose this if needed.
All three are different. EvalShortCircuit
and EvalEager
all preserve the order of arguments (the first one only evaluates as neeed, the second always evaluates all, in order), while EvalUnspecified
allows any order and it might or might not evaluate arguments.
In terms of what rewriting (optimization) one can do on an expression, all three cases above I think have different semantics. It seems that EvalUnspecified
can be freely converted to either EvalEager
or EvalShortCircuit
and it will still work semantically. But EvalEager
and EvalShortCircuit
can't be converted to any of the other two options, as it will change the semantics. Here is why:
EvalShortCircuit
should have been set, but EvalEager
was used instead.EvalEager
can't be in general changed to EvalShortCircuit
since there could have been side effects from the full eager evaluation of all arguments, that now will be skipped, so the program can change meaning.Finally, our optimizer can analyze an expression, and if all functions/operands are side-effects-free, then I think EvalEager
can be converted to EvalShortCircuit
(order specific) or EvalUnspecified
(order independent).
If operands are side-effects-free and they do not depend on each other (so NOT this issue), then I think EvalShortCircuit
can be converted to EvalEager
or EvalUnspecified
.
This issue still exists. An MRE that fails:
% cat examples/expr2.py
from lpython import i32
a: i32 = 10
def hi() -> bool:
global a
a = a + 1
return True
def main0():
if False and hi():
pass
print(a)
assert a == 10
if __name__ == "__main__":
main0()
% python examples/expr2.py
10
% lpython examples/expr2.py
11
AssertionError
I discovered this when I first ran
and it gave me this error
Looking at the implementation of
_lpython_str_lstrip
I realized that the
and
inwhile ind < len(x) and x[ind] == ' ':
wasn't short circuiting as expected. It becomes more clear when running a minimal exampleThis raises the index out of bounds error, because the second expression is being executed even though the first expression is
False
. Similarly this raises an error as well