lmfit / asteval

Minimalistic evaluator of Python expressions using the ast module
https://lmfit.github.io/asteval
MIT License

on_compare not properly handling non-boolean values #131

Closed by OliverCWY 3 weeks ago

OliverCWY commented 3 weeks ago

In some libraries (such as polars), the __bool__ method raises an exception other than ValueError (polars raises TypeError). This causes the try-except block

try:
  if not res:
    break
except ValueError:
  pass

to let the TypeError propagate uncaught.

Example code snippet that demonstrates the above:

import polars as pl
from asteval import Interpreter

aeval = Interpreter()
aeval("pl.col('a') > 1")

I assume any exception raised in the try block would come from the __bool__ method, so would it be safe to catch all exception types?
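For readers without polars installed, the failure mode can be reproduced with a stand-in class. This is only a minimal sketch: `AmbiguousExpr` and `truth_test` are hypothetical names used for illustration, and the inner try/except merely imitates the shape of the block quoted above rather than reproducing asteval's source.

```python
class AmbiguousExpr:
    """Hypothetical stand-in for an object whose __bool__ raises TypeError."""

    def __bool__(self):
        raise TypeError("the truth value of an AmbiguousExpr is ambiguous")


def truth_test(res):
    """Imitate the quoted try/except and report what escapes it."""
    try:
        try:
            if not res:
                return "falsy"
        except ValueError:
            pass  # only ValueError (the numpy case) is swallowed here
        return "truthy or swallowed"
    except Exception as exc:
        # Anything that was not a ValueError ends up here.
        return type(exc).__name__


escaped = truth_test(AmbiguousExpr())
print(escaped)
```

The inner handler never sees the TypeError, so it reaches the outer handler, which is the situation described in this report.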

newville commented 3 weeks ago

@OliverCWY Um, your example never gets to the comparison. It raises a NameError:

  pl.col('a') > 1
NameError: name 'pl' is not defined

Yup: pl is not defined in the Interpreter.

If you still think there is a problem, post actual working code that actually shows the problem, and the full traceback. Spare the conjecture about the cause of any problem until that problem has been identified.

OliverCWY commented 3 weeks ago

Sorry, I forgot to pass the symbol table when modifying my code.

import polars as pl
from asteval import Interpreter

aeval = Interpreter({"pl": pl})
aeval("pl.col('a') > 1")

and the traceback:

   pl.col('a') > 1
TypeError: the truth value of an Expr is ambiguous

Hint: use '&' or '|' to logically combine Expr, not 'and'/'or', and use `x.is_in([y,z])` instead of `x in [y,z]` to check membership.

newville commented 3 weeks ago

@OliverCWY Indeed, from Python:

>>> import polars as pl
>>> if (pl.col('a') > 1): print('Yes')
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../python3.12/site-packages/polars/expr/expr.py", line 152, in __bool__
    raise TypeError(msg)
TypeError: the truth value of an Expr is ambiguous

You probably got here by using a Python standard library function instead of the native expressions API.
Here are some things you might want to try:
- instead of `pl.col('a') and pl.col('b')`, use `pl.col('a') & pl.col('b')`
- instead of `pl.col('a') in [y, z]`, use `pl.col('a').is_in([y, z])`
- instead of `max(pl.col('a'), pl.col('b'))`, use `pl.max_horizontal(pl.col('a'), pl.col('b'))`

Asteval just raises this exception more aggressively (at the first "Compare" instead of at "If"). But if you do (in Python):

>>> (pl.col('a') > 1) or (pl.col('b') < 0)

That will raise the same kind of TypeError exception.

It sort of seems like you would want to follow polars' advice and use its methods instead of the Python standard library.

I do not have much experience with polars, but this seems like a not very effective sales pitch ;). Like, it has a top-level function called col(), and col('a') is supposed to be comparable to an integer, only sometimes that is going to not be comparable??

What would you expect to happen?

OliverCWY commented 3 weeks ago

Apologies for not explaining the use case. If you simply run pl.col('a') > 1 instead of testing its truth value, you get a polars expression that can then be used to filter the dataframe.

Following the previous snippet:

expr = pl.col('a') > 1          # works fine
expr = aeval("pl.col('a') > 1") # fails

newville commented 3 weeks ago

@OliverCWY Thanks -- that helps.

Yeah, we do use a special case there that maybe should be relaxed. As this example shows (and there are others, notably numpy), x > y does not necessarily return a bool or even a bool-like value.

The challenge is that a comparison may chain multiple operators: x > y > z results in a single Compare node with multiple operators and values. In that case, you'd like to return False or raise an exception as early as possible.

And indeed,

>>> import polars as pl
>>> pl.col('a') < 10  > 2

raises the same TypeError exception. Python evaluates the chain as (pl.col('a') < 10) and (10 > 2), so it has to take the truth value of pl.col('a') < 10, which is ambiguous.

A similar case is

>>> import numpy as np
>>> np.arange(10) > 7
array([False, False, False, False, False, False, False, False,  True,
        True])
>>> np.arange(10) > 4 < 9
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
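
Both tracebacks follow from how Python treats a chained comparison: it is parsed as one Compare node, and every link except the last has its result truth-tested. The sketch below shows this with the standard library only. `NoTruth` is a hypothetical stand-in for a polars expression or numpy array, not a real class from either library.

```python
import ast

# A chained comparison is a single Compare node with two operators.
node = ast.parse("x > y > z", mode="eval").body
print(type(node).__name__)
print([type(op).__name__ for op in node.ops])
print([c.id for c in node.comparators])


class NoTruth:
    """Hypothetical object: comparable, but its truth value is ambiguous."""

    def __gt__(self, other):
        return self

    def __bool__(self):
        raise TypeError("ambiguous truth value")


# A single comparison never calls __bool__, so this succeeds.
single = NoTruth() > 1

# A chain is evaluated as (NoTruth() > 1) and (1 < 9), which calls __bool__.
try:
    _ = NoTruth() > 1 < 9
    chained = "ok"
except TypeError:
    chained = "TypeError"
print(chained)
```

So a single comparison can safely return a non-boolean object, while a chain cannot avoid the truth test on its intermediate results.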

Anyway, I think we can fix this so it better matches Python behavior.

OliverCWY commented 3 weeks ago

@newville Yes, I have read the source code and understand the reasoning. I think any error in the try-except block would come from converting res to bool, so it would be safe to simply catch all exceptions rather than only ValueError, which is specific to numpy.

newville commented 3 weeks ago

@OliverCWY Yeah, I agree with that. And maybe for the case of a single comparison, we should just return the result without testing its "true-ness". That would still fail on the "If" and behave more like Python. Looking into it...
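
One way to read "behave more like Python" is sketched below. This is not the code from asteval or from the eventual commit. `eval_compare` and `NoTruth` are hypothetical names, and the helper takes already-evaluated operands, whereas Python evaluates later operands lazily. It assumes the goal is to mirror plain Python: the last comparison's result is returned untested, and intermediate results are truth-tested with any exception left to propagate.

```python
import operator


def eval_compare(left, ops, comparators):
    """Evaluate `left op0 c0 op1 c1 ...` with Python's chaining rules."""
    result = True
    last = len(ops) - 1
    for i, (op, right) in enumerate(zip(ops, comparators)):
        result = op(left, right)
        if i == last:
            break  # final link: return the raw result, no truth test
        if not result:  # may raise, exactly as plain Python would
            break  # short-circuit on the first falsy link
        left = right
    return result


class NoTruth:
    """Hypothetical object whose comparisons return non-boolean results."""

    def __gt__(self, other):
        return self

    def __bool__(self):
        raise TypeError("ambiguous truth value")


single = eval_compare(NoTruth(), [operator.gt], [1])
in_range = eval_compare(1, [operator.lt, operator.lt], [2, 3])
out_of_range = eval_compare(5, [operator.lt, operator.lt], [2, 3])
print(type(single).__name__, in_range, out_of_range)
```

With this shape, a single comparison on a polars-like object returns the expression unchanged, and only a genuine chain or a later `if` triggers the truth-value error.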

newville commented 3 weeks ago

@OliverCWY OK, I think this should be fixed (that is, "match Python") in the master branch with 7e2050de1d66f51abd0342711f113ac97c1974b1

OliverCWY commented 3 weeks ago

Thanks a lot for this great project. I will close this issue.