pydata / patsy

Describing statistical models in Python using symbolic formulas

inf treated as a number, which is very weird #118

Open njsmith opened 6 years ago

njsmith commented 6 years ago

As noted here: https://stackoverflow.com/questions/48371747/how-to-modify-a-liner-regression-in-python-3-6

This formula causes patsy to raise an error:

patsy.ModelDesc.from_formula("inf ~ x")

The problem is that in patsy.parse_formula._read_python_expr, patsy tries to figure out whether an arbitrary Python expression is a numeric literal by calling int(...) and float(...) on the expression and seeing whether they succeed.

In this case, float("inf") does work, so patsy decides that the Python expression inf is a numeric literal. Whoops.

The same thing probably happens if you try to use nan as a variable name in a formula.
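To make the failure mode concrete, here is a minimal sketch of a conversion-based check like the one described above. It is not patsy's actual code; the helper name looks_numeric_by_conversion is hypothetical and only mimics the "try int(), then float()" idea.

```python
def looks_numeric_by_conversion(expr):
    """Hypothetical helper: guess whether expr is a number by trying
    int() and float() on the raw string, as described above."""
    for converter in (int, float):
        try:
            converter(expr)
        except ValueError:
            # This converter rejected the string; try the next one.
            continue
        return True
    return False


print(looks_numeric_by_conversion("1.5"))  # True: float("1.5") succeeds
print(looks_numeric_by_conversion("inf"))  # True: float("inf") succeeds
print(looks_numeric_by_conversion("nan"))  # True: float("nan") succeeds
print(looks_numeric_by_conversion("x"))    # False: both conversions fail
```

Under this sketch, inf and nan are indistinguishable from real numeric literals, which matches the behaviour reported here.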

I guess a more reliable way of checking for numeric literals would be to check the tokenize output: if an expression is a single token, and that token has type tokenize.NUMBER, then it's a numeric literal.
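A rough sketch of that tokenize-based check, using only the standard library. The function name is_numeric_literal is hypothetical, and this is an illustration of the idea rather than a proposed patch to patsy.

```python
import io
import tokenize

# Token types that carry no content and are emitted around any expression.
_IGNORED = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}


def is_numeric_literal(expr):
    """Hypothetical helper: True if expr tokenizes to exactly one
    token and that token has type tokenize.NUMBER."""
    try:
        tokens = list(tokenize.generate_tokens(io.StringIO(expr).readline))
    except (tokenize.TokenError, SyntaxError):
        # Anything that cannot be tokenized is not a numeric literal.
        return False
    significant = [tok for tok in tokens if tok.type not in _IGNORED]
    return len(significant) == 1 and significant[0].type == tokenize.NUMBER


print(is_numeric_literal("1.5"))  # True: a single NUMBER token
print(is_numeric_literal("inf"))  # False: a single NAME token
print(is_numeric_literal("nan"))  # False: a single NAME token
```

One consequence of this approach: "-1" tokenizes as an operator followed by a number, so the sketch does not count it as a literal. Whether that matters depends on how the caller treats unary minus.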