pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: eval fails to process expression when one column name starts with a digit or some special characters #59043

Closed Reimarleo closed 2 weeks ago

Reimarleo commented 2 weeks ago

Reproducible Example

import pandas as pd

data = {'A':[True,False,True],'1B':[True,True,False]}
df = pd.DataFrame(data)

expr = 'A & 1B'
result = df.eval(expr)

Issue Description

The example above results in this error:

A +1 B
^
SyntaxError: invalid syntax

Expected Behavior

Prepending an underscore to the column name, in both the DataFrame and the expression, works around the problem. Since pandas allows column names that start with a digit, DataFrame.eval should be able to process expressions that reference such columns.

import pandas as pd

data = {'A':[True,False,True],'_1B':[True,True,False]}
df = pd.DataFrame(data)

expr = 'A & _1B'
result = df.eval(expr)

print(result)

0     True
1    False
2    False
dtype: bool

Installed Versions

INSTALLED VERSIONS
------------------
commit                : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python                : 3.12.4.final.0
python-bits           : 64
OS                    : Windows
OS-release            : 10
Version               : 10.0.19045
machine               : AMD64
processor             : Intel64 Family 6 Model 154 Stepping 4, GenuineIntel
byteorder             : little
LC_ALL                : None
LANG                  : en
LOCALE                : English_United Kingdom.1252

pandas                : 2.2.2
numpy                 : 1.26.4
pytz                  : 2024.1
dateutil              : 2.9.0.post0
setuptools            : None
pip                   : 24.0
Cython                : None
pytest                : None
hypothesis            : None
sphinx                : None
blosc                 : None
feather               : None
xlsxwriter            : 3.2.0
lxml.etree            : None
html5lib              : None
pymysql               : None
psycopg2              : None
jinja2                : None
IPython               : 8.25.0
pandas_datareader     : None
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : None
bottleneck            : None
dataframe-api-compat  : None
fastparquet           : None
fsspec                : None
gcsfs                 : None
matplotlib            : None
numba                 : None
numexpr               : None
odfpy                 : None
openpyxl              : None
pandas_gbq            : None
pyarrow               : 16.1.0
pyreadstat            : None
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : 1.13.1
sqlalchemy            : None
tables                : None
tabulate              : None
xarray                : None
xlrd                  : None
zstandard             : None
tzdata                : 2024.1
qtpy                  : None
pyqt5                 : None

asishm commented 2 weeks ago

From the DataFrame.query docs:

You can refer to column names that are not valid Python variable names by surrounding them in backticks.

In this case, 1B is not a valid Python variable name (Python identifiers cannot start with a digit), so it has to be quoted with backticks:

df.eval("A & `1B`")
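For completeness, here is a minimal, self-contained sketch of the backtick approach applied to the DataFrame from the report. The helper `quote_column` is hypothetical (it is not part of pandas); it only illustrates one way to decide when a column name needs quoting, assuming that any name which is not a plain Python identifier, or which is a Python keyword, should be wrapped in backticks.

```python
import keyword

import pandas as pd


def quote_column(name: str) -> str:
    """Hypothetical helper (not a pandas API): wrap a column name in
    backticks when it cannot be used as a bare Python identifier."""
    if name.isidentifier() and not keyword.iskeyword(name):
        return name
    return f"`{name}`"


# Same data as in the original report.
df = pd.DataFrame({"A": [True, False, True], "1B": [True, True, False]})

# "1B" starts with a digit, so it is not a valid identifier.
print("1B".isidentifier())  # False

# Build the expression with quoting applied only where needed.
expr = f"{quote_column('A')} & {quote_column('1B')}"
print(expr)  # A & `1B`

result = df.eval(expr)
print(result.tolist())  # [True, False, False]
```

Writing the expression by hand as "A & `1B`" is equivalent. The helper is only worth having when expressions are built from column names that are not known in advance.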