pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: Error on query function when the column name has # symbol #59285

Closed. yangjlx closed this issue 2 weeks ago.

yangjlx commented 1 month ago

Pandas version checks

Reproducible Example

import pandas as pd

df = pd.DataFrame((1,2,3), columns=['a#'])  # single column named 'a#'
df.query('a# > 2')  # raises UndefinedVariableError: name 'a' is not defined

-------------------------------
KeyError                                  Traceback (most recent call last)
File d:\Applications\Python\Python311\Lib\site-packages\pandas\core\computation\scope.py:231, in Scope.resolve(self, key, is_local)
    230 if self.has_resolvers:
--> 231     return self.resolvers[key]
    233 # if we're here that means that we have no locals and we also have
    234 # no resolvers

File d:\Applications\Python\Python311\Lib\collections\__init__.py:1006, in ChainMap.__getitem__(self, key)
   1005         pass
-> 1006 return self.__missing__(key)

File d:\Applications\Python\Python311\Lib\collections\__init__.py:998, in ChainMap.__missing__(self, key)
    997 def __missing__(self, key):
--> 998     raise KeyError(key)

KeyError: 'a'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
File d:\Applications\Python\Python311\Lib\site-packages\pandas\core\computation\scope.py:242, in Scope.resolve(self, key, is_local)
    238 try:
    239     # last ditch effort we look in temporaries
    240     # these are created when parsing indexing expressions
...
    242     return self.temps[key]
    243 except KeyError as err:
--> 244     raise UndefinedVariableError(key, is_local) from err

UndefinedVariableError: name 'a' is not defined

Issue Description

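The query function seems to treat the # symbol as the start of a comment, so everything after it is ignored and pandas tries to resolve a column named 'a', which does not exist. It does not work as expected.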
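To illustrate the suspected cause, the sketch below runs both expressions through Python's standard tokenize module. It assumes, based only on the traceback passing through pandas/core/computation/parsing.py (tokenize_string), that query builds on Python's own tokenizer; the helper python_tokens is hypothetical and not part of pandas.

```python
import io
import tokenize


def python_tokens(expr: str) -> list[tuple[str, str]]:
    """Hypothetical helper: return (token type name, token text) pairs for expr."""
    readline = io.StringIO(expr).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


# Without '#', the comparison is tokenized as NAME, OP, NUMBER.
print(python_tokens("a > 2")[:3])   # [('NAME', 'a'), ('OP', '>'), ('NUMBER', '2')]

# With 'a#', everything from '#' onward becomes one COMMENT token,
# leaving only the bare name 'a' to be looked up as a column.
print(python_tokens("a# > 2")[:2])  # [('NAME', 'a'), ('COMMENT', '# > 2')]
```

This matches the KeyError: 'a' in the first traceback: the comparison "> 2" never reaches the evaluator.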

I also tried to execute

df.query('`a#` > 2')

but it still raises an exception:

Traceback (most recent call last):

  File d:\Applications\Python\Python311\Lib\site-packages\pandas\core\computation\parsing.py:192 in tokenize_string
    yield tokenize_backtick_quoted_string(

  File d:\Applications\Python\Python311\Lib\site-packages\pandas\core\computation\parsing.py:167 in tokenize_backtick_quoted_string
    return BACKTICK_QUOTED_STRING, source[string_start:string_end]

UnboundLocalError: cannot access local variable 'string_end' where it is not associated with a value

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File d:\Applications\Python\Python311\Lib\site-packages\IPython\core\interactiveshell.py:3553 in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  Cell In[59], line 1
    df.query('`a#` > 2')

  File d:\Applications\Python\Python311\Lib\site-packages\pandas\core\frame.py:4823 in query
    res = self.eval(expr, **kwargs)

  File d:\Applications\Python\Python311\Lib\site-packages\pandas\core\frame.py:4949 in eval
...
    raise SyntaxError(f"Failed to parse backticks in '{source}'.") from err

  File <string>
SyntaxError: Failed to parse backticks in '`a#` > 2'.

Expected Behavior

The query should behave like df[df['a#'] > 2], i.e. return only the rows where column a# is greater than 2.
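A minimal sketch of the expected result, plus one possible workaround on affected versions: rename the column to a valid Python identifier, run the query, then restore the original name. The replacement name a_hash is an arbitrary choice for illustration; plain boolean indexing with df['a#'] avoids the expression parser entirely.

```python
import pandas as pd

# Same data as the reproducer: one column named 'a#'
df = pd.DataFrame({"a#": [1, 2, 3]})

# Expected behaviour: boolean indexing never parses an expression string
expected = df[df["a#"] > 2]

# Workaround sketch: 'a_hash' is a hypothetical, valid identifier used only for the query
result = (
    df.rename(columns={"a#": "a_hash"})
    .query("a_hash > 2")
    .rename(columns={"a_hash": "a#"})
)

print(result)
```

Both approaches keep the original index, so the single matching row keeps label 2.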

Installed Versions

INSTALLED VERSIONS
------------------
commit : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python : 3.11.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22631
machine : AMD64
processor : Intel64 Family 6 Model 183 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : Chinese (Simplified)_China.936

pandas : 2.2.2
numpy : 1.26.3
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 65.5.0
pip : 24.0
Cython : 3.0.8
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.20.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.2
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.11.4
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.4
qtpy : None
pyqt5 : None

aram-cinnamon commented 1 month ago

take