Open kerwin6182828 opened 4 years ago
Are you sure you're using version 1.0.3 and that that's all you've run?
I just tried this on v1.0.3 and got
In [3]: import pandas as pd
In [4]: df = pd.DataFrame({"id":[0, 1, 2], "name":["a", None, None]})
In [5]: df.query("name.isnull()")
Out[5]:
id name
1 1 None
2 2 None
On master I see the TypeError. @kerwin6182828 can you post the full traceback?
Ah yes, it reproduces on master
traceback:
>>> df.query("name.isnull()")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/marco/pandas-dev/pandas/core/frame.py", line 3269, in query
res = self.eval(expr, **kwargs)
File "/home/marco/pandas-dev/pandas/core/frame.py", line 3399, in eval
return _eval(expr, inplace=inplace, **kwargs)
File "/home/marco/pandas-dev/pandas/core/computation/eval.py", line 346, in eval
ret = eng_inst.evaluate()
File "/home/marco/pandas-dev/pandas/core/computation/engines.py", line 73, in evaluate
res = self._evaluate()
File "/home/marco/pandas-dev/pandas/core/computation/engines.py", line 113, in _evaluate
_check_ne_builtin_clash(self.expr)
File "/home/marco/pandas-dev/pandas/core/computation/engines.py", line 29, in _check_ne_builtin_clash
names = expr.names
File "/home/marco/pandas-dev/pandas/core/computation/expr.py", line 814, in names
return frozenset([self.terms.name])
File "/home/marco/pandas-dev/pandas/core/generic.py", line 1692, in __hash__
f"{repr(type(self).__name__)} objects are mutable, "
TypeError: 'Series' objects are mutable, thus they cannot be hashed
I got a long traceback that I can't understand 😂, so please give me a hand.
In [4]: df = pd.DataFrame({"id":[0, 1, 2], "name":["a", None, None]})
...: df.query("name.isnull()")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-4-d2b97bb7aa5e> in <module>
1 df = pd.DataFrame({"id":[0, 1, 2], "name":["a", None, None]})
----> 2 df.query("name.isnull()")
/usr/local/lib/python3.7/site-packages/pandas/core/frame.py in query(self, expr, inplace, **kwargs)
3229 kwargs["level"] = kwargs.pop("level", 0) + 1
3230 kwargs["target"] = None
-> 3231 res = self.eval(expr, **kwargs)
3232
3233 try:
/usr/local/lib/python3.7/site-packages/pandas/core/frame.py in eval(self, expr, inplace, **kwargs)
3344 kwargs["resolvers"] = kwargs.get("resolvers", ()) + tuple(resolvers)
3345
-> 3346 return _eval(expr, inplace=inplace, **kwargs)
3347
3348 def select_dtypes(self, include=None, exclude=None) -> "DataFrame":
/usr/local/lib/python3.7/site-packages/pandas/core/computation/eval.py in eval(expr, parser, engine, truediv, local_dict, global_dict, resolvers, level, target, inplace)
335 eng = _engines[engine]
336 eng_inst = eng(parsed_expr)
--> 337 ret = eng_inst.evaluate()
338
339 if parsed_expr.assigner is None:
/usr/local/lib/python3.7/site-packages/pandas/core/computation/engines.py in evaluate(self)
71
72 # make sure no names in resolvers and locals/globals clash
---> 73 res = self._evaluate()
74 return reconstruct_object(
75 self.result_type, res, self.aligned_axes, self.expr.terms.return_type
/usr/local/lib/python3.7/site-packages/pandas/core/computation/engines.py in _evaluate(self)
111 env = self.expr.env
112 scope = env.full_scope
--> 113 _check_ne_builtin_clash(self.expr)
114 return ne.evaluate(s, local_dict=scope)
115
/usr/local/lib/python3.7/site-packages/pandas/core/computation/engines.py in _check_ne_builtin_clash(expr)
27 Terms can contain
28 """
---> 29 names = expr.names
30 overlap = names & _ne_builtins
31
/usr/local/lib/python3.7/site-packages/pandas/core/computation/expr.py in names(self)
785 """Get the names in an expression"""
786 if is_term(self.terms):
--> 787 return frozenset([self.terms.name])
788 return frozenset(term.name for term in com.flatten(self.terms))
789
/usr/local/lib/python3.7/site-packages/pandas/core/generic.py in __hash__(self)
1797 def __hash__(self):
1798 raise TypeError(
-> 1799 f"{repr(type(self).__name__)} objects are mutable, "
1800 f"thus they cannot be hashed"
1801 )
TypeError: 'Series' objects are mutable, thus they cannot be hashed
Or is there an alternative way to get the NaN rows with the query method? For now I use df.query("name != name") instead, but I don't know whether there is a better solution.
I think you can try using the python engine instead of the numexpr engine:
df.query("name.isnull()", engine='python')
But it can be less efficient
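A minimal sketch of the two workarounds mentioned so far, using the frame from the original report. Both calls pass engine="python" explicitly so that the result does not depend on whether numexpr is installed; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"id": [0, 1, 2], "name": ["a", None, None]})

# Workaround 1: evaluate the method call with the python engine.
by_isnull = df.query("name.isnull()", engine="python")

# Workaround 2: pandas treats a missing value as not equal to itself in
# element-wise comparisons, so "name != name" selects the missing rows.
by_self_compare = df.query("name != name", engine="python")

print(by_isnull)
print(by_self_compare.equals(by_isnull))
```

The self-comparison form avoids calling a method inside the expression, which is the part that triggers the error in this report.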
When we call a function inside query or eval with numexpr as the engine, we get the error
TypeError: 'Series' objects are mutable, thus they cannot be hashed
Do you think this error should be solved in pandas or in numexpr?
I get a similar error for a file I am trying to read from Excel. The full traceback is below; the pandas version used is 1.0.3.
Traceback (most recent call last):
File "/Users/salonishah/Desktop/Python Programs/Sanmina data manipulation.py", line 6, in
print(df.loc((df['Part Description'] == 'MC PCB')))
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/indexing.py", line 578, in __call__
axis = self.obj._get_axis_number(axis)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/generic.py", line 398, in _get_axis_number
axis = cls._AXIS_ALIASES.get(axis, axis)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/generic.py", line 1798, in __hash__
raise TypeError(
TypeError: 'Series' objects are mutable, thus they cannot be hashed
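The traceback above passes through the call handler in indexing.py, which suggests that df.loc was followed by parentheses rather than square brackets, so the boolean Series was taken as the axis argument and pandas tried to hash it. That would make it a different problem from the query/numexpr one. A minimal sketch of the bracket form follows, assuming a made-up frame: only the column name and the 'MC PCB' value come from the traceback, everything else is hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the Excel data; only the column name and the
# 'MC PCB' value are taken from the traceback.
df = pd.DataFrame(
    {
        "Part Description": ["MC PCB", "Cable", "MC PCB"],
        "Qty": [10, 5, 7],
    }
)

mask = df["Part Description"] == "MC PCB"

# Square brackets index with the boolean mask; df.loc(mask) would instead
# pass the mask as the `axis` argument of a call.
matches = df.loc[mask]
print(matches)
```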
I was working on a Udacity course where the goal was to detect duplicates in a DataFrame, and the same error arose when calling df.duplicated()
Is there any update on fixing this? Running queries on text data is a very common use-case.
In my case, I wanted to run a contains query and hit this problem immediately. I found out that one needs to specify the NaN handling inside the query as well, otherwise the python engine complains about it. E.g. for a table containing NaNs, the two commands below should produce the same output:
df = pd.read_csv('test.csv', dtype=str, na_values='')
print(df[df['Column_A'].str.contains('abc', na=False)])
print(df.query("Column_A.str.contains('abc', na=False)", engine='python'))
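For anyone who wants to try this without a test.csv on disk, here is a self-contained sketch of the same comparison. The CSV text is a hypothetical stand-in read through io.StringIO, and the column values are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for test.csv: the second row has an empty Column_A.
csv_text = "Column_A,Column_B\nabcdef,1\n,2\nxyz,3\n"
df = pd.read_csv(io.StringIO(csv_text), dtype=str, na_values="")

# Boolean-mask form and query form, both with na=False so that missing
# values count as "no match" instead of propagating NaN into the mask.
by_mask = df[df["Column_A"].str.contains("abc", na=False)]
by_query = df.query("Column_A.str.contains('abc', na=False)", engine="python")

print(by_query)
print(by_mask.equals(by_query))
```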
For the people wondering why this bug appears for them and not for others (or vice versa):
If you do not have the numexpr package installed, then python will be automatically used as the engine. Thus, you do not need to specify engine='python' in the query method.
Consequently, one quick fix would be to uninstall the numexpr package. This may introduce other issues for you, but it worked for me as I didn't really use numexpr for anything.
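A rough way to check which situation applies to you, sketched under the assumption that the default engine depends only on whether numexpr is importable, as described above. guess_default_engine is a hypothetical helper, not a pandas function, and pandas may apply further checks of its own.

```python
import importlib.util

import pandas as pd


def guess_default_engine() -> str:
    """Guess which engine query() picks when none is given.

    Hypothetical helper: it only checks whether numexpr can be imported,
    following the description above; pandas may apply further checks.
    """
    if importlib.util.find_spec("numexpr") is not None:
        return "numexpr"
    return "python"


df = pd.DataFrame({"id": [0, 1, 2], "name": ["a", None, None]})

print("likely default engine:", guess_default_engine())

# Passing the engine explicitly avoids depending on what is installed.
result = df.query("name.isnull()", engine="python")
print(result)
```

Passing engine="python" per call is less invasive than uninstalling numexpr, since other code in the same environment may still rely on it.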
This is annoying although it's nice that specifying engine='python' works around the problem. I don't really understand why this is now a problem - this seems like a very natural use case. Please fix! Thanks.
Please refer to the contributing guide if you're interested in submitting a fix; otherwise I'm afraid you'll have to wait for someone else to submit one.
I already noticed this instability in the past: #30005
[x] I have checked that this issue has not already been reported.
[x] I have confirmed this bug exists on the latest version of pandas.
[x] (optional) I have confirmed this bug exists on the master branch of pandas.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
Problem description
I'm so sad about this error, because I use the query method all the time. But today this error suddenly appeared, and I really don't know what happened to my Mac. I had just installed the mars library, which is related to pandas, and nothing else. Please help me with this problem, much appreciated!!
Expected Output
id name
1 1 None
2 2 None
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Darwin
OS-release : 19.4.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : zh_CN.UTF-8
LOCALE : zh_CN.UTF-8

pandas : 1.0.3
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.1
setuptools : 46.1.3
Cython : None
pytest : 5.0.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.0
html5lib : None
pymysql : 0.9.3
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.13.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : 0.17.0
pytables : None
pytest : 5.0.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : None
tabulate : 0.8.7
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : 0.48.0