pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: 'Series' objects are mutable, thus they cannot be hashed #34251

Open kerwin6182828 opened 4 years ago

kerwin6182828 commented 4 years ago


Code Sample, a copy-pastable example

import pandas as pd

df = pd.DataFrame({"id": [0, 1, 2], "name": ["a", None, None]})
df.query("name.isnull()")

# output:
# TypeError: 'Series' objects are mutable, thus they cannot be hashed

Problem description

I'm quite sad about this error, because I use the `query` method a lot. Today the error suddenly started appearing, and I really don't know what changed on my Mac. I had just installed the mars library, which is related to pandas, and nothing else. Please help me with this problem; I'd really appreciate it!

Expected Output

   id  name
1   1  None
2   2  None

Output of pd.show_versions()

INSTALLED VERSIONS

commit           : None
python           : 3.7.6.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 19.4.0
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : zh_CN.UTF-8
LOCALE           : zh_CN.UTF-8

pandas           : 1.0.3
numpy            : 1.18.1
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 20.1
setuptools       : 46.1.3
Cython           : None
pytest           : 5.0.1
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.5.0
html5lib         : None
pymysql          : 0.9.3
psycopg2         : None
jinja2           : 2.10.1
IPython          : 7.13.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.5.0
matplotlib       : 3.1.2
numexpr          : 2.7.1
odfpy            : None
openpyxl         : 3.0.0
pandas_gbq       : None
pyarrow          : 0.17.0
pytables         : None
pytest           : 5.0.1
pyxlsb           : None
s3fs             : None
scipy            : 1.4.1
sqlalchemy       : 1.3.13
tables           : None
tabulate         : 0.8.7
xarray           : None
xlrd             : 1.2.0
xlwt             : None
xlsxwriter       : None
numba            : 0.48.0

MarcoGorelli commented 4 years ago

Are you sure you're using version 1.0.3 and that that's all you've run?

I just tried this on v1.0.3 and got

In [3]: import pandas as pd

In [4]: df = pd.DataFrame({"id":[0, 1, 2], "name":["a", None, None]})

In [5]: df.query("name.isnull()")
Out[5]:
   id  name
1   1  None
2   2  None
TomAugspurger commented 4 years ago

On master I see the TypeError. @kerwin6182828 can you post the full traceback?

MarcoGorelli commented 4 years ago

Ah yes, it reproduces on master

traceback:

>>> df.query("name.isnull()")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/marco/pandas-dev/pandas/core/frame.py", line 3269, in query
    res = self.eval(expr, **kwargs)
  File "/home/marco/pandas-dev/pandas/core/frame.py", line 3399, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "/home/marco/pandas-dev/pandas/core/computation/eval.py", line 346, in eval
    ret = eng_inst.evaluate()
  File "/home/marco/pandas-dev/pandas/core/computation/engines.py", line 73, in evaluate
    res = self._evaluate()
  File "/home/marco/pandas-dev/pandas/core/computation/engines.py", line 113, in _evaluate
    _check_ne_builtin_clash(self.expr)
  File "/home/marco/pandas-dev/pandas/core/computation/engines.py", line 29, in _check_ne_builtin_clash
    names = expr.names
  File "/home/marco/pandas-dev/pandas/core/computation/expr.py", line 814, in names
    return frozenset([self.terms.name])
  File "/home/marco/pandas-dev/pandas/core/generic.py", line 1692, in __hash__
    f"{repr(type(self).__name__)} objects are mutable, "
TypeError: 'Series' objects are mutable, thus they cannot be hashed
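
The last two frames of this traceback can be reproduced without `query` at all. `frozenset` has to hash each element, and the traceback suggests that `self.terms.name` was a Series here rather than a plain string. A minimal sketch of just that step (the helper `is_hashable` is a hypothetical name used for illustration, not part of pandas):

```python
import pandas as pd

s = pd.Series([1, 2, 3])


def is_hashable(obj) -> bool:
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable("name"))  # a column name given as a str is hashable
print(is_hashable(s))       # a Series refuses to be hashed

try:
    frozenset([s])          # same construction as the expr.py frame above
except TypeError as exc:
    print("frozenset failed:", exc)
```

The exact wording of the error message may differ between pandas versions, so the sketch only relies on the exception type.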
kerwin6182828 commented 4 years ago

On master I see the TypeError. @kerwin6182828 can you post the full traceback?

I got a long traceback that I can't understand 😂, so please give me a hand.

In [4]: df = pd.DataFrame({"id":[0, 1, 2], "name":["a", None, None]})
   ...: df.query("name.isnull()")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-d2b97bb7aa5e> in <module>
      1 df = pd.DataFrame({"id":[0, 1, 2], "name":["a", None, None]})
----> 2 df.query("name.isnull()")

/usr/local/lib/python3.7/site-packages/pandas/core/frame.py in query(self, expr, inplace, **kwargs)
   3229         kwargs["level"] = kwargs.pop("level", 0) + 1
   3230         kwargs["target"] = None
-> 3231         res = self.eval(expr, **kwargs)
   3232
   3233         try:

/usr/local/lib/python3.7/site-packages/pandas/core/frame.py in eval(self, expr, inplace, **kwargs)
   3344         kwargs["resolvers"] = kwargs.get("resolvers", ()) + tuple(resolvers)
   3345
-> 3346         return _eval(expr, inplace=inplace, **kwargs)
   3347
   3348     def select_dtypes(self, include=None, exclude=None) -> "DataFrame":

/usr/local/lib/python3.7/site-packages/pandas/core/computation/eval.py in eval(expr, parser, engine, truediv, local_dict, global_dict, resolvers, level, target, inplace)
    335         eng = _engines[engine]
    336         eng_inst = eng(parsed_expr)
--> 337         ret = eng_inst.evaluate()
    338
    339         if parsed_expr.assigner is None:

/usr/local/lib/python3.7/site-packages/pandas/core/computation/engines.py in evaluate(self)
     71
     72         # make sure no names in resolvers and locals/globals clash
---> 73         res = self._evaluate()
     74         return reconstruct_object(
     75             self.result_type, res, self.aligned_axes, self.expr.terms.return_type

/usr/local/lib/python3.7/site-packages/pandas/core/computation/engines.py in _evaluate(self)
    111         env = self.expr.env
    112         scope = env.full_scope
--> 113         _check_ne_builtin_clash(self.expr)
    114         return ne.evaluate(s, local_dict=scope)
    115

/usr/local/lib/python3.7/site-packages/pandas/core/computation/engines.py in _check_ne_builtin_clash(expr)
     27         Terms can contain
     28     """
---> 29     names = expr.names
     30     overlap = names & _ne_builtins
     31

/usr/local/lib/python3.7/site-packages/pandas/core/computation/expr.py in names(self)
    785         """Get the names in an expression"""
    786         if is_term(self.terms):
--> 787             return frozenset([self.terms.name])
    788         return frozenset(term.name for term in com.flatten(self.terms))
    789

/usr/local/lib/python3.7/site-packages/pandas/core/generic.py in __hash__(self)
   1797     def __hash__(self):
   1798         raise TypeError(
-> 1799             f"{repr(type(self).__name__)} objects are mutable, "
   1800             f"thus they cannot be hashed"
   1801         )

TypeError: 'Series' objects are mutable, thus they cannot be hashed
kerwin6182828 commented 4 years ago

On master I see the TypeError. @kerwin6182828 can you post the full traceback?

Or is there an alternative way to select the NaN rows with the query method? For now I use df.query("name != name") instead, but I don't know whether there is a better solution.
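
The self-comparison trick relies on NaN being the only value that is not equal to itself. A minimal sketch of the idea, assuming a hypothetical float column `score` (the thread reports the same approach working for the `name` column; this sketch only covers the float case):

```python
import numpy as np
import pandas as pd

# Hypothetical data: NaN marks the missing entries in a float column
df = pd.DataFrame({"id": [0, 1, 2], "score": [1.5, np.nan, np.nan]})

# NaN != NaN evaluates to True, so only the missing rows survive the filter
self_compare = df.query("score != score")

# The same selection written as an explicit missing-value mask
mask_based = df[df["score"].isna()]

print(self_compare)
```

The expression contains no method call, so it does not go through the code path that raises the TypeError discussed in this issue.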

Benjamin15 commented 4 years ago

I think you can try the python engine instead of the numexpr engine: df.query("name.isnull()", engine='python')

It may be less efficient, though.

When we call a function inside query or eval with numexpr as the engine, we get the error TypeError: 'Series' objects are mutable, thus they cannot be hashed

Do you think this error should be solved in pandas or in numexpr?
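
The workaround from this comment, written out as a runnable sketch on the DataFrame from the original report. The boolean-mask line is added only as a cross-check and is not part of the suggestion above:

```python
import pandas as pd

df = pd.DataFrame({"id": [0, 1, 2], "name": ["a", None, None]})

# Force the pure-Python engine so the method call is not routed through numexpr
via_query = df.query("name.isnull()", engine="python")

# Equivalent selection without query(), for comparison
via_mask = df[df["name"].isnull()]

print(via_query)
```

Both selections should contain the rows with index 1 and 2.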

Sal2912 commented 4 years ago

I get a similar error with a file I am trying to read as an Excel file. The full traceback is below; the pandas version is 1.0.3.

Traceback (most recent call last):
  File "/Users/salonishah/Desktop/Python Programs/Sanmina data manipulation.py", line 6, in <module>
    print(df.loc((df['Part Description'] == 'MC PCB')))
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/indexing.py", line 578, in __call__
    axis = self.obj._get_axis_number(axis)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/generic.py", line 398, in _get_axis_number
    axis = cls._AXIS_ALIASES.get(axis, axis)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/generic.py", line 1798, in __hash__
    raise TypeError(
TypeError: 'Series' objects are mutable, thus they cannot be hashed
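
Judging from this traceback, the error here seems to have a different cause than the `query` bug: `df.loc` is being called with parentheses, so the boolean Series is passed as the `axis` argument instead of being used as an indexer. A minimal sketch of the bracket form, using hypothetical stand-in data (the `Qty` column and all values are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet in the comment above
df = pd.DataFrame(
    {"Part Description": ["MC PCB", "Cable", "MC PCB"], "Qty": [1, 2, 3]}
)

mask = df["Part Description"] == "MC PCB"

# .loc is indexed with square brackets, not called like a function
matches = df.loc[mask]

print(matches)
```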

MohamedMehaya commented 4 years ago

I was working on a Udacity course where the goal was to detect duplicates in a DataFrame, and the same error arose when calling df.duplicated().

swigicat commented 4 years ago

Is there any update on fixing this? Running queries on text data is a very common use-case.

In my case, I wanted to run a contains query and hit this problem immediately. I found out that the NaN handling also has to be specified inside the query, otherwise the python engine complains about it. E.g. for a table containing NaNs, the two commands below should produce the same output:

import pandas as pd

df = pd.read_csv('test.csv', dtype=str, na_values='')
# na=False makes missing values count as "no match" in both variants
print(df[df['Column_A'].str.contains('abc', na=False)])
print(df.query("Column_A.str.contains('abc', na=False)", engine='python'))
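
A self-contained variant of the comparison above, as a rough sketch. It replaces `test.csv` with a small in-memory frame whose values are hypothetical, so the two selections can be compared directly:

```python
import pandas as pd

# Hypothetical data standing in for test.csv; None plays the role of NaN
df = pd.DataFrame({"Column_A": ["abc1", None, "xyz", "zabc"]})

# Plain boolean indexing; na=False turns missing values into "no match"
by_mask = df[df["Column_A"].str.contains("abc", na=False)]

# The same filter through query(), forcing the python engine
by_query = df.query("Column_A.str.contains('abc', na=False)", engine="python")

print(by_query)
```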
JohannHansing commented 4 years ago

For the people wondering why this bug appears for them and not for others (or vice versa):

If you do not have the numexpr package installed, pandas automatically uses python as the engine, so you do not need to specify engine='python' in the query method.

Consequently, one quick fix would be to uninstall the numexpr package. This may introduce other issues for you but it worked for me as I didn't really use numexpr for anything.
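
To see which of the two situations applies on a given machine, one can check whether numexpr is importable. This is only a sketch using the standard library: the helper name `guess_default_query_engine` is hypothetical, and it assumes, as described above, that pandas chooses numexpr whenever the package is available. pandas may also require a minimum numexpr version, which this check does not cover.

```python
import importlib.util


def guess_default_query_engine() -> str:
    """Guess the engine query() would default to (assumption, see lead-in)."""
    numexpr_available = importlib.util.find_spec("numexpr") is not None
    return "numexpr" if numexpr_available else "python"


print(guess_default_query_engine())
```

Passing engine='python' explicitly remains the less invasive option compared to uninstalling a package.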

nhoover commented 3 years ago

This is annoying although it's nice that specifying engine='python' works around the problem. I don't really understand why this is now a problem - this seems like a very natural use case. Please fix! Thanks.

MarcoGorelli commented 3 years ago

Please fix! Thanks.

Please refer to the contributing guide if you're interested in submitting a fix; otherwise I'm afraid you'll have to wait for someone else to submit one.

hwalinga commented 3 years ago

I already noticed this instability in the past: #30005