pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG (string dtype): logical operation with bool and string failing #60234

Open jorisvandenbossche opened 2 weeks ago

jorisvandenbossche commented 2 weeks ago

We do allow logical operators like | to be used with non-boolean data (at which point the non-bool series is cast to bool, I assume). For example:

>>> ser1 = pd.Series([False, False])
>>> ser2 = pd.Series([0.0, 0.1])
>>> ser1 | ser2
0    False
1     True
dtype: bool
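
To make the assumed "cast to bool, then OR" behaviour concrete, here is a minimal plain-Python sketch. It is not how pandas implements this internally: the helper name `logical_or_with_bool_cast` is made up for illustration, and treating missing values (None, NaN) as False is an assumption of the sketch.

```python
import math


def logical_or_with_bool_cast(left, right):
    """Element-wise OR after coercing each right-hand value to bool.

    Hypothetical helper for illustration only; it is not a pandas function.
    Assumption: missing values (None or float NaN) count as False.
    """

    def to_bool(value):
        # Map None and NaN to False (an assumption of this sketch);
        # everything else uses ordinary Python truthiness.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            return False
        return bool(value)

    if len(left) != len(right):
        raise ValueError("operands must have the same length")
    return [bool(a) or to_bool(b) for a, b in zip(left, right)]


# Same data as the float example above.
print(logical_or_with_bool_cast([False, False], [0.0, 0.1]))  # [False, True]
```

Under this reading, any element type with a well-defined truth value would go through the same path, so the result depends only on each element's truthiness.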

This also worked with strings in object dtype:

>>> ser2 = pd.Series(["", "b"], dtype=object)
>>> ser1 | ser2
0    False
1     True
dtype: bool

but currently fails with the pyarrow-backed string dtype:

>>> pd.options.future.infer_string = True
>>> ser2 = pd.Series(["", "b"])
>>> ser1 | ser2
...

File ~/scipy/repos/pandas/pandas/core/arrays/arrow/array.py:833, in ArrowExtensionArray._logical_method(self, other, op)
    831     return self._evaluate_op_method(other, op, ARROW_BIT_WISE_FUNCS)
    832 else:
--> 833     return self._evaluate_op_method(other, op, ARROW_LOGICAL_FUNCS)

File ~/scipy/repos/pandas/pandas/core/arrays/arrow/array.py:824, in ArrowExtensionArray._evaluate_op_method(self, other, op, arrow_funcs)
    822     result = pc_func(self._pa_array, other)
    823 except pa.ArrowNotImplementedError as err:
--> 824     raise TypeError(self._op_method_error_message(other_original, op)) from err
    825 return type(self)(result)

TypeError: operation 'ror_' not supported for dtype 'str' with dtype 'bool'

simonjayhawkins commented 1 week ago

> but currently fails with the pyarrow-backed string dtype:

It also fails with the NumPy-backed string dtype:

>>> ser2 = pd.Series(["", "b"], dtype="string[python]")
>>> ser2
0     
1    b
dtype: string
>>> 
>>> ser2.dtype.storage
'python'
>>> 
>>> ser1 | ser2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/simon/pandas/pandas/core/ops/common.py", line 76, in new_method
    return method(self, other)
  File "/home/simon/pandas/pandas/core/arraylike.py", line 79, in __or__
    return self._logical_method(other, operator.or_)
  File "/home/simon/pandas/pandas/core/series.py", line 5881, in _logical_method
    res_values = ops.logical_op(lvalues, rvalues, op)
  File "/home/simon/pandas/pandas/core/ops/array_ops.py", line 439, in logical_op
    res_values = op(lvalues, rvalues)
  File "/home/simon/pandas/pandas/core/arrays/numpy_.py", line 193, in __array_ufunc__
    result = getattr(ufunc, method)(*inputs, **kwargs)
TypeError: unsupported operand type(s) for |: 'bool' and 'str'

ldlin1 commented 5 days ago

take

ldlin1 commented 3 days ago

Hi @jorisvandenbossche, the bot doesn't seem to be working for me. Would it be possible for you to manually assign this issue to me?

jorisvandenbossche commented 3 days ago

It seems to have worked now