Open jorisvandenbossche opened 2 hours ago
At the moment you can freely compare a string column with a mixed object-dtype column:
    >>> ser_string = pd.Series(["a", "b"])
    >>> ser_mixed = pd.Series([1, "b"])
    >>> ser_string == ser_mixed
    0    False
    1     True
    dtype: bool
But with the string dtype enabled (using pyarrow), this now raises an error:
    >>> pd.options.future.infer_string = True
    >>> ser_string = pd.Series(["a", "b"])
    >>> ser_mixed = pd.Series([1, "b"])
    >>> ser_string == ser_mixed
    ...
    File ~/scipy/repos/pandas/pandas/core/arrays/arrow/array.py:510, in ArrowExtensionArray._box_pa_array(cls, value, pa_type, copy)
    ...
    --> 510 pa_array = pa.array(value, from_pandas=True)
    ...
    ArrowInvalid: Could not convert 'b' with type str: tried to convert to int64
This happens because the ArrowEA tries to convert the other operand to Arrow as well, which fails for mixed types.
In general, I think our rule is that an == comparison never fails, but instead just gives False where values are not comparable.
It seems we actually have a comment in the code about this issue in case of object dtype:
https://github.com/pandas-dev/pandas/blob/692ea6f9d4b05187a05f0811d3241211855d6efb/pandas/core/arrays/arrow/array.py#L728-L734
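To make the expected behaviour concrete, here is a minimal, library-free sketch of the "== never fails" rule described above. It is not the pandas or pyarrow implementation: `strict_to_strings` and `eq_never_raises` are hypothetical names, and the strict conversion is only a stand-in for the typed Arrow conversion that fails on mixed values. The idea is to try the typed path first and fall back to an elementwise comparison when the other operand cannot be converted.

```python
from typing import Any, List, Sequence


def strict_to_strings(values: Sequence[Any]) -> List[str]:
    """Hypothetical stand-in for a strictly typed conversion.

    Raises TypeError if any value is not a str, loosely mimicking how
    converting a mixed list to a single typed array fails.
    """
    for value in values:
        if not isinstance(value, str):
            raise TypeError(f"Could not convert {value!r} to str")
    return list(values)


def eq_never_raises(left: Sequence[str], right: Sequence[Any]) -> List[bool]:
    """Elementwise == that yields False for non-comparable values."""
    if len(left) != len(right):
        raise ValueError("Lengths must match to compare")
    try:
        # Typed path: works only when every element of `right` is a string.
        converted = strict_to_strings(right)
    except TypeError:
        # Fallback: compare element by element. In Python, comparing a str
        # with an int using == is simply False rather than an error.
        return [bool(a == b) for a, b in zip(left, right)]
    return [a == b for a, b in zip(left, converted)]


# Same data as in the report: the mixed operand no longer causes an error.
print(eq_never_raises(["a", "b"], [1, "b"]))  # [False, True]
```

The result matches what the object-dtype comparison gives today; where exactly such a fallback should live in the Arrow-backed array is a separate question that this sketch does not answer.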