Open WillAyd opened 3 weeks ago
Quickly checking, if I call to_string() explicitly, it does error:
In [12]: ser.to_string()
...
File ~/scipy/repos/pandas/pandas/core/arrays/arrow/array.py:1458, in ArrowExtensionArray.to_numpy(self, dtype, copy, na_value)
1456 mask = data.isna()
1457 result[mask] = na_value
-> 1458 result[~mask] = data[~mask]._pa_array.to_numpy()
1459 return result
File ~/scipy/repos/pandas/pandas/core/arrays/arrow/array.py:591, in ArrowExtensionArray.__getitem__(self, item)
589 return self.take(item)
590 elif item.dtype.kind == "b":
--> 591 return type(self)(self._pa_array.filter(item))
592 else:
593 raise IndexError(
594 "Only integers, slices and integer or "
595 "boolean arrays are valid indices."
596 )
File ~/conda/envs/dev/lib/python3.11/site-packages/pyarrow/table.pxi:959, in pyarrow.lib.ChunkedArray.filter()
File ~/conda/envs/dev/lib/python3.11/site-packages/pyarrow/compute.py:264, in _make_generic_wrapper.<locals>.wrapper(memory_pool, options, *args, **kwargs)
262 if args and isinstance(args[0], Expression):
263 return Expression._call(func_name, list(args), options)
--> 264 return func.call(args, options, memory_pool)
File ~/conda/envs/dev/lib/python3.11/site-packages/pyarrow/_compute.pyx:385, in pyarrow._compute.Function.call()
File ~/conda/envs/dev/lib/python3.11/site-packages/pyarrow/error.pxi:155, in pyarrow.lib.pyarrow_internal_check_status()
File ~/conda/envs/dev/lib/python3.11/site-packages/pyarrow/error.pxi:92, in pyarrow.lib.check_status()
ArrowNotImplementedError: Function 'array_filter' has no kernel matching input types (string_view, bool)
Essentially because it tries to convert to numpy, and that part is failing (because of a kernel not being implemented for string_view).
Some quick thoughts:
- Ideally to_numpy() would work regardless of filter being implemented or not (although hopefully a future pyarrow release will support this)
- Probably printing something like pandas.Series <exception occurred while creating the repr> would be more useful?
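A minimal sketch of the placeholder-repr idea, in plain Python rather than pandas code. The names safe_repr and BrokenSeries are hypothetical and exist only for illustration; how pandas would actually wire such a fallback into Series.__repr__ (or the DataFrame repr) is not specified in this thread.

```python
def safe_repr(obj: object) -> str:
    """Return repr(obj), or a short placeholder if building the repr raises."""
    try:
        return repr(obj)
    except Exception:
        # Fall back to the class name plus a fixed marker instead of
        # propagating the error from the repr machinery.
        return f"{type(obj).__name__} <exception occurred while creating the repr>"


class BrokenSeries:
    """Hypothetical stand-in for an object whose repr fails."""

    def __repr__(self) -> str:
        # Mimics a failure deep inside the conversion used to build the repr.
        raise NotImplementedError(
            "no kernel matching input types (string_view, bool)"
        )


print(safe_repr(BrokenSeries()))
print(safe_repr([1, 2, 3]))
```

One trade-off of this approach is that swallowing the exception hides the underlying traceback, so a real implementation would likely want to keep an explicit call such as to_string() raising as it does today.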
Makes sense for the series, but would this affect the repr when contained within a dataframe?
@WillAyd - should the title be Arrow String View? Want to make sure I'm understanding the issue.
I don't think so - binary view is the terminology used by the arrow specification, which generally covers what you may be thinking of as bytes and strings:
https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-view-layout
The same issue occurs with the binary_view pyarrow type as well.
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
While this does not:
This might actually be an upstream bug with pyarrow (@jorisvandenbossche typically knows best)