Open a10y opened 2 weeks ago
The failure is occurring in the `type` property accessor on `ArrowDtype`. I'm not familiar enough with pandas internals to be sure, but my suspicion is that this if statement should include a check for `is_string_view`.
I don't have permissions to add it, but this should include the Arrow label.
The proposed fix resolves the OP and doesn't break any tests for me locally. I'm not finding anything on `string_view` in the code or docs, do we support string views @WillAyd / @jorisvandenbossche?
I am not sure if the ArrowDtype has ever fully been scoped out, but as far as I am aware we should allow any Arrow data type to be stored within that container
`ArrowDtype` is AFAIK indeed quite agnostic and supports any pyarrow data type you put into it. But then further operations on it rely on `pyarrow.compute` functions, and not many of those are actually implemented for the newer `string_view` data type on the pyarrow side.
I don't know if we should warn users about that when they construct a dataframe with `string_view`. Or maybe we should actually also consider still converting `string_view` to `string` by default, given those usability issues (the question then is mostly how to let the user explicitly ask for `string_view`, if by default we would still convert).
> Or maybe we should actually also consider still converting `string_view` to `string` by default
I'd be hesitant to do this without clarifying how we expect logical types to behave. I think that would also be the exact opposite of what polars does (i.e. they convert string to string_view) so that would lead to some fragmentation in expectations
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
Our library is producing Arrow binary view arrays (strings and binary) and we want to allow users to convert them into pandas DataFrames.
We are using the `pd.ArrowDtype` constructor to allow creating pandas arrays that are backed with Arrow storage. The example I've attached also fails when you change `"c"` to `None` (i.e. the problem hits both nullable and non-nullable types).
Full repro with error message:
This seems like a distinct issue from #59883.
Expected Behavior
I'd expect the throwing example to not throw.
Installed Versions