pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

QST: pandas.DataFrame() converts pyarrow.array() to numpy series #54057

Open yoonghm opened 1 year ago

yoonghm commented 1 year ago

Research

Link to question on StackOverflow

https://stackoverflow.com/questions/76648782/pandas-dataframe-converts-pyarrow-array-to-numpy-series

Question about pandas

No response

btparrish commented 1 year ago

From Stack Overflow: "As of pandas 2.0.x, the pandas constructors do not recognize pyarrow objects. In order to get a pyarrow dtype, you'll need to pass dtype="string[pyarrow]"." I expect this will change in an upcoming pandas version.

lithomas1 commented 1 year ago

I think this should work, and is a bug.

We should be preserving pyarrow dtypes if they are passed in.

cc @phofl

mroeschke commented 1 year ago

Just noting that the currently supported way to make this work is to pass your pyarrow objects to pd.arrays.ArrowExtensionArray: https://pandas.pydata.org/docs/user_guide/pyarrow.html#data-structure-integration

jbrockmendel commented 1 year ago

The solution here is to add a check for lib.is_pyarrow_array in sanitize_array. The difficult part is ensuring that we find all the other places that may need the same check (off the top of my head, pd.array).