qltwis closed this issue 2 months ago
Thanks for the report. This will be fixed in pyarrow 12, so I'm closing this issue since the fix is upstream: https://github.com/apache/arrow/pull/34445
Thank you for the info. With pyarrow 12, pd.read_parquet("test.parq", dtype_backend="pyarrow").index.dtype produces the correct index dtype.
However, with the numpy backend the index type is still read as object, while the other columns are correctly identified as string[python]. I imagine that's not intended behavior, right?
Unfortunately this bug persists despite the fix in pyarrow 12.
This is unfortunately the intended behavior as long as pandas' default string type is object. Parquet does not encode which string type implementation pandas used.
This can change in pandas 3.0, when the default string implementation is pyarrow if it is installed as a dependency. Otherwise one will need to specify the dtype_backend argument to recover the implementation, so closing.
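For anyone who needs a string-typed index with the default backend in the meantime, one possible workaround is to cast the index after reading. This is only a sketch: restore_string_index and read_parquet_string_index are hypothetical helper names, not part of pandas, and it assumes a single-level index (a MultiIndex would need each level cast separately).

```python
import pandas as pd


def restore_string_index(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose index is cast to pandas' string dtype.

    Hypothetical helper for illustration; assumes a single-level index.
    """
    out = df.copy()
    out.index = out.index.astype("string")
    return out


def read_parquet_string_index(path: str) -> pd.DataFrame:
    """Read a parquet file, then re-apply the string dtype to the index."""
    return restore_string_index(pd.read_parquet(path))


# In-memory stand-in for a frame whose index came back as object.
df = pd.DataFrame({"a": [1, 2]}, index=pd.Index(["x", "y"], dtype=object))
fixed = restore_string_index(df)
print(fixed.index.dtype)
```

With a file on disk, read_parquet_string_index("test.parq") would do the same cast right after reading. The cast is applied unconditionally, so it only makes sense when the index is known to hold strings.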
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
The dtype information for the index column is lost when saving as parquet.
Expected Behavior
pd.read_parquet("test.parq").index.dtype should return string[python] but instead gives dtype('O')
Installed Versions