Xorbits integrates the pyarrow backend. See this blog post for more info. And we also introduce use_arrow_dtype in read_parquet. If we install the pyarrow backend, Xorbits will detect it, marking use_arrow_dtype to True in the configuration and it will read parquet with arrow dtype. Some dtypes of pyarrow and pandas are different, for example, timestamp. Suppose time is a timestamp column. If time is a pandas dtype we can do it like this: df["time"].dt. But pyarrow does not have dt attribute.
Xorbits integrates the pyarrow backend. See this blog post for more info. And we also introduce
use_arrow_dtype
inread_parquet
. If we install the pyarrow backend, Xorbits will detect it, markinguse_arrow_dtype
toTrue
in the configuration and it will read parquet with arrow dtype. Some dtypes of pyarrow and pandas are different, for example, timestamp. Supposetime
is a timestamp column. Iftime
is a pandas dtype we can do it like this:df["time"].dt
. But pyarrow does not havedt
attribute.If arrow is installed, xorbits use arrow and
use_arrow_dtype
of the configuration is set as true. So here we read data in pyarrow format: https://github.com/xorbitsai/xorbits/blob/b1f1107af931e9101b22e4f1e000add3820297b5/python/xorbits/_mars/dataframe/datasource/read_parquet.py#L181C1-L201C18We may include this in our document or change the default behavior of the
ArrowEngine
when reading parquet files.