pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: DateOffset does not work with date32[pyarrow] datatypes #57168

Open jonathan-gantner opened 9 months ago

jonathan-gantner commented 9 months ago

Pandas version checks

Reproducible Example

import pandas as pd
import datetime as dt

s = pd.Series([dt.date(2022, 12, 30)], dtype="date32[pyarrow]")
_ = s + pd.offsets.MonthEnd()

Issue Description

I receive the following error message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/myUser/Experiments/pandas/pandas_venv/lib64/python3.9/site-packages/pandas/core/ops/common.py", line 76, in new_method
    return method(self, other)
  File "/home/myUser/Experiments/pandas/pandas_venv/lib64/python3.9/site-packages/pandas/core/arraylike.py", line 186, in __add__
    return self._arith_method(other, operator.add)
  File "/home/myUser/Experiments/pandas/pandas_venv/lib64/python3.9/site-packages/pandas/core/series.py", line 6130, in _arith_method
    return base.IndexOpsMixin._arith_method(self, other, op)
  File "/home/myUser/Experiments/pandas/pandas_venv/lib64/python3.9/site-packages/pandas/core/base.py", line 1380, in _arith_method
    result = ops.arithmetic_op(lvalues, rvalues, op)
  File "/home/myUser/Experiments/pandas/pandas_venv/lib64/python3.9/site-packages/pandas/core/ops/array_ops.py", line 273, in arithmetic_op
    res_values = op(left, right)
  File "/home/myUser/Experiments/pandas/pandas_venv/lib64/python3.9/site-packages/pandas/core/ops/common.py", line 76, in new_method
    return method(self, other)
  File "/home/myUser/Experiments/pandas/pandas_venv/lib64/python3.9/site-packages/pandas/core/arraylike.py", line 186, in __add__
    return self._arith_method(other, operator.add)
  File "/home/myUser/Experiments/pandas/pandas_venv/lib64/python3.9/site-packages/pandas/core/arrays/arrow/array.py", line 785, in _arith_method
    return self._evaluate_op_method(other, op, ARROW_ARITHMETIC_FUNCS)
  File "/home/myUser/Experiments/pandas/pandas_venv/lib64/python3.9/site-packages/pandas/core/arrays/arrow/array.py", line 727, in _evaluate_op_method
    other = self._box_pa(other)
  File "/home/myUser/Experiments/pandas/pandas_venv/lib64/python3.9/site-packages/pandas/core/arrays/arrow/array.py", line 408, in _box_pa
    return cls._box_pa_scalar(value, pa_type)
  File "/home/myUser/Experiments/pandas/pandas_venv/lib64/python3.9/site-packages/pandas/core/arrays/arrow/array.py", line 444, in _box_pa_scalar
    pa_scalar = pa.scalar(value, type=pa_type, from_pandas=True)
  File "pyarrow/scalar.pxi", line 1150, in pyarrow.lib.scalar
  File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: No temporal attributes found on object.

Expected Behavior

Since this is a date dtype, the DateOffset should be applied the same way it is for Timestamps, i.e. the result should be pd.Series([dt.date(2022, 12, 31)], dtype="date32[pyarrow]").

Installed Versions

INSTALLED VERSIONS
------------------
commit : db11e25d2b1175fdf85d963a88ff5a1d4bdb6fd8
python : 3.9.18.final.0
python-bits : 64
OS : Linux
OS-release : 4.18.0-513.9.1.el8_9.x86_64
Version : #1 SMP Thu Nov 16 10:29:04 EST 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 3.0.0.dev0+197.gdb11e25d2b
numpy : 1.26.3
pytz : 2023.4
dateutil : 2.8.2
setuptools : 50.3.2
pip : 20.2.4
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 15.0.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.4
qtpy : None
pyqt5 : None

rhshadrach commented 9 months ago

@jbrockmendel - is there planned support for arrow dtypes here?

jbrockmendel commented 9 months ago

I have no plans to implement it, but no objection if someone else wants to

openerror commented 8 months ago

Thanks for opening this issue. The exact same problem has been bothering me for months and I never got around to reporting it. Yes, it would be great if arrow types were not second-class citizens when it comes to a useful feature like offsets.