pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: subtracting datetime series from datetime dataframe, or datetime dataframe from datetime series, raises TypeError or UFuncTypeError #59529

Open sfc-gh-mvashishtha opened 3 weeks ago

sfc-gh-mvashishtha commented 3 weeks ago

Pandas version checks

Reproducible Example

import pandas as pd

(
    pd.DataFrame([[1, 2], [3, 4]]).astype('datetime64[ns]')
    - pd.Series([5, 6, 7]).astype('datetime64[ns]')
)

Issue Description

I'm getting TypeError: cannot subtract DatetimeArray from ndarray.

Doing the subtraction in the opposite direction (series - dataframe) gives UFuncTypeError: ufunc 'subtract' cannot use operands with types dtype('<M8[ns]') and dtype('float64').

I found a related issue https://github.com/pandas-dev/pandas/issues/31623.
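One quick check that may help narrow this down: subtracting two datetime64 Series (rather than a Series from a DataFrame) works. This is a sketch written for this report, not part of the original reproduction. Note that `df[0] - s` aligns on the Series index, so it is a different computation from `df - s`. It only shows that datetime64 minus datetime64 is supported on its own, which is consistent with the failure being in the DataFrame-with-Series path.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]]).astype("datetime64[ns]")
s = pd.Series([5, 6, 7]).astype("datetime64[ns]")

# Series - Series aligns the two indexes (0, 1 vs 0, 1, 2) with an outer join.
# Label 2 exists only in `s`, so that entry is NaT.
col_diff = df[0] - s
print(col_diff)
```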

Expected Behavior

This subtraction should work the way it would for integers:

(
    pd.DataFrame([[1, 2], [3, 4]])
    - pd.Series([5, 6, 7])
)
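For reference, here is a small sketch of what the integer version actually produces. The Series index (0, 1, 2) is aligned against the DataFrame columns (0, 1) with an outer join and then broadcast down each row. Column 2 exists only in the Series, so it comes out missing.

```python
import pandas as pd

# Integer analogue of the failing datetime example: align the Series index
# with the DataFrame columns (outer join), then broadcast it to every row.
result = pd.DataFrame([[1, 2], [3, 4]]) - pd.Series([5, 6, 7])

# Rows are [1 - 5, 2 - 6, missing] and [3 - 5, 4 - 6, missing].
print(result)
```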

So for datetimes, the Series should likewise be aligned with the DataFrame's columns (axis 1) and broadcast to each row, producing:

pd.DataFrame([
    [pd.Timedelta(-4), pd.Timedelta(-4), pd.NaT],
    [pd.Timedelta(-2), pd.Timedelta(-2), pd.NaT],
])
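Until this is fixed, one possible workaround is to do the subtraction row by row with `DataFrame.apply(..., axis=1)`. Each row is a Series, so every step is a Series-with-Series operation that aligns the Series index against the frame's columns. This is a sketch under the assumption that the row-wise path is unaffected by the bug (it was not part of the original report), and it is slower than a vectorized operation on large frames.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]]).astype("datetime64[ns]")
s = pd.Series([5, 6, 7]).astype("datetime64[ns]")

# Workaround sketch: subtract `s` from each row separately. `row - s` aligns
# the row's labels (the frame's columns 0, 1) with `s`'s index (0, 1, 2), so
# column 2 is NaT, matching the expected output above.
result = df.apply(lambda row: row - s, axis=1)
print(result)
```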

Installed Versions

```
INSTALLED VERSIONS
------------------
commit : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python : 3.9.18.final.0
python-bits : 64
OS : Darwin
OS-release : 23.5.0
Version : Darwin Kernel Version 23.5.0: Wed May 1 20:14:38 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6020
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.2
numpy : 1.26.3
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.18.1
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.4
qtpy : None
pyqt5 : None
```

rhshadrach commented 3 weeks ago

Confirmed on main - further investigations and PRs to fix are welcome!

sfc-gh-mvashishtha commented 2 weeks ago

I also found that adding a timedelta Series to a timestamp DataFrame, or subtracting one from it, raises a similar error: ufunc 'add' cannot use operands with types dtype('<m8[ns]') and dtype('float64')

import pandas as pd

pd.DataFrame([pd.Timestamp(1)]) + pd.Series([pd.Timedelta(2), pd.Timedelta(3)])
pd.DataFrame([pd.Timestamp(1)]) - pd.Series([pd.Timedelta(2), pd.Timedelta(3)])
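The same row-wise approach looks like it should sidestep this case too. Below is a sketch, assuming (as above) that Series-with-Series arithmetic is unaffected; `rowwise_op` is a hypothetical helper introduced here, not a pandas API. With one row and a two-element Series, column 0 gets `Timestamp(1) ± Timedelta(2)` and column 1, which exists only in the Series, is NaT.

```python
import operator

import pandas as pd


def rowwise_op(frame: pd.DataFrame, other: pd.Series, op) -> pd.DataFrame:
    """Hypothetical helper: compute op(row, other) for each row of `frame`.

    Each row is a Series, so `other`'s index is aligned against the frame's
    columns (outer join), and labels present on only one side become NaT.
    """
    return frame.apply(lambda row: op(row, other), axis=1)


df = pd.DataFrame([pd.Timestamp(1)])
deltas = pd.Series([pd.Timedelta(2), pd.Timedelta(3)])

added = rowwise_op(df, deltas, operator.add)       # timestamp + timedelta
subtracted = rowwise_op(df, deltas, operator.sub)  # timestamp - timedelta
print(added)
print(subtracted)
```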
sukriti1 commented 1 week ago

take