pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.42k stars, 17.85k forks

BUG: AssertionError when multiplying timedelta Series with a pandas nullable dtype Series #58054

Open jamesdow21 opened 6 months ago

jamesdow21 commented 6 months ago

Reproducible Example

import pandas as pd
import numpy as np
from datetime import timedelta
td_series = pd.Series(np.random.rand(5) * timedelta(hours=1))
other = pd.Series(np.random.rand(5) < 0.5)
td_series * other.astype("boolean")

Issue Description

Multiplying a Series of timedelta64 dtype by a Series that uses any of the pandas nullable dtypes ('Int8', 'Int16', 'Int32', 'Int64', 'UInt8', 'UInt16', 'UInt32', 'UInt64', 'Float32', 'Float64', or 'boolean') raises an AssertionError inside TimedeltaArray._simple_new. The failing assertion checks that the new values are a numpy.ndarray, but in this case they are an instance of TimedeltaArray.

This error does not occur with the NumPy-backed dtypes ('int8', 'int16', 'int32', 'int64', 'uint8', 'uint16', 'uint32', 'uint64', 'float32', 'float64', or 'bool').
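Since the NumPy-backed dtypes are reported to work, one possible stopgap is to cast the nullable operand to its NumPy-backed counterpart before multiplying. The following is a minimal sketch of that idea, not a fix for the underlying bug; the variable names and sample values are illustrative assumptions, and it only applies when the nullable Series contains no missing values.

```python
import numpy as np
import pandas as pd

# Illustrative data (assumed for this sketch): 1h..5h and integer weights.
td_series = pd.Series(pd.to_timedelta(np.arange(1, 6), unit="h"))
weights = pd.Series([1, 0, 2, 0, 3], dtype="Int64")  # pandas nullable dtype

# Cast the nullable operand to the NumPy-backed dtype before multiplying.
# astype("int64") raises if the Series contains pd.NA, so any missing
# values would have to be filled or dropped first.
result = td_series * weights.astype("int64")
print(result)
```

The same pattern should carry over to the other nullable dtypes by casting to the matching NumPy dtype (for example 'boolean' to 'bool'), with the same caveat about missing values.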

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[6], line 1
----> 1 td_series * other.astype("boolean")

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\ops\common.py:76, in _unpack_zerodim_and_defer.<locals>.new_method(self, other)
     72             return NotImplemented
     74 other = item_from_zerodim(other)
---> 76 return method(self, other)

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\arraylike.py:202, in OpsMixin.__mul__(self, other)
    200 @unpack_zerodim_and_defer("__mul__")
    201 def __mul__(self, other):
--> 202     return self._arith_method(other, operator.mul)

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\series.py:6126, in Series._arith_method(self, other, op)
   6124 def _arith_method(self, other, op):
   6125     self, other = self._align_for_op(other)
-> 6126     return base.IndexOpsMixin._arith_method(self, other, op)

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\base.py:1382, in IndexOpsMixin._arith_method(self, other, op)
   1379     rvalues = np.arange(rvalues.start, rvalues.stop, rvalues.step)
   1381 with np.errstate(all="ignore"):
-> 1382     result = ops.arithmetic_op(lvalues, rvalues, op)
   1384 return self._construct_result(result, name=res_name)

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\ops\array_ops.py:273, in arithmetic_op(left, right, op)
    260 # NB: We assume that extract_array and ensure_wrapped_if_datetimelike
    261 #  have already been called on `left` and `right`,
    262 #  and `maybe_prepare_scalar_for_op` has already been called on `right`
    263 # We need to special-case datetime64/timedelta64 dtypes (e.g. because numpy
    264 # casts integer dtypes to timedelta64 when operating with timedelta64 - GH#22390)
    266 if (
    267     should_extension_dispatch(left, right)
    268     or isinstance(right, (Timedelta, BaseOffset, Timestamp))
   (...)
    271     # Timedelta/Timestamp and other custom scalars are included in the check
    272     # because numexpr will fail on it, see GH#31457
--> 273     res_values = op(left, right)
    274 else:
    275     # TODO we should handle EAs consistently and move this check before the if/else
    276     # (https://github.com/pandas-dev/pandas/issues/41165)
    277     # error: Argument 2 to "_bool_arith_check" has incompatible type
    278     # "Union[ExtensionArray, ndarray[Any, Any]]"; expected "ndarray[Any, Any]"
    279     _bool_arith_check(op, left, right)  # type: ignore[arg-type]

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\ops\common.py:76, in _unpack_zerodim_and_defer.<locals>.new_method(self, other)
     72             return NotImplemented
     74 other = item_from_zerodim(other)
---> 76 return method(self, other)

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\arrays\timedeltas.py:498, in TimedeltaArray.__mul__(self, other)
    496 # numpy will accept float or int dtype, raise TypeError for others
    497 result = self._ndarray * other
--> 498 return type(self)._simple_new(result, dtype=result.dtype)

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\arrays\timedeltas.py:221, in TimedeltaArray._simple_new(cls, values, freq, dtype)
    219 assert lib.is_np_dtype(dtype, "m")
    220 assert not tslibs.is_unitless(dtype)
--> 221 assert isinstance(values, np.ndarray), type(values)
    222 assert dtype == values.dtype
    223 assert freq is None or isinstance(freq, Tick)

AssertionError: <class 'pandas.core.arrays.timedeltas.TimedeltaArray'>

Expected Behavior

Return the same results as multiplying by the NumPy-backed dtypes, or at least raise an error other than AssertionError.
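To compare behavior across dtypes on a given installation, a small probe along these lines may help. It is a hedged sketch: `probe_timedelta_mul` is a hypothetical helper name, the sample values are made up, and the outcome for the nullable dtypes depends on the installed pandas version, so no particular result is claimed for them here.

```python
import pandas as pd


def probe_timedelta_mul(dtypes):
    """Multiply a timedelta Series by a Series of each dtype and record
    either "ok" or the name of the exception that was raised."""
    td = pd.Series(pd.to_timedelta([1, 2, 3], unit="h"))
    outcome = {}
    for dtype in dtypes:
        # Start from plain booleans so the cast is valid for every dtype probed.
        other = pd.Series([True, False, True]).astype(dtype)
        try:
            td * other
        except Exception as exc:  # broad on purpose: we only report the type
            outcome[dtype] = type(exc).__name__
        else:
            outcome[dtype] = "ok"
    return outcome


report = probe_timedelta_mul(
    ["int64", "Int64", "float64", "Float64", "bool", "boolean"]
)
print(report)
```

On an affected version, the nullable entries would be expected to show AssertionError while the NumPy-backed entries show "ok", matching the report above.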

Installed Versions

INSTALLED VERSIONS
------------------
commit                : bdc79c146c2e32f2cab629be240f01658cfb6cc2
python                : 3.12.2.final.0
python-bits           : 64
OS                    : Windows
OS-release            : 10
Version               : 10.0.19045
machine               : AMD64
processor             : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder             : little
LC_ALL                : None
LANG                  : en_US.UTF-8
LOCALE                : English_United States.1252

pandas                : 2.2.1
numpy                 : 1.26.4
pytz                  : 2024.1
dateutil              : 2.9.0.post0
setuptools            : 69.2.0
pip                   : 24.0
Cython                : None
pytest                : None
hypothesis            : None
sphinx                : None
blosc                 : None
feather               : None
xlsxwriter            : None
lxml.etree            : 5.1.0
html5lib              : None
pymysql               : None
psycopg2              : None
jinja2                : 3.1.3
IPython               : 8.22.2
pandas_datareader     : None
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : 4.12.3
bottleneck            : 1.3.8
dataframe-api-compat  : None
fastparquet           : None
fsspec                : 2024.3.1
gcsfs                 : None
matplotlib            : 3.8.3
numba                 : 0.59.1
numexpr               : 2.9.0
odfpy                 : None
openpyxl              : None
pandas_gbq            : None
pyarrow               : 15.0.2
pyreadstat            : None
python-calamine       : None
pyxlsb                : None
s3fs                  : 2024.3.1
scipy                 : 1.12.0
sqlalchemy            : None
tables                : None
tabulate              : None
xarray                : 2024.2.0
xlrd                  : None
zstandard             : None
tzdata                : 2024.1
qtpy                  : None
pyqt5                 : None

kvnwng11 commented 6 months ago

take