pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.94k stars 18.04k forks

IMPL DTA should *only* hold i8 #24712

Closed jreback closed 5 years ago

jreback commented 5 years ago

It seems we have DTAs holding i8 OR M8[ns]; xref https://github.com/pandas-dev/pandas/pull/24686/files

This is really really odd, and we should not allow this, always converting to i8.

jreback commented 5 years ago

cc @TomAugspurger @jbrockmendel

TomAugspurger commented 5 years ago

To clarify, DatetimeArray._data is always an M8[ns] ndarray. The constructor accepts either integer or datetime64[ns] values.

In [6]: pd.arrays.DatetimeArray(np.array([1, 2]))._data
Out[6]:
array(['1970-01-01T00:00:00.000000001', '1970-01-01T00:00:00.000000002'],
      dtype='datetime64[ns]')

In [7]: pd.arrays.DatetimeArray(np.array([1, 2], dtype='M8[ns]'))._data
Out[7]:
array(['1970-01-01T00:00:00.000000001', '1970-01-01T00:00:00.000000002'],
      dtype='datetime64[ns]')
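
The equivalence of the two inputs above can be seen with NumPy alone. This is a minimal sketch, not the pandas implementation: it only assumes that an i8 array of nanoseconds since the epoch can be reinterpreted as M8[ns] via `ndarray.view`, which reuses the same 8-byte buffer.

```python
import numpy as np

# Raw i8 values: nanoseconds since 1970-01-01.
i8 = np.array([1, 2], dtype="i8")

# Reinterpret the same bytes as datetime64[ns]; no copy is made.
m8 = i8.view("M8[ns]")

print(m8.dtype)                    # datetime64[ns]
print(np.shares_memory(i8, m8))    # True
print((m8.view("i8") == i8).all()) # True: viewing back recovers the integers
```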

jreback commented 5 years ago

Oh OK, it's just that sometimes _simple_new is getting i8 and sometimes M8[ns]. Worth making that consistent (e.g. by having the caller do the conversion)?

jbrockmendel commented 5 years ago

No, it’s allowed specifically so code can be shared by timedelta/datetime/period
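
To illustrate the kind of sharing meant here, a rough sketch follows. All names in it (`SharedOpsMixin`, `ToyDatetimeArray`, `ToyTimedeltaArray`, `shift_by`) are hypothetical and are not pandas classes or methods; it only assumes that shared code works on i8 and that each constructor reinterprets i8 as its own native dtype.

```python
import numpy as np


class SharedOpsMixin:
    """Hypothetical stand-in for code shared by datetime-like arrays."""

    _native_dtype = None  # set by each subclass, e.g. "M8[ns]"

    def __init__(self, values):
        values = np.asarray(values)
        if values.dtype == np.dtype("i8"):
            # Lenient convention: raw i8 is reinterpreted as the native dtype.
            values = values.view(self._native_dtype)
        if values.dtype != np.dtype(self._native_dtype):
            raise TypeError(f"expected {self._native_dtype}, got {values.dtype}")
        self._data = values

    @property
    def asi8(self):
        # Integer view of the stored values, identical for every subclass.
        return self._data.view("i8")

    def shift_by(self, n):
        # Shared arithmetic runs on i8 and passes i8 straight back to the
        # constructor, without knowing which subclass it is building.
        return type(self)(self.asi8 + n)


class ToyDatetimeArray(SharedOpsMixin):
    _native_dtype = "M8[ns]"


class ToyTimedeltaArray(SharedOpsMixin):
    _native_dtype = "m8[ns]"


dta = ToyDatetimeArray(np.array([1, 2], dtype="i8"))
tda = ToyTimedeltaArray(np.array([1, 2], dtype="m8[ns]"))
shifted_dta = dta.shift_by(1)
shifted_tda = tda.shift_by(5)
```

With the stricter convention, `shift_by` would need a dtype-specific conversion before calling the constructor, which is the duplication the lenient convention avoids.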

jbrockmendel commented 5 years ago

@jreback did Tom’s comment clarify this sufficiently?

jreback commented 5 years ago

I understand what is going on, but it’s a pretty odd calling convention

meaning that the caller should guarantee that the input is of the appropriate type

jbrockmendel commented 5 years ago

> meaning that the caller should guarantee that the input is of the appropriate type

OK, then we're back to my response: the convention exists as it does so that DTA/TDA/PA can share code.

jbrockmendel commented 5 years ago

This is correct as is. Closing.