Open MaximeWeyl opened 5 years ago
One workaround I found is to pass an Index to from_product instead of a list:
import pandas as pd
a = pd.Timestamp("2019-2-2").date()
i = pd.MultiIndex.from_product([
pd.Index([a, a]),
[2, 3]
])
print("a={} ({})".format(a, type(a)))
print(i[0])
Output:
a=2019-02-02 (<class 'datetime.date'>)
(datetime.date(2019, 2, 2), 2)
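The workaround above can be wrapped in a small helper. This is only a sketch: product_keeping_dates is a hypothetical name, not a pandas API, and it assumes that an object-dtype Index passed to from_product is left untouched, as the output above suggests. Note that pd.Timestamp subclasses datetime.date, so the check has to compare exact types rather than use isinstance.

```python
import datetime

import pandas as pd


def product_keeping_dates(*levels):
    """Cartesian-product MultiIndex that keeps plain datetime.date values.

    Hypothetical helper: any level containing plain datetime.date objects is
    wrapped in an object-dtype Index before being handed to from_product.
    """
    prepared = []
    for level in levels:
        values = list(level)
        # type(...) is used on purpose: pd.Timestamp is a subclass of
        # datetime.date, so isinstance() would not tell them apart.
        if any(type(v) is datetime.date for v in values):
            prepared.append(pd.Index(values, dtype=object))
        else:
            prepared.append(values)
    return pd.MultiIndex.from_product(prepared)


a = datetime.date(2019, 2, 2)
mi = product_keeping_dates([a, a], [2, 3])
print(type(mi[0][0]))
```

Levels without plain dates are passed through unchanged, so integer or string levels keep their usual dtypes.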
I expect the output to respect the type I gave to from_product :
You don't specify a dtype, right? So this is a bug in inference.
I would expect the MultiIndex constructors to follow the behavior of Index (and Series), which preserve the datetime.date objects.
In [24]: pd.Index([a, a])
Out[24]: Index([2019-02-02, 2019-02-02], dtype='object')
In [25]: pd.Index([a, a])[0]
Out[25]: datetime.date(2019, 2, 2)
Hmm, the bug seems to be in Categorical (used by MultiIndex internally).
In [31]: pd.Categorical([a]).categories
Out[31]: DatetimeIndex(['2019-02-02'], dtype='datetime64[ns]', freq=None)
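To see whether a given pandas installation still shows this inference, a quick probe along the following lines may help. It compares the categories built from a plain list with those built from an object-dtype Index. The variable names are illustrative, and the dtype printed for the list case is deliberately not asserted here because it may differ between pandas versions.

```python
import datetime

import pandas as pd

a = datetime.date(2019, 2, 2)

# Categories inferred from a plain list of dates; the resulting dtype is the
# behavior under discussion and may vary with the pandas version.
from_list = pd.Categorical([a]).categories

# Categories taken from an object-dtype Index, mirroring the workaround of
# passing an Index instead of a list.
from_index = pd.Categorical(pd.Index([a], dtype=object)).categories

print("pandas", pd.__version__)
print("list  ->", from_list.dtype)
print("Index ->", from_index.dtype)
```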
Just noting the distinction that this is an issue with datetime.date objects, which are not first class in pandas.
I got tripped up on this as well when using groupby on a datetime.date column and moving it in and out of the index. The different behavior between Index and MultiIndex is especially tricky.
One suggestion: maybe add documentation or a warning on the preferred way to round timestamps to dates? This is a common operation, and dt.date was all I came across in the docs.
https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.date.html
(See also https://github.com/pandas-dev/pandas/issues/15906)
In [4]: df = pd.DataFrame({'date' : pd.date_range(start='2020-01-01', periods=10), 'label' : 1})
In [5]: df['date'] = df['date'].dt.date
In [6]: display(df.date[0])
datetime.date(2020, 1, 1)
In [7]: df_1 = df.set_index('date').reset_index()
In [8]: display(df_1.date[0])
datetime.date(2020, 1, 1)
In [9]: df_1 = df.set_index(['date', 'label']).reset_index()
In [10]: display(df_1.date[0])
Timestamp('2020-01-01 00:00:00')
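If plain datetime.date objects are not strictly needed, one alternative is to keep the column as datetime64 and truncate it to midnight with dt.normalize() (dt.floor("D") behaves similarly). This is a sketch of that approach, not an official recommendation from the pandas docs. Because the column never becomes object dtype, the round trip through a MultiIndex has nothing to convert.

```python
import pandas as pd

# Timestamps with a time-of-day component, one per day.
df = pd.DataFrame(
    {"date": pd.date_range(start="2020-01-01 13:45", periods=3), "label": 1}
)

# Truncate to midnight but stay in datetime64 instead of using .dt.date.
df["date"] = df["date"].dt.normalize()
dtype_before = df["date"].dtype

# Round trip through a two-level MultiIndex, as in the example above.
df_1 = df.set_index(["date", "label"]).reset_index()
dtype_after = df_1["date"].dtype

print(dtype_before, dtype_after)
print(df_1["date"][0])
```

The trade-off is that the values are Timestamps at midnight rather than datetime.date objects, which may or may not suit downstream code.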
Code Sample, a copy-pastable example if possible
Output is:
Problem description
When using from_product with Python datetime.date objects, the resulting MultiIndex level is converted to pandas datetimes (Timestamps). There is no way to keep the original Python objects, which is what I want. I got the same behavior with from_tuples, but not with from_arrays.
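A side-by-side check of the three constructors mentioned above could look like the sketch below. It only reports the element type each constructor yields on the installed pandas version; no particular result is asserted, since the behavior described here may have changed since the report. The dictionary names are illustrative.

```python
import datetime

import pandas as pd

a = datetime.date(2019, 2, 2)

# Build the same two-row index with each constructor, always from plain lists.
candidates = {
    "from_product": pd.MultiIndex.from_product([[a], [2, 3]]),
    "from_tuples": pd.MultiIndex.from_tuples([(a, 2), (a, 3)]),
    "from_arrays": pd.MultiIndex.from_arrays([[a, a], [2, 3]]),
}

# Record the exact type of the first element of the first level.
report = {name: type(mi[0][0]).__name__ for name, mi in candidates.items()}

for name, type_name in report.items():
    print("{:<13} -> {}".format(name, type_name))
```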
Expected Output
I expect the output to respect the type I gave to from_product :
Output of pd.show_versions()