Closed jgmarcel closed 2 years ago
Thanks @jgmarcel for the report.
first bad commit: [545a942424a26c4163e1f959ac6130984fc3fb41] BUG: Index([date]).astype("category").astype(object) roundtrip (#38552)
I'll mark as a regression for now pending further investigation.
Note that the set_index followed by a reset_index still creates a datetime64[ns] column from the original object column of date objects.
cc @jbrockmendel
Note that the set_index followed by a reset_index still creates a datetime64[ns] column from the original object column of date objects.
@jgmarcel support for date objects in general is spotty. Your best bet is to use datetime or Timestamp objects.
@jgmarcel support for date objects in general is spotty. Your best bet is to use datetime or Timestamp objects.
Agreed. Thank you for the good advice. The problem is that date objects are all over our code, since any time a DATE column is read via pandas.read_sql(), it is translated to a series of date objects. Hence, casting everything to datetime or Timestamp objects would be a lot of effort… For now, we reverted to version 1.2.5, but I would like to avoid the work if possible.
Hi guys,
Do you believe it will be possible to get this fixed on version 1.3.4? I ask so I can better plan the effort of adapting our whole code base.
Thank you.
@jgmarcel would likely take a community pull request
core can provide review
might be quite tricky as date have very little support
changing milestone to 1.3.5
pls look at the 1.3.x release notes. IIRC this was changed on purpose in response to unforced conversion (e.g. to datetime times); it was desirable to keep datetime.date.
Hi @jreback,
pls look at the 1.3.x release notes. IIRC this was changed on purpose in response to unforced conversion (e.g. to datetime times); it was desirable to keep datetime.date.
As initially stated, «I fail to find in the What’s new page the reason for that change of behavior». Would you mind pointing it out to me?
Anyway, the version policy clearly states that «API breaking changes should only occur in major releases», and «a deprecation path will be provided rather than an outright breaking change». Wouldn’t that be the case here, since we are talking about a breaking change that occurred between versions 1.2.5 and 1.3.0?
cc @simonjayhawkins
In 1.2.5,
pd.Index(
    [
        datetime.date(2021, 8, 1),
        datetime.date(2021, 8, 2),
        datetime.date(2021, 8, 3),
    ]
)
gives
Index([2021-08-01, 2021-08-02, 2021-08-03], dtype='object')
and using the DataFrame from the OP
df.set_index(["date"]).index
gives
Index([2021-08-01, 2021-08-02, 2021-08-03], dtype='object', name='date')
whereas for a MultiIndex
arr = [
    datetime.date(2021, 8, 1),
    datetime.date(2021, 8, 2),
    datetime.date(2021, 8, 3),
]
pd.MultiIndex.from_arrays([arr, arr]).levels[0]
gives
DatetimeIndex(['2021-08-01', '2021-08-02', '2021-08-03'], dtype='datetime64[ns]', freq=None)
So the Index and MultiIndex constructors were inconsistent in their handling of object dtype arrays containing datetime.date objects in pandas 1.2.5: Index kept them as object, while MultiIndex cast the level to datetime64[ns].
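To check which behavior an installed pandas version has, the two constructors can be compared side by side. This is a minimal sketch that only prints the resulting dtypes; the variable names `flat` and `level` are chosen here for illustration, and the MultiIndex result depends on the pandas version as described in this thread.

```python
import datetime

import pandas as pd

arr = [
    datetime.date(2021, 8, 1),
    datetime.date(2021, 8, 2),
    datetime.date(2021, 8, 3),
]

# Plain Index built from datetime.date objects.
flat = pd.Index(arr)

# First level of a MultiIndex built from the same objects.
level = pd.MultiIndex.from_arrays([arr, arr]).levels[0]

# Print the dtypes so the two constructors can be compared
# under whatever pandas version is installed.
print("pandas", pd.__version__)
print("Index dtype:", flat.dtype)
print("MultiIndex level dtype:", level.dtype)
```

If the two printed dtypes differ, the installed version still has the inconsistency discussed above.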
As initially stated, «I fail to find in the What’s new page the reason for that change of behavior». Would you mind pointing it out to me?
The change of behavior in casting of datetime-like types in MultiIndex was done in #38552. Looking at the code changes in that PR, it is clear from the changed tests and comments added that this change was intentional. Unfortunately the release note added did not refer to changes in MultiIndex construction.
Anyway, the version policy clearly states that «API breaking changes should only occur in major releases», and «a deprecation path will be provided rather than an outright breaking change». Wouldn’t that be the case here, since we are talking about a breaking change that occurred between versions 1.2.5 and 1.3.0?
The policy also states
pandas will sometimes make behavior changing bug fixes, as part of minor or patch releases. Whether or not a change is a bug fix or an API-breaking change is a judgement call. We’ll do our best, and we invite you to participate in development discussion on the issue tracker or mailing list.
So the change in behavior could be considered a bug fix, since the MultiIndex constructor was inconsistent with the Index constructor, in which case no further action would be needed.
However, the policy also states
Whenever possible, a deprecation path will be provided rather than an outright breaking change.
and
We will not introduce new deprecations in patch releases.
So, as an alternative, we could maybe restore the old behavior for 1.3.5 and add a deprecation of this behavior in 1.4
The only code change in #38552 was removing convert_dates=True from values = maybe_infer_to_datetimelike(values, convert_dates=True)
I guess we could maybe pass a convert_dates parameter through to the Categorical constructor from the MultiIndex constructor. @jbrockmendel wdyt?
-1 on any change here we have very limited if any support for datetime.date
not adding more complexity
Hi @simonjayhawkins,
Your thorough explanation is very much appreciated. I see now how the change in behavior was more of a bug fix than an API-breaking change.
If I may pick your brain here, what do you believe would be the best way to achieve the pre-1.3 behavior, i.e. having datetime.date objects cast to datetime64 in MultiIndex? Would it be to call pd.to_datetime(arg, errors='ignore') where arg takes every column of the DataFrame (since I do not know in advance what its dtypes are)? Would it be saner to do that conversion immediately before or immediately after calling the MultiIndex constructor? Any other solution?
Thank you.
I guess we could maybe pass a convert_dates parameter through to the Categorical constructor from the MultiIndex constructor. @jbrockmendel wdyt?
It's possible. Though we'd then have a breaking change for anyone relying on the 1.3 behavior.
Would it be to call pd.to_datetime(arg, errors='ignore') where arg takes every column of the DataFrame
I'd check Index(col).inferred_type == "date"
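The suggested check could be applied before building the MultiIndex, roughly as follows. This is only a sketch of one way to do it: the helper name `cast_date_columns` and the sample DataFrame are hypothetical, not taken from the original report, and columns containing nulls are not handled here.

```python
import datetime

import pandas as pd


def cast_date_columns(df):
    """Return a copy of df with datetime.date columns cast to datetime64.

    Hypothetical helper: it only touches object columns whose
    inferred type is "date", and leaves everything else unchanged.
    """
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if col.dtype == object and pd.Index(col).inferred_type == "date":
            out[name] = pd.to_datetime(col)
    return out


# Illustrative data, not the DataFrame from the original post.
df = pd.DataFrame(
    {
        "date": [datetime.date(2021, 8, 1), datetime.date(2021, 8, 2)],
        "key": ["a", "b"],
        "value": [1, 2],
    }
)

# Convert first, then build the MultiIndex from the converted columns.
converted = cast_date_columns(df)
index = converted.set_index(["date", "key"]).index
print(index.levels[0])
```

Converting before calling set_index means the MultiIndex is built from a datetime64 column, so the result does not depend on how the constructor treats datetime.date objects.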
I'd check Index(col).inferred_type == "date"
That would be a great addition, yes. Thank you for that! However, I would also have to check for an inferred type of mixed, for when my column of datetime.date objects contains null dates, right?
removing this issue from the 1.3.5 milestone as I think the consensus is for no action.
@jbrockmendel if you can respond to https://github.com/pandas-dev/pandas/issues/43091#issuecomment-961173962 we can probably close this issue. Thanks.
However, I would also have to check for an inferred type of mixed, for when my column of datetime.date objects contains null dates, right?
I'd go for lib.infer_dtype(col, skipna=True) == "date" instead of checking for "mixed"
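For a column that contains nulls, the check might look like the sketch below. It uses `pandas.api.types.infer_dtype`, the public entry point for the same function as `lib.infer_dtype`; the sample column is made up for illustration.

```python
import datetime

import pandas as pd
from pandas.api.types import infer_dtype

# Illustrative column of datetime.date objects with one missing value.
col = pd.Series(
    [datetime.date(2021, 8, 1), None, datetime.date(2021, 8, 3)],
    dtype=object,
)

# skipna=True ignores the null when inferring the type.
is_date_col = infer_dtype(col, skipna=True) == "date"

# Convert only when the column really holds dates; the null becomes NaT.
converted = pd.to_datetime(col) if is_date_col else col
print(converted)
```

With skipna=True, a single check for "date" covers columns both with and without null dates.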
However, I would also have to check for an inferred type of mixed, for when my column of datetime.date objects contains null dates, right?
I'd go for lib.infer_dtype(col, skipna=True) == "date" instead of checking for "mixed"
Thank you very much! I was not aware of that function. Much appreciated.
[x] I have checked that this issue has not already been reported.
[x] I have confirmed this bug exists on the latest version of pandas.
[ ] (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
[Edited to inform a much simpler example.]
Output
The output below has been generated with pandas 1.3.0 or higher.
Expected Output
The output below has been generated with pandas 1.2.5.
Problem description
Starting from pandas 1.3.0, the observed behavior changed: in a MultiIndex creation, datetime.date objects are not cast to datetime64 anymore. I fail to find in the What’s new page the reason for that change of behavior. Is it by design or a bug?
Output of pd.show_versions()