Open dfonnegra opened 4 years ago
I iterated through all timezones and got that this list of timezones fail too:
America/Asuncion America/Campo_Grande America/Cuiaba America/Havana America/Punta_Arenas America/Santiago America/Sao_Paulo America/Scoresbysund Antarctica/Casey Antarctica/Palmer Asia/Amman Asia/Beirut Asia/Damascus Asia/Gaza Asia/Hebron Asia/Tehran Atlantic/Azores Brazil/East Chile/Continental Cuba Iran
What's the expected behavior here? If the user is trying to create an invalid datetime, shouldn't we raise?
The same behaviour used when this exact same bug happened with the resample method in the 0.23.x version. Check this in the 0.24.0 changelong
What was the resolution there? Can you post the actual expected output in the original post?
In the resample case, an assumption was made on the behavior of ambiguous and nonexistent timezones since the resulting index is a transformation from the original index (i.e the binning).
In the date_range
case, I agree with Tom's assessment https://github.com/pandas-dev/pandas/issues/30378#issuecomment-568010280 in which the idiomatic alternative is:
pd.date_range(start=pd.Timestamp('2016-10-05 00:00:00'), end=pd.Timestamp('2016-10-30 00:00:00'), freq='1D').tz_localize('America/Sao_Paulo', nonexistent=...)
I think ultimately the tz
parameter should be deprecated from date_range
In the resample case, an assumption was made on the behavior of ambiguous and nonexistent timezones since the resulting index is a transformation from the original index (i.e the binning).
In the
date_range
case, I agree with Tom's assessment #30378 (comment) in which the idiomatic alternative is:
pd.date_range(start=pd.Timestamp('2016-10-05 00:00:00'), end=pd.Timestamp('2016-10-30 00:00:00'), freq='1D').tz_localize('America/Sao_Paulo', nonexistent=...)
I think ultimately the
tz
parameter should be deprecated fromdate_range
I don't think that deprecating the tz
parameter is the right solution, as one gets exactly the same error without using the tz
parameter, but when start and end themselves are timezone-aware:
start = pd.Timestamp("2016-10-05").tz_localize("America/Sao_Paulo")
end = pd.Timestamp("2016-10-30").tz_localize("America/Sao_Paulo")
pd.date_range(start=start, end=end, freq="D")
What's the expected behavior here? If the user is trying to create an invalid datetime, shouldn't we raise?
I think date_range
should make the same assumptions as were made in resample
, such that the above returns the same as:
pd.Series(1, index=pd.date_range(start, end, freq="h")).resample("D").sum().index
which returns:
DatetimeIndex(['2016-10-05 00:00:00-03:00', '2016-10-06 00:00:00-03:00',
'2016-10-07 00:00:00-03:00', '2016-10-08 00:00:00-03:00',
'2016-10-09 00:00:00-03:00', '2016-10-10 00:00:00-03:00',
'2016-10-11 00:00:00-03:00', '2016-10-12 00:00:00-03:00',
'2016-10-13 00:00:00-03:00', '2016-10-14 00:00:00-03:00',
'2016-10-15 00:00:00-03:00', '2016-10-16 01:00:00-02:00',
'2016-10-17 00:00:00-02:00', '2016-10-18 00:00:00-02:00',
'2016-10-19 00:00:00-02:00', '2016-10-20 00:00:00-02:00',
'2016-10-21 00:00:00-02:00', '2016-10-22 00:00:00-02:00',
'2016-10-23 00:00:00-02:00', '2016-10-24 00:00:00-02:00',
'2016-10-25 00:00:00-02:00', '2016-10-26 00:00:00-02:00',
'2016-10-27 00:00:00-02:00', '2016-10-28 00:00:00-02:00',
'2016-10-29 00:00:00-02:00', '2016-10-30 00:00:00-02:00'],
dtype='datetime64[ns, America/Sao_Paulo]', freq='D')
Notice that the index for 2016-10-16 is shifted by 1 hour.
Code Sample
Problem description
[When I try to generate a range of daily data with America/Sao_Paulo timezone, it breaks with NonExistentTimeError: 2016-10-16 00:00:00. My pandas version is 0.25.3]