paul-lilley opened this issue 5 years ago
pd.Timestamp is only implemented within the bounds Timestamp('1677-09-21 00:12:43.145225') to Timestamp('2262-04-11 23:47:16.854775807'). Outside that range we could have start_time and end_time return datetime objects, I guess.
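For concreteness, a minimal sketch of the bounds in question (nothing version-specific is asserted about start_time / end_time here, since that behaviour is exactly what this issue is about):

```python
import pandas as pd

# The nanosecond-resolution Timestamp bounds referred to above
print(pd.Timestamp.min)   # approx. 1677-09-21 00:12:43
print(pd.Timestamp.max)   # approx. 2262-04-11 23:47:16

# A Period outside those bounds can still be constructed,
# but start_time / end_time then has to produce a Timestamp
p = pd.Period("3000-01-01", freq="D")
```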
This looks related to https://github.com/pandas-dev/pandas/pull/27916/, but it's outside my comfort zone to try altering the code to return Python datetime objects for start_time and end_time when the Period is outside the range of pd.Timestamp.
I don't think the return type should depend on the value.
Is there any advantage to returning a Timestamp, rather than always returning a datetime? If not, we could deprecate Period.start_time and Period.end_time in favor of a property that always returns a datetime.
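For illustration only, a rough sketch of what such an always-datetime accessor could do; the helper name is hypothetical, and it builds the value from the Period's own fields rather than going through pd.Timestamp:

```python
from datetime import datetime
import pandas as pd

def period_start_datetime(p: pd.Period) -> datetime:
    # Hypothetical helper sketching the proposed behaviour: assemble a plain
    # datetime from the Period's own components instead of converting through
    # pd.Timestamp, so periods outside the Timestamp range still work as long
    # as datetime can represent them (years 1-9999).
    return datetime(p.year, p.month, p.day, p.hour, p.minute, p.second)

print(period_start_datetime(pd.Period("2500-06-01", freq="D")))  # 2500-06-01 00:00:00
```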
I can't really comment on the advantages of returning a Timestamp, but as Period is the recommended means to handle datetimes outside pd.Timestamp (https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#representing-out-of-bounds-spans), it makes more sense (at least to me) that Period methods don't raise exceptions for datetimes outside pd.Timestamp. My limited use case is trying to resolve https://github.com/pandas-dev/pandas/issues/20927, which I've currently worked around using datetime objects.
> but as Period is the recommended means to handle datetimes outside pd.Timestamp
That's mostly in the context of an array of values inside a Series / DataFrame. When we're returning a scalar, there's no need to return a Period (which has different semantics to a datetime).
Good point. I agree that it makes sense to deprecate Period.start_time and Period.end_time in favor of a property that always returns a datetime.
Need a name. Perhaps Period.start_datetime?
Sounds good: Period.start_datetime and Period.end_datetime.
This will still run up against the datetime implementation bounds. Do we care about those?
Fair point, though at least the datetime.MINYEAR to datetime.MAXYEAR range covers the typical 'max-date' values (9999-12-31 in DB2 and Teradata, 31DEC9999 in SAS) that I've encountered in data warehousing tables used to implement type 2 slowly changing dimensions with start_date/end_date. Often the currently valid row is marked by end_date = some (arbitrarily high) 'max-date' to avoid nulls in the column.
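For reference, a small check of that range against such a sentinel (a sketch assuming a daily Period):

```python
from datetime import datetime, MINYEAR, MAXYEAR
import pandas as pd

print(MINYEAR, MAXYEAR)            # 1 9999, so the usual 'max-date' sentinels fit
sentinel = datetime(9999, 12, 31)  # typical SCD2 end_date for the current row
p = pd.Period(sentinel, freq="D")  # representable as a Period (and as a plain datetime)
print(p)                           # Period('9999-12-31', 'D')
```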
I personally don't care about representing datetimes that Python can't.
Hi, just to add: when reading SQL results from cx_Oracle into pandas, rather than throwing an out-of-range error, the returned DataFrame converts datetimes from 9999-12-31 to a particular timestamp in the year 1822, and 2999-12-31 to a particular timestamp in the year 1830, basically restarting from 1677 for dates beyond 2262. Shouldn't pandas ideally throw an out-of-bounds error or return NaT for these dates?
Now that we have non-nano Timestamp support, we could handle this.
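For example, a sketch assuming the pandas >= 2.0 behaviour where a Timestamp keeps the resolution of a non-nanosecond datetime64 input:

```python
import numpy as np
import pandas as pd

# A Timestamp built from a second-resolution datetime64 keeps that resolution,
# so dates beyond the old nanosecond bounds become representable.
ts = pd.Timestamp(np.datetime64("9999-12-31", "s"))
print(ts, ts.unit)   # expected: 9999-12-31 00:00:00 s
```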
Code Sample, a copy-pastable example if possible
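(The original snippet is not preserved in this copy; a minimal reconstruction consistent with the output below would be:)

```python
import pandas as pd

# Reconstruction (not the original snippet): a daily Period at the upper
# Timestamp bound, whose end_time overflows the nanosecond range.
p = pd.Period("2262-04-11", freq="D")
print(p.start_time, p.end_time)
```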
Output (with pandas 0.25.0): 2262-04-11 00:00:00 1677-09-21 00:25:26.290448384
With pandas 0.25.0+216.g1aca08aa6 I get an OutOfBoundsDatetime error.
Problem description
The overflow means the returned start_time / end_time is incorrect, and it also hinders use of Period for date/times outside the limits of pd.Timestamp.
Expected Output
Output: 2262-04-11 00:00:00 2262-04-12 00:00:00
Output of pd.show_versions()