pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

ENH: Timestamp/DTI. to epoch time #14772

Open jreback opened 7 years ago

jreback commented 7 years ago

Add a .to_epoch(unit='s') method to Timestamp and DatetimeIndex that returns the epoch for that unit. I think we would default this to s, as that seems pretty common, but allow any of our units.

In [19]: s = Series(pd.date_range('20160101',periods=3))

In [20]: s
Out[20]: 
0   2016-01-01
1   2016-01-02
2   2016-01-03
dtype: datetime64[ns]

In [21]: ((s-Timestamp(0)) / Timedelta('1s')).astype('i8')
Out[21]: 
0    1451606400
1    1451692800
2    1451779200
dtype: int64
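
As a rough sketch of what the proposed method could compute, the expression above can be wrapped in a standalone function. The name `to_epoch` and its `unit` parameter mirror the proposal only; this helper is hypothetical and is not a pandas API. It assumes tz-naive input with no NaT values.

```python
import pandas as pd


def to_epoch(values, unit="s"):
    """Hypothetical helper: whole `unit`s elapsed since 1970-01-01.

    Assumes tz-naive datetimes and no missing values.
    """
    epoch = pd.Timestamp("1970-01-01")
    # Floor division by a Timedelta returns integers rather than floats.
    return (values - epoch) // pd.Timedelta(1, unit=unit)


s = pd.Series(pd.date_range("20160101", periods=3))
print(to_epoch(s).tolist())             # [1451606400, 1451692800, 1451779200]
print(to_epoch(s, unit="ms").tolist())  # same instants, counted in milliseconds
```

Floor division is used here instead of `/` followed by `.astype('i8')` so that no intermediate float is created.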

jreback commented 7 years ago

xref #11022, https://github.com/pandas-dev/pandas/issues/6741

jreback commented 7 years ago

This also works, but it exposes the internal implementation, is verbose, and is not user-friendly.

In [36]: Series(s.values.astype('datetime64[s]').astype('i8'), index=s.index)
Out[36]: 
0    1451606400
1    1451692800
2    1451779200
dtype: int64

jorisvandenbossche commented 7 years ago

If we add user-facing functionality, I think I would like to_epoch() most (certainly not something like int64[s], IMO).

jreback commented 7 years ago

I changed this to make this an enhancement for a simple .to_epoch() method on Timestamp/DTI.

jbrockmendel commented 6 years ago

Since Timestamp now has a .timestamp() method, should we use the same name for DTI?

jreback commented 6 years ago

Yes, this would be reasonable (though to be honest, the .timestamp() name is not very informative).

jorisvandenbossche commented 6 years ago

I also don't really like the name. It is rather confusing given that we already have a Timestamp class (for Timestamp itself it is OK, to stay consistent with the datetime class it subclasses). So when adding such a method to DatetimeIndex / the dt accessor, I would think about not using the same name.

jreback commented 6 years ago

I am partial to to_epoch; we use this term elsewhere.

TomAugspurger commented 6 years ago

One question: what to do with NaTs?

In [5]: pd.DatetimeIndex(['2017', '2018', None]).values.astype('datetime64[s]').astype("i8")
Out[5]: array([          1483228800,           1514764800, -9223372036854775808])

Do we value having an integer dtype more than preserving the missing value? I think so in this case.
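
To make the trade-off concrete, here is a small sketch showing that the large negative number in the output is the int64 minimum, which is how NaT is stored internally. If a plain integer dtype is returned, a separate mask would have to travel alongside the values to recover which entries were missing.

```python
import numpy as np
import pandas as pd

idx = pd.DatetimeIndex(["2017", "2018", None])
raw = idx.values.astype("datetime64[s]").astype("i8")

# NaT is represented by the smallest int64, so it survives the cast as a sentinel.
sentinel = np.iinfo(np.int64).min
print(raw[-1] == sentinel)   # True

# The missing-value information is only recoverable from a separate mask.
missing = idx.isna()
print(missing.tolist())      # [False, False, True]
```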

TomAugspurger commented 6 years ago

Second question: timezones. Unix time is defined in UTC, so should we

This will necessitate an ambiguous parameter.
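
A minimal sketch of why timezone handling matters here, using the existing Timestamp.timestamp() and tz_localize(ambiguous=...) calls. The dates and the America/New_York zone are chosen purely for illustration and do not come from the thread.

```python
import pandas as pd

# The same wall-clock time in two zones is two different instants,
# so the epoch values differ by the UTC offset.
utc = pd.Timestamp("2016-01-01", tz="UTC")
eastern = pd.Timestamp("2016-01-01", tz="America/New_York")
print(utc.timestamp())                        # 1451606400.0
print(eastern.timestamp() - utc.timestamp())  # 18000.0 (five hours)

# During the DST fall-back, 01:30 occurs twice, so a naive wall time
# cannot be mapped to a single epoch value without extra information.
naive = pd.Timestamp("2016-11-06 01:30:00")
first = naive.tz_localize("America/New_York", ambiguous=True)    # DST occurrence
second = naive.tz_localize("America/New_York", ambiguous=False)  # standard-time occurrence
print(second.timestamp() - first.timestamp())  # 3600.0
```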

TomAugspurger commented 6 years ago

Third question: how to handle higher-precision components?

In [6]: pd.DatetimeIndex(['2017-01-01T00:00:00.01', '2017-01-01T00:00:00.02']).to_epoch()
Out[6]: array([1483228800, 1483228800])

I don't think we should use floats and fractional components. So that leaves truncating or rounding to the nearest unit.
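
The two options can be sketched with existing Timedelta arithmetic. The second input value (0.60 s) is changed from the example above so that truncation and rounding give different answers, and "round half up" is just one possible rounding rule, assumed here for illustration.

```python
import pandas as pd

idx = pd.DatetimeIndex(["2017-01-01T00:00:00.01", "2017-01-01T00:00:00.60"])
elapsed = idx - pd.Timestamp("1970-01-01")
one_second = pd.Timedelta("1s")

# Truncate: drop the sub-second component.
truncated = elapsed // one_second

# Round half up: shift by half a unit before truncating.
rounded = (elapsed + pd.Timedelta("500ms")) // one_second

print(list(truncated))  # [1483228800, 1483228800]
print(list(rounded))    # [1483228800, 1483228801]
```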

simonjayhawkins commented 5 years ago

One question: what to do with NaTs?

In [5]: pd.DatetimeIndex(['2017', '2018', None]).values.astype('datetime64[s]').astype("i8")
Out[5]: array([          1483228800,           1514764800, -9223372036854775808])

Do we value having integer dtype more? I think so in this case.

Could an <IntegerArray> be returned in this case? It would need some casting, since currently

pd.array(pd.DatetimeIndex(['2017', '2018', None]).values.astype('datetime64[s]'), dtype='Int64')

raises

TypeError: datetime64[s] cannot be converted to an IntegerDtype
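
One possible way to do that casting by hand, as a sketch only: take the int64 view and pair it with the NaT mask using the public pd.arrays.IntegerArray constructor. This is a workaround built from existing pieces, not a description of how a to_epoch method would be implemented.

```python
import pandas as pd

idx = pd.DatetimeIndex(["2017", "2018", None])
raw = idx.values.astype("datetime64[s]").astype("i8")

# Pair the raw integers with a boolean mask marking the NaT positions.
epochs = pd.arrays.IntegerArray(raw, idx.isna())

print(epochs.dtype)          # Int64
print(epochs[0])             # 1483228800
print(epochs[2] is pd.NA)    # True
```

The sentinel value is still present in the underlying data, but the mask hides it and it is shown as <NA>.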

jreback commented 5 years ago

Yes, there are various use cases where we could do things like this.