pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

DOC: Clarify existence Series.dt.to_timestamp #59671

Open sfc-gh-joshi opened 3 weeks ago

sfc-gh-joshi commented 3 weeks ago

Pandas version checks

Location of the documentation

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.to_timestamp.html (does not exist)

Documentation problem

Copied from a Modin issue: https://github.com/modin-project/modin/issues/7232#issuecomment-2088100973

Series.dt.to_timestamp() is a valid method for PeriodProperties, but not for DatetimeProperties or TimedeltaProperties. For example, the following is valid:

>>> seconds_series = pd.Series(pd.period_range(start="2000-01-01 00:00:00", end="2000-01-01 00:00:03", freq="s"))
>>> seconds_series
0    2000-01-01 00:00:00
1    2000-01-01 00:00:01
2    2000-01-01 00:00:02
3    2000-01-01 00:00:03
dtype: period[s]
>>> seconds_series.dt.to_timestamp()
0   2000-01-01 00:00:00
1   2000-01-01 00:00:01
2   2000-01-01 00:00:02
3   2000-01-01 00:00:03
dtype: datetime64[ns]

However, it does not exist for a timedelta Series:

>>> seconds_series = pd.Series(pd.timedelta_range(start="1 second", periods=3, freq="s"))
>>> seconds_series.dt.to_timestamp()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'TimedeltaProperties' object has no attribute 'to_timestamp'
>>> seconds_series
0   0 days 00:00:01
1   0 days 00:00:02
2   0 days 00:00:03
dtype: timedelta64[ns]

Suggested fix for documentation

Clarify whether to_timestamp should exist for all Series.dt accessors, and whether it should be implemented for classes other than PeriodProperties.
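To make the current state concrete, here is a small sketch that asks each datetime-like accessor whether it exposes `to_timestamp`. It assumes pandas 2.2 or later, where lowercase `"s"` is the seconds alias used in the examples above; the variable names are illustrative only. Only the period Series is expected to report `True`, matching the AttributeError shown above.

```python
import pandas as pd

# One Series per datetime-like dtype that has a .dt accessor
# (assumes pandas >= 2.2, where "s" is the seconds alias)
series_by_kind = {
    "period": pd.Series(pd.period_range("2000-01-01", periods=3, freq="s")),
    "datetime": pd.Series(pd.date_range("2000-01-01", periods=3, freq="s")),
    "timedelta": pd.Series(pd.timedelta_range(start="1 second", periods=3, freq="s")),
}

# hasattr catches the AttributeError and returns False instead of raising
has_to_timestamp = {
    kind: hasattr(s.dt, "to_timestamp") for kind, s in series_by_kind.items()
}

# Print the concrete accessor class next to the result
for kind, s in series_by_kind.items():
    print(f"{kind:<10} {type(s.dt).__name__:<20} {has_to_timestamp[kind]}")
```

The printed class names (PeriodProperties, DatetimeProperties, TimedeltaProperties) are the accessor classes named in this issue.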

saldanhad commented 2 days ago

Hi @sfc-gh-joshi, you can use the to_timestamp() method with the dt accessor; however, the prerequisite is that the Series has a period dtype, i.e. its values are specific periods such as 2000-01-01 00:00:01, not relative durations like 0 days, 1 days, etc. Your example contains timedeltas, which is why you get the error.

If, however, you need the values as datetimes, you can add a base_date as a reference point, as shown below; this converts them without needing to_timestamp().

>>> import pandas as pd
>>> seconds_series = pd.Series(pd.timedelta_range(start="1 second", periods=3, freq="s"))
>>> base_date = pd.Timestamp("2023-01-01")
>>> result = seconds_series + base_date
>>> result
0   2023-01-01 00:00:01
1   2023-01-01 00:00:02
2   2023-01-01 00:00:03
dtype: datetime64[ns]

saldanhad commented 2 days ago

Hi @rhshadrach, congrats on the recent release of pandas 2.2.3! Regarding this issue, I believe adding the above, along with a brief explanation of how different freq values affect the output, could help here. Let me know what you think; I would be happy to open a PR to address this if needed.
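As a rough sketch of what such an explanation might illustrate (assuming pandas 2.2 or later; variable names are illustrative), the output of `dt.to_timestamp` depends on the Series' own period frequency and on the `how` argument: a monthly period maps to the first instant of the month by default and to the last nanosecond of the month with `how="end"`, while a one-second period keeps the original time of day.

```python
import pandas as pd

# Three instants bucketed at two different period frequencies
ts = pd.Series(pd.to_datetime([
    "2000-01-15 10:30:00",
    "2000-02-20 08:00:00",
    "2000-03-05 23:59:59",
]))
monthly = ts.dt.to_period("M")   # period alias "M" (not "ME")
secondly = ts.dt.to_period("s")

# Default how="start": each monthly period maps to its first instant
monthly_start = monthly.dt.to_timestamp()

# how="end": each monthly period maps to its last nanosecond
monthly_end = monthly.dt.to_timestamp(how="end")

# With a one-second frequency, the original times are preserved
secondly_start = secondly.dt.to_timestamp()

print(monthly_start.tolist())
print(monthly_end.tolist())
print(secondly_start.tolist())
```

With monthly periods the time of day is lost (2000-01-01, 2000-02-01, 2000-03-01 at the start; 2000-01-31, 2000-02-29 and 2000-03-31 at 23:59:59.999999999 at the end), whereas the one-second periods round-trip to the original values.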

rhshadrach commented 1 day ago

I think Series.dt.to_timestamp should be added to the API docs. But I think it makes sense to not implement to_timestamp for datetime and timedelta. Labeling this as a docs issue, @saldanhad - a PR is welcome!

rhshadrach commented 1 day ago

@saldanhad - just take, no slash.

saldanhad commented 1 day ago

take