pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

Add `shift` to `dt` accessor #31705

Open giuliobeseghi opened 4 years ago

giuliobeseghi commented 4 years ago

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        "date": pd.date_range("2020", freq="D", periods=10),
        "value": np.random.rand(10),
    }
)

# Desired API; currently raises AttributeError because the dt accessor has no shift method
shifted_dates = df.date.dt.shift(1)

Problem description

I always thought the dt accessor was meant to expose more or less the same functionality as a DatetimeIndex, but for a Series or a DataFrame column. If I can do something with a DatetimeIndex, I should be able to do it with a Series of datetimes too, right?

Expected Output

This is my workaround to obtain the expected output. It's not very nice.

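# Build a DatetimeIndex from the column and set its freq from the inferred one,
# since DatetimeIndex.shift needs a freq to know the step size
index = df.set_index("date").index
index.freq = index.inferred_freq

# Copy the column and overwrite its values with the shifted dates
shifted_dates = df.date.copy()
shifted_dates[:] = index.shift(1)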
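
For anyone who needs this today, the workaround above could be wrapped in a small helper. This is only a sketch: `shift_datetimes` is a hypothetical name, not a pandas function, and it assumes the frequency of the column can be inferred from its values.

```python
import numpy as np
import pandas as pd


def shift_datetimes(dates: pd.Series, periods: int = 1) -> pd.Series:
    """Hypothetical helper: shift a datetime Series by `periods` steps of its inferred frequency."""
    # freq="infer" asks pandas to set the index frequency from the values.
    index = pd.DatetimeIndex(dates, freq="infer")
    # DatetimeIndex.shift moves every timestamp by periods * freq.
    shifted = index.shift(periods)
    # Re-attach the original row labels and name.
    return pd.Series(shifted, index=dates.index, name=dates.name)


df = pd.DataFrame(
    {
        "date": pd.date_range("2020", freq="D", periods=10),
        "value": np.random.rand(10),
    }
)

shifted_dates = shift_datetimes(df["date"], 1)
```

If no frequency can be inferred (irregularly spaced dates, for example), the `shift` call is expected to fail, just as it does on a DatetimeIndex without a freq.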

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 1.0.0
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.1.0.post20200127
Cython : 0.29.14
pytest : 5.3.5
hypothesis : 5.4.1
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.1
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.3
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.5
pyxlsb : None
s3fs : 0.4.0
scipy : 1.3.2
sqlalchemy : 1.3.13
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7
numba : 0.48.0
giuliobeseghi commented 4 years ago

This could also (and more simply) be achieved with

df.date + df.date.dt.freq

if dt.freq returned an offset object instead of a str (it currently returns the inferred frequency as a string).
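
Until then, the string can be converted by hand. A minimal sketch, assuming `dt.freq` returns a frequency string that `to_offset` can parse (it returns None when no frequency can be inferred, which this sketch treats as an error):

```python
import numpy as np
import pandas as pd
from pandas.tseries.frequencies import to_offset

df = pd.DataFrame(
    {
        "date": pd.date_range("2020", freq="D", periods=10),
        "value": np.random.rand(10),
    }
)

# dt.freq is the inferred frequency as a string, or None if it cannot be inferred.
freq = df["date"].dt.freq
if freq is None:
    raise ValueError("could not infer a frequency from the date column")

# Turn the string into a DateOffset so it can be added to the datetimes.
offset = to_offset(freq)
shifted_dates = df["date"] + offset
```

This avoids the round trip through a DatetimeIndex, at the cost of one extra conversion that `dt.freq` could do itself if it returned an offset.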

devjeetr commented 4 years ago

Could I work on this if no one else already is? I can investigate the two proposed solutions and report back.