pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

startingMonth ignored on non-unitary Quarter periods #29576

Open aulemahal opened 5 years ago

aulemahal commented 5 years ago

Code Sample

import pandas as pd
import numpy as np

d = pd.Series(
    data=np.zeros(365),
    index=pd.date_range('1950-01-01', '1950-12-31', freq='D'),
)
d.resample('2QS-MAR').mean()

returns:

1949-12-01    0.0
1950-06-01    0.0
1950-12-01    0.0
Freq: 2QS-MAR, dtype: float64

But as I explicitly asked for 2 Quarters starting in March, I expected:

1949-09-01    0.0
1950-03-01    0.0
1950-09-01    0.0
Freq: 2QS-MAR, dtype: float64

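So when a multiple is specified, the first bin still starts at the closest single-quarter boundary before the data (here 1949-12-01), instead of at a two-quarter boundary that lines up with March.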
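The expected labels above can be reproduced without pandas. The following is only a sketch of the alignment rule the report seems to assume, namely that six-month bins are laid out so that one of them starts on 1 March of every year. The helper `period_start` and its parameters are hypothetical names used for illustration and are not part of the pandas API.

```python
from datetime import date, timedelta


def period_start(day, anchor_month=3, months_per_period=6):
    """Return the first day of the bin containing `day`.

    Assumption: bins are `months_per_period` months long and one bin
    starts on the first day of `anchor_month`.
    """
    # Count months since year 0, shifted so that anchor_month maps to 0.
    months = day.year * 12 + (day.month - 1) - (anchor_month - 1)
    # Floor to the bin boundary, then undo the shift.
    start = months - months % months_per_period + (anchor_month - 1)
    return date(start // 12, start % 12 + 1, 1)


# Every day of 1950, matching the index used in the report.
days = [date(1950, 1, 1) + timedelta(days=i) for i in range(365)]
expected_labels = sorted({period_start(d) for d in days})
print(expected_labels)

# With single quarters (3 months) the first bin starts on 1949-12-01,
# which is the first label pandas returned for '2QS-MAR'.
print(period_start(date(1950, 1, 1), months_per_period=3))
```

Under this assumption the six-month bins for 1950 start on 1949-09-01, 1950-03-01 and 1950-09-01, while the single-quarter boundary before 1950-01-01 is 1949-12-01.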

Not really a duplicate, but closely related to #22362.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-66-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : fr_CA.UTF-8
LOCALE : fr_CA.UTF-8

pandas : 0.25.3
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.1
pip : 9.0.1
setuptools : 39.0.1
Cython : None
pytest : 5.2.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : 0.999999999
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.9.0
pandas_datareader: None
bs4 : 4.8.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : None
tables : None
xarray : 0.13.0
xlrd : None
xlwt : None
xlsxwriter : None

mroeschke commented 1 year ago

This looks to work on main now. Could use a test

idor980 commented 1 year ago

take

idor980 commented 1 year ago

Quoting mroeschke: "This looks to work on main now. Could use a test"

I am currently using pandas 2.0.1, and this specific problem isn't fixed in that version.

joshdelg commented 1 year ago

take

quangngd commented 1 year ago

I did some digging, and I think the cause is the offset implementation: rollback and rollforward do not take n into account.

import pandas as pd
from pandas.tseries.offsets import QuarterBegin

offset = QuarterBegin(startingMonth=1, n=2)  # 2QS
dt = pd.Timestamp("1950-01-02")
offset.rollback(dt), offset.rollforward(dt)

and

offset = QuarterBegin(startingMonth=1, n=1)  # QS
dt = pd.Timestamp("1950-01-02")
offset.rollback(dt), offset.rollforward(dt)

both return (Timestamp('1950-01-01 00:00:00'), Timestamp('1950-04-01 00:00:00')).
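The same comparison can be written as a loop over several values of n, which makes the observation easier to re-check on another pandas version. This is a sketch based on the behaviour reported in this thread; the result may differ on a version where the offsets have been changed.

```python
import pandas as pd
from pandas.tseries.offsets import QuarterBegin

dt = pd.Timestamp("1950-01-02")

# Collect (rollback, rollforward) for the same anchor month and different n.
results = {}
for n in (1, 2, 3):
    offset = QuarterBegin(n=n, startingMonth=1)
    results[n] = (offset.rollback(dt), offset.rollforward(dt))

for n, (back, forward) in results.items():
    print(n, back, forward)

# True when n has no influence on the rolled dates.
print(len(set(results.values())) == 1)
```

If the final line prints True, the rolled dates do not depend on n, which is consistent with the explanation given above for why the first resample bin lands on a single-quarter boundary.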

@mroeschke please confirm if this is expected or not.

KatsiarynaDzibrova commented 4 months ago

take