pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

DEPR: QuarterBegin and BQuarterBegin return days that are not quarter beginnings #8435

Open nimishgautam opened 9 years ago

nimishgautam commented 9 years ago

In [43]: datetime(2014,10,10) + BQuarterBegin()
Out[43]: Timestamp('2014-12-01 00:00:00')

In [45]: datetime(2014,10,10) + QuarterBegin()
Out[45]: Timestamp('2014-12-01 00:00:00')

Expected output is 2015-01-01. (Note QuarterEnd and BQuarterEnd do produce the expected output of 2014-12-31)

jreback commented 9 years ago

The default startingMonth is 3 (which may be wrong; there is a comment in the code to that effect).

You can get what you expect by using a startingMonth=1

In [66]: datetime.datetime(2014,10,10) + pd.offsets.BQuarterBegin(startingMonth=3)
Out[66]: Timestamp('2014-12-01 00:00:00')

In [67]: datetime.datetime(2014,10,10) + pd.offsets.BQuarterBegin(startingMonth=1)
Out[67]: Timestamp('2015-01-01 00:00:00')

If you'd like to investigate why this is, that would be appreciated.

I don't know when/how this was done this way.

jreback commented 9 years ago

If any of you have a comment about this:

cc @bjonen cc @cancan101 cc @rockg cc @MichaelWS

MichaelWS commented 9 years ago

I think I would prefer default QuarterBegin to be 1

jreback commented 9 years ago

agreed - seeing if anyone knows why it is not 1 (and why 3)

nimishgautam commented 9 years ago

Just guessing, but there are a few countries whose fiscal years begin in March (month 3): http://en.wikipedia.org/wiki/Fiscal_year, but even then, the QuarterEnd and BQuarterEnd objects don't default to 3 to match.

rockg commented 9 years ago

I'm inclined to change this default to the standard quarter definition (month begins on 1, 4, 7, 10 and month ends on 3, 6, 9, 12). Any objections?

It is wrong and inconsistent now:

pd.Timestamp('11/2/2012', tz='US/Eastern') + pd.tseries.offsets.QuarterBegin()
Out[29]: Timestamp('2012-12-01 00:00:00-0500', tz='US/Eastern')
pd.Timestamp('11/2/2012', tz='US/Eastern') + pd.tseries.offsets.QuarterEnd()
Out[30]: Timestamp('2012-12-31 00:00:00-0500', tz='US/Eastern')

rockg commented 9 years ago

Related to #5307 probably.

chris-b1 commented 8 years ago

@jreback, I'm running into this issue fixing #11370 - I agree with the comments above that the current definition is inconsistent. But I suppose it would need to go through a deprecation cycle where it's breaking? cc @sinhrks

edit: on second thought I guess QS-DEC does sort of imply the quarter starts in December, so maybe it's better to just change the default for QuarterBegin() than the whole set of semantics like I was thinking originally.


# start / end definition for period is symmetrical
In [43]: pd.period_range('2014-1-1', periods=5, freq='Q-DEC').to_timestamp(how='e')
Out[43]: 
DatetimeIndex(['2014-03-31', '2014-06-30', '2014-09-30', '2014-12-31',
               '2015-03-31'],
              dtype='datetime64[ns]', freq='Q-DEC')

In [44]: pd.period_range('2014-1-1', periods=5, freq='Q-DEC').to_timestamp(how='s')
Out[44]: 
DatetimeIndex(['2014-01-01', '2014-04-01', '2014-07-01', '2014-10-01',
               '2015-01-01'],
              dtype='datetime64[ns]', freq='QS-OCT')

# not so for QuarterBegin / QuarterEnd

In [46]: pd.date_range('2014-1-1', periods=5, freq='Q-DEC')
Out[46]: 
DatetimeIndex(['2014-03-31', '2014-06-30', '2014-09-30', '2014-12-31',
               '2015-03-31'],
              dtype='datetime64[ns]', freq='Q-DEC')

In [47]: pd.date_range('2014-1-1', periods=5, freq='QS-DEC')
Out[47]: 
DatetimeIndex(['2014-03-01', '2014-06-01', '2014-09-01', '2014-12-01',
               '2015-03-01'],
              dtype='datetime64[ns]', freq='QS-DEC')
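
The symmetric period-based definition can also be applied to a single date. A minimal sketch, assuming calendar quarters that end in December (Q-DEC); `quarter_bounds` is a hypothetical helper name used only for illustration, not part of pandas:

```python
import pandas as pd


def quarter_bounds(ts):
    """Hypothetical helper: first and last day of the Q-DEC quarter containing ts."""
    period = pd.Period(pd.Timestamp(ts), freq="Q-DEC")
    # start_time is midnight on the first day of the quarter; end_time is the
    # last nanosecond of the quarter, so normalize() trims it to the last day.
    return period.start_time, period.end_time.normalize()


start, end = quarter_bounds("2014-10-10")
print(start, end)
```

Because both bounds come from the same Period, they always describe the same quarter, which is the property the QuarterBegin / QuarterEnd defaults lack.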
kawochen commented 8 years ago

Those are months of IMM dates.

jreback commented 8 years ago

As discussed in #14254, this should be changed in 0.20. To keep this backward compatible, I propose that we show a warning if startingMonth is not specified. Of course this will show the warning for everyone, but I don't see a good alternative for avoiding subtly changed behavior, e.g.

QuarterBegin() (starts on 3) -> QuarterBegin() (starts on 1) is quite subtle.
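
A rough sketch of what such a warning could look like from the user's side. This is an illustration only: `quarter_begin` is a hypothetical wrapper, the warning text is made up, and pandas itself does not emit this warning.

```python
import warnings

import pandas as pd


def quarter_begin(n=1, startingMonth=None):
    """Hypothetical wrapper (not pandas API): warn when startingMonth is omitted."""
    if startingMonth is None:
        warnings.warn(
            "startingMonth was not specified; the default is currently 3 and "
            "may change to 1 in a future version. Pass it explicitly.",
            FutureWarning,
            stacklevel=2,
        )
        startingMonth = 3  # keep the current behavior while the warning is shown
    return pd.offsets.QuarterBegin(n=n, startingMonth=startingMonth)


# An explicit startingMonth produces no warning.
explicit = pd.Timestamp("2014-10-10") + quarter_begin(startingMonth=1)
print(explicit)
```

Callers who pass startingMonth explicitly are unaffected either way, so the warning only reaches code that relies on the default.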

TomAugspurger commented 8 years ago

I will propose that we show a warning if startingMonth is not specified

+1

jreback commented 7 years ago

any appetite for this one?

tdpetrou commented 6 years ago

Also, these offsets seem to be poorly documented. I had no idea there was an option for startingMonth.

TomAugspurger commented 6 years ago

Pushing as not a blocker for 1.0

Darrrian commented 4 years ago

Hi, what is the status of this issue? It produces highly unexpected behavior, especially in the context of the documentation and QuarterEnd: with startingMonth = 3, QuarterEnd gives a quarter ending on 31 March while QuarterBegin gives 1 March, which makes no sense together. I believe both should have a default starting month of 1 (as the financial example is surely a minority case); QuarterEnd should by default keep returning the same value as now, and QuarterBegin should return 1 January. If there is any logical reason why it should not work like that, please let me know, but as it stands the two functions return inconsistent values together.

jreback commented 4 years ago

@Darrrian happy to take a patch which deprecates - if you read the issue and the related ones, there is no objection to the change

davs2rt commented 3 years ago

This is not even self-consistent without startingMonth=1:

for m in range(1,13):
     date = pd.Timestamp(2020, m, 1)
     qdate = date - pd.offsets.QuarterBegin()
     print(date, "is in quarter", date.quarter, "which begins", qdate )
2020-01-01 00:00:00 is in quarter 1 which begins 2019-12-01 00:00:00
2020-02-01 00:00:00 is in quarter 1 which begins 2019-12-01 00:00:00
2020-03-01 00:00:00 is in quarter 1 which begins 2019-12-01 00:00:00  <- 03-01 in Q1, but 
2020-04-01 00:00:00 is in quarter 2 which begins 2020-03-01 00:00:00    Q2 begins on 03-01
2020-05-01 00:00:00 is in quarter 2 which begins 2020-03-01 00:00:00
2020-06-01 00:00:00 is in quarter 2 which begins 2020-03-01 00:00:00
2020-07-01 00:00:00 is in quarter 3 which begins 2020-06-01 00:00:00
2020-08-01 00:00:00 is in quarter 3 which begins 2020-06-01 00:00:00
2020-09-01 00:00:00 is in quarter 3 which begins 2020-06-01 00:00:00
2020-10-01 00:00:00 is in quarter 4 which begins 2020-09-01 00:00:00
2020-11-01 00:00:00 is in quarter 4 which begins 2020-09-01 00:00:00
2020-12-01 00:00:00 is in quarter 4 which begins 2020-09-01 00:00:00
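
For comparison, a minimal sketch of the same loop with startingMonth=1 passed explicitly. It uses rollback() instead of subtraction, on the assumption that a date already on a quarter start should map to itself (subtracting the offset would move such a date back a full quarter); `calendar_qb` and `quarter_starts` are names chosen here for illustration.

```python
import pandas as pd

# Explicit startingMonth=1 anchors quarters at Jan/Apr/Jul/Oct.
calendar_qb = pd.offsets.QuarterBegin(startingMonth=1)

quarter_starts = {}
for m in range(1, 13):
    date = pd.Timestamp(2020, m, 1)
    # rollback() leaves dates that already sit on a quarter start unchanged
    # and moves every other date back to the previous quarter start.
    quarter_starts[m] = calendar_qb.rollback(date)
    print(date.date(), "is in quarter", date.quarter,
          "which begins", quarter_starts[m].date())
```

With this anchoring, the quarter of each computed start date agrees with date.quarter for every month.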
bryanwhiting commented 1 year ago

Still wrong as of Pandas 1.5.3:

        anchor_date = date(2020, 10, 1)
        start_date = (anchor_date + pd.offsets.QuarterBegin(1)).to_pydatetime().date()
        start_date

(image: screenshot of the resulting start_date)

But this works:

ipdb> anchor_date + pd.offsets.QuarterBegin(1, startingMonth=1)
Timestamp('2021-01-01 00:00:00')

ThomasA commented 6 months ago

I am amazed to see that this discussion has been going on for almost 10 years and there still does not seem to be any conclusion to it.
I think of quarters as starting in January, April, July, and October. I can make pandas.tseries.offsets.QuarterBegin behave accordingly by using (tested in Pandas 2.2.1):

for month in range(1,13):
    print(pd.Timestamp(month=month, day=1, year=2024) + pd.tseries.offsets.QuarterBegin(normalize=True, startingMonth=1))

2024-04-01 00:00:00
2024-04-01 00:00:00
2024-04-01 00:00:00
2024-07-01 00:00:00
2024-07-01 00:00:00
2024-07-01 00:00:00
2024-10-01 00:00:00
2024-10-01 00:00:00
2024-10-01 00:00:00
2025-01-01 00:00:00
2025-01-01 00:00:00
2025-01-01 00:00:00

But then I expect pd.tseries.offsets.QuarterEnd to return the ends of the same quarters when used with the same arguments:

for month in range(1,13):
    print(pd.Timestamp(month=month, day=1, year=2024) + pd.tseries.offsets.QuarterEnd(normalize=True, startingMonth=1))

2024-01-31 00:00:00
2024-04-30 00:00:00
2024-04-30 00:00:00
2024-04-30 00:00:00
2024-07-31 00:00:00
2024-07-31 00:00:00
2024-07-31 00:00:00
2024-10-31 00:00:00
2024-10-31 00:00:00
2024-10-31 00:00:00
2025-01-31 00:00:00
2025-01-31 00:00:00

Instead, I have to set startingMonth=3 for pd.tseries.offsets.QuarterEnd to return the ends of the same quarters as pd.tseries.offsets.QuarterBegin with startingMonth=1 and that seems inconsistent to me. I expect pd.tseries.offsets.QuarterBegin and pd.tseries.offsets.QuarterEnd to return, respectively, the beginnings and ends of the same quarters when run with the same arguments:

for month in range(1,13):
    print(pd.Timestamp(month=month, day=1, year=2024) + pd.tseries.offsets.QuarterEnd(normalize=True, startingMonth=3))

2024-03-31 00:00:00
2024-03-31 00:00:00
2024-03-31 00:00:00
2024-06-30 00:00:00
2024-06-30 00:00:00
2024-06-30 00:00:00
2024-09-30 00:00:00
2024-09-30 00:00:00
2024-09-30 00:00:00
2024-12-31 00:00:00
2024-12-31 00:00:00
2024-12-31 00:00:00

MarcoGorelli commented 6 months ago

I expect pd.tseries.offsets.QuarterBegin and pd.tseries.offsets.QuarterEnd to return, respectively, the beginnings and ends of the same quarters when run with the same arguments:

Agree, this is really odd:

In [29]: pandas.tseries.offsets.QuarterBegin().rollback(datetime(2000, 1, 15, 2))
Out[29]: Timestamp('1999-12-01 02:00:00')

In [30]: pandas.tseries.offsets.QuarterEnd().rollforward(datetime(2000, 1, 15, 2))
Out[30]: Timestamp('2000-03-31 02:00:00')

I'd also like to change startingMonth to be 1, but that requires a deprecation cycle to not break everyone's code...

The way forwards may be:

Though aside from the default, QuarterEnd's definition of "end" seems really off. Maybe that can just be changed as a bug fix in 3.0

I'll bring this up in the next call

rt87 commented 1 month ago

Okay, whoever and for whatever reasons decided that 3 would be the best default for startingMonth, I honestly do not care, but this has not been fixed for an entire DECADE. Come on...

MarcoGorelli commented 1 month ago

@rt87 :

Okay, whoever and for whatever reasons decided that 3 would be the best default for startingMonth, I honestly do not care, but this has not been fixed for an entire DECADE. Come on...

People on every historic issue: "this hasn't been fixed in a decade, just change it already!"

People if/when it gets changed: "this broke my code, stop changing things!"

This could be changed as a breaking change in 3.0, but I'm not sure I have the energy to deal with potential backlash

Darrrian commented 1 month ago

@rt87 So maybe fix it instead of being rude... The problem is not startingMonth = 3 itself; the problem is that the two functions are inconsistent with each other... @MarcoGorelli Do you consider it a code-breaking change, given that the behavior is clearly wrong now and QuarterBegin and QuarterEnd work incorrectly? Or maybe it would be possible to "solve" it by creating new consistent methods, dunno, QuarterBeginNew or something haha... It is an annoyance, not a real problem...

rt87 commented 1 month ago

OF COURSE it may break something. Since when is this a reason not to fix things? Such thoughts always remind me of https://xkcd.com/1172/. By that logic, the XKCD heat problem will remain till the end of time... XD.

Also, did you stop to think about all the apps that may be working incorrectly because of this? Past, present and future? Yes, the responsible devs should have tested their code, but nonetheless... So, call me rude all you want, I am sticking with "needs to be fixed!". Not that anyone seems interested in my opinion, but I strongly feel that procrastinating is not the way to go here. And btw I never said "just screw everyone and flip the switch tomorrow". After all, there are means to handle breaking changes, such as introducing deprecations. Had that been done a decade ago, this issue would now be solved.

And following that train of thought: Even if I do not really like it, libs DO introduce breaking changes in minor versions. Yes, they shouldn't, but sometimes it is warranted, e.g. for fixing bugs...

I'm obviously not in charge of this, so fix it in v2.x, fix it in v3, or just never fix it at all. Decide as you see fit! This is just my two cents regarding bugs, and I will continue to think (and speak) "Come on..." when known bugs have not been addressed for such a long time.