sdispater / pendulum

Python datetimes made easy
https://pendulum.eustace.io
MIT License
6.12k stars 372 forks source link

in_months() returns wrong values in some cases in January to February #790

Open baubie opened 6 months ago

baubie commented 6 months ago

Issue

We encountered a bug in 2.1 that persists in 3.0 when using the in_months() function of an interval where we get weird results for very specific dates. Examples are shown below.

In a leap year (2024). Why would January 3rd return 1 but the 2rd and 4th return 0?


>>> import pendulum

>>> (pendulum.datetime(2024,2,1,0,0,0, tz='America/Toronto') - pendulum.datetime(2024,1,1,0,0,0, tz='America/Toronto')).in_months()
1
>>> (pendulum.datetime(2024,2,1,0,0,0, tz='America/Toronto') - pendulum.datetime(2024,1,2,0,0,0, tz='America/Toronto')).in_months()
0 # Why is this zero but January 1st and 3rd are 1?
>>> (pendulum.datetime(2024,2,1,0,0,0, tz='America/Toronto') - pendulum.datetime(2024,1,3,0,0,0, tz='America/Toronto')).in_months()
1
>>> (pendulum.datetime(2024,2,1,0,0,0, tz='America/Toronto') - pendulum.datetime(2024,1,4,0,0,0, tz='America/Toronto')).in_months()
0
>>> (pendulum.datetime(2024,2,1,0,0,0, tz='America/Toronto') - pendulum.datetime(2024,1,5,0,0,0, tz='America/Toronto')).in_months()
0

Not a leap year. Why would January 4th return 1 but the 3rd and 5th return 0?

>>> (pendulum.datetime(2023,2,1,0,0,0, tz='America/Toronto') - pendulum.datetime(2023,1,1,0,0,0, tz='America/Toronto')).in_months()
1
>>> (pendulum.datetime(2023,2,1,0,0,0, tz='America/Toronto') - pendulum.datetime(2023,1,2,0,0,0, tz='America/Toronto')).in_months()
0
>>> (pendulum.datetime(2023,2,1,0,0,0, tz='America/Toronto') - pendulum.datetime(2023,1,3,0,0,0, tz='America/Toronto')).in_months()
0
>>> (pendulum.datetime(2023,2,1,0,0,0, tz='America/Toronto') - pendulum.datetime(2023,1,4,0,0,0, tz='America/Toronto')).in_months()
1
>>> (pendulum.datetime(2023,2,1,0,0,0, tz='America/Toronto') - pendulum.datetime(2023,1,5,0,0,0, tz='America/Toronto')).in_months()
0
>>> (pendulum.datetime(2023,2,1,0,0,0, tz='America/Toronto') - pendulum.datetime(2023,1,6,0,0,0, tz='America/Toronto')).in_months()
0
Benji19967 commented 4 months ago

@sdispater @Secrus before we look for potential changes in the code, it should be clear what in_months() is supposed to return. If you want to skip straight to what I think is the best solution, you can jump to the TL;DR at the bottom.

The output is what precise_diff in _helpers returns, which is what in_months() uses. Read Interval in the tables below as end - start.

From the issue above:

Interval Current Desired Is Correct
Feb 1 - Jan 1 1 months 0 days 1 months 0 days Yes
Feb 1 - Jan 2 0 months 30 days 0 months 30 days Yes
Feb 1 - Jan 3 0 months 29 days 0 months 29 days Yes
Feb 1 - Jan 4 1 months 0 days 0 months 28 days No
Feb 1 - Jan 5 0 months 27 days 0 months 30 days Yes

This example should clarify the issue (let's assume we're dealing with a non-leap year):

Interval Current Desired
Feb 28 - Jan 28 1 months 0 days 1 months 0 days
Mar 1 - Feb 1 1 months 0 days 1 months 0 days

So far so good. But what pendulum currently returns for the following examples seems inconsistent.

Interval Current Desired
Mar 1 - Jan 28 1 months 1 days 1 months 4 days
Mar 1 - Jan 29 1 months 1 days 1 months 3 days
Mar 1 - Jan 30 1 months 1 days 1 months 2 days
Mar 1 - Jan 31 1 months 1 days 1 months 1 days

The Desired column above represents an alternative. But then we would run into the following:

Interval Desired
Feb 28 - Jan 28 1 months 0 days
Mar 1 - Jan 29 1 months 3 days

By increasing the start and end of the interval by 1 day each, the number of days gets bumped by 3.

Clearly, it's not straightforward to establish what the output should be—all solutions appear to yield non-intuitive results.

TL;DR

As Dijkstra advocated, the best solution appears to be to use a semi-open interval [A, B), where A is inclusive and B exclusive.

Interval Output
Feb 1 - Jan 1 0 months 30 days 23 hours 59 minutes 59 seconds 999999 microseconds
Feb 1 - Jan 2 0 months 29 days 23 hours 59 minutes 59 seconds 999999 microseconds
Feb 1 - Jan 3 0 months 28 days 23 hours 59 minutes 59 seconds 999999 microseconds
Feb 1 - Jan 4 0 months 27 days 23 hours 59 minutes 59 seconds 999999 microseconds
Feb 1 - Jan 5 0 months 26 days 23 hours 59 minutes 59 seconds 999999 microseconds
Feb 28 - Jan 28 0 months 30 days 23 hours 59 minutes 59 seconds 999999 microseconds
Mar 1 - Jan 28 1 months 3 days 23 hours 59 minutes 59 seconds 999999 microseconds
Mar 1 - Jan 29 1 months 2 days 23 hours 59 minutes 59 seconds 999999 microseconds
Mar 1 - Jan 30 1 months 1 days 23 hours 59 minutes 59 seconds 999999 microseconds
Mar 1 - Jan 31 1 months 0 days 23 hours 59 minutes 59 seconds 999999 microseconds
Mar 1 - Feb 1 0 months 27 days 23 hours 59 minutes 59 seconds 999999 microseconds

There is only ever a month if there is a full month between end and start.

The output is intuitive, consistent and hence also easy to document. I think using a semi-open interval should be the de facto choice for diffs/durations/ranges/intervals.

Related issue: https://github.com/sdispater/pendulum/issues/606