gweis / isodate

ISO 8601 date/time parser
BSD 3-Clause "New" or "Revised" License
148 stars 58 forks source link

isodate.parse_datetime incorrectly handles Years, Months and Weeks #42

Closed dhrp closed 2 years ago

dhrp commented 7 years ago

https://en.wikipedia.org/wiki/ISO_8601#Durations

See the following snippet:

>>> s = "P12YT0H0M"
>>> str(isodate.parse_duration(s))
'12 years, 0:00:00'
>>> isodate.parse_duration(s).total_seconds()
0.0

The library incorrectly places the larger increments in the object, as can be seen here:

>>> isodate.parse_duration(s)
isodate.duration.Duration(0, 0, 0, years=12, months=0)

While in code here: https://github.com/gweis/isodate/blob/master/src/isodate/isoduration.py#L92 it is actually clear that the author chose to output another other format when a year or months is sent, but I argue this behaviour is wrong.

TL;DR I suggest always converting to timedelta.

dhrp commented 7 years ago

Alternatively, for people that would like to keep this special Duration type the function could be something like:

def parse_duration(datestring, timedelta=True):   

Where the default is to always output a safe timedelta, and optionally a Duration. Or possibly more backwards-compatible, introduce:

def parse_duration_to_timedelta(datestring)

I can make a PR for this.

//cc @lmazuel, @gweiss

Oh, and I thought about the 'Months' thing (how many days?). And I think it should just throw a ValidationError when Month is included on a 'to timedelta'

shendric commented 7 years ago

My personal preferrence would be to use relativedeltafrom dateutil instead of timedelta. It comes with years, month, ... and functionality like

>>> relativedelta(days=1.5, hours=2).normalized()
relativedelta(days=1, hours=14)

could be useful in this context.

lmazuel commented 7 years ago

Several things:

reinhrst commented 6 years ago

@dhrp how would you define a month in a datetime.timedelta object? The datetime.timedelta object can only contain days or seconds (and it assumes 24 3600 seconds in a day, while some days have 25 or 23 hours, and some days have leap-seconds), not a concept of Month. A Month may be anywhere between 28 and 31 days, having 28 24 3600 seconds as minimum, and (30 24 + 25) * 3600 seconds as maximum (as last October in places where the DST ended).

dhrp commented 6 years ago

@reinhrst: as I mentioned above:

And I think it should just throw a ValidationError when Month is included on a 'to timedelta'

dhrp commented 6 years ago

And this is the function that I wrote to go from seconds to ISO duration. It is naive in the sense of how many days are in a year.

def seconds_to_isostr(total_seconds):
    """
    Converts a duration in seconds to an ISO 8601 datestring in the
    format 'PYMDTHMS'
    """

    s = Decimal(total_seconds)
    years, less_then_y = divmod(s, 3600*24*365)
    days, less_then_d = divmod(less_then_y, 3600*24)
    hours, less_then_h = divmod(less_then_d, 3600)
    minutes, seconds = divmod(less_then_h, 60)

    y = str(years) + "Y" if years else ""
    d = str(days) + "D" if days else ""
    h = str(hours) + "H" if hours else ""
    m = str(minutes) + "M" if minutes else ""
    s = str(seconds) + "S" if seconds else ""
    t = "T" if (h or m or s) else ""

    iso_string = "P{y}{d}{t}{h}{m}{s}".format(y=y, d=d, t=t, h=h, m=m, s=s)
    return iso_string
reinhrst commented 6 years ago

Ah, sorry, didn't see that.

Actually same goes for days then. And, technically, for hours (since some hours have 3601 seconds, however math with them works).

Actually I guess the main problem is the python datetime module, which mangles up timezone-aware and non-aware dates into 1 object, and not having a clear representation of duration.

For an API change, I would actually prefer to have it always return "Duration", where a Duration object has a to_timedelta(), which may or may not raise (possibly with an extra parameters assume_24_hour_days=True and assume_average_month_length=True, where average month length = (average number of seconds in a year / 12).... but all that would be icing on the cake).

Flix6x commented 3 years ago

I tried my hand at a partial solution in this PR by making a small step towards @reinhrst's suggestion (to always return a Duration object) by at least making that a possibility. I realize this is the exact opposite of the @dhrp's suggestion, but imo datetime.timedelta objects are irreconcilable with nominal durations in ISO 8601.

ps In case someone is confused by the issue title: this issue relates to parse_duration (not parse_datetime).

gweis commented 2 years ago

I consider this fixed thanks to @Flix6x PR.