python / cpython

isoformat() / fromisoformat() for datetime.timedelta #86260

Open 926c91cc-ad75-44d8-8753-2e15a016bea4 opened 4 years ago

926c91cc-ad75-44d8-8753-2e15a016bea4 commented 4 years ago
BPO 42094
Nosy @vadmium, @pganssle

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['library', '3.10']
title = 'isoformat() / fromisoformat() for datetime.timedelta'
updated_at =
user = 'https://bugs.python.org/ErikCederstrand'
```
bugs.python.org fields:
```python
activity =
actor = 'Erik Cederstrand'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'Erik Cederstrand'
dependencies = []
files = []
hgrepos = []
issue_num = 42094
keywords = []
message_count = 5.0
messages = ['379091', '379096', '379097', '381273', '381314']
nosy_count = 3.0
nosy_names = ['martin.panter', 'Erik Cederstrand', 'p-ganssle']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue42094'
versions = ['Python 3.10']
```

926c91cc-ad75-44d8-8753-2e15a016bea4 commented 4 years ago

Python 3.7 gained support for parsing ISO 8601 formatted time, date and datetime strings via the fromisoformat() methods. Python has seen improved support for ISO 8601 in general; ISO calendar format codes were added in Python 3.6, and fromisocalendar() was added in Python 3.8.

ISO 8601 also has a standard for durations: https://en.wikipedia.org/wiki/ISO_8601#Durations

For consistency with the other objects in the datetime module, I suggest adding isoformat()/fromisoformat() methods for datetime.timedelta that implement ISO 8601 durations.

ISO 8601 durations support years and months that are not valid timedelta arguments because they are non-precise durations. I suggest throwing an exception if the conversion to or from timedelta cannot be done safely.
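A minimal sketch of what such a parser could look like, assuming a hypothetical helper named `parse_iso_duration` (this is not an existing standard library function, nor the isodate API). It only handles the basic `PnWnDTnHnMnS` form, is more lenient than the spec about combining weeks with other designators, ignores signs and the alternative format, and raises `ValueError` whenever a year or month designator is present:

```python
import re
from datetime import timedelta

# Hypothetical helper for illustration; not part of the datetime module.
_DURATION_RE = re.compile(
    r"P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?"
    r"(?:(?P<weeks>\d+)W)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:[.,]\d+)?)S)?)?",
    re.ASCII,
)


def parse_iso_duration(text: str) -> timedelta:
    """Parse a basic ISO 8601 duration, rejecting years and months."""
    match = _DURATION_RE.fullmatch(text)
    # A bare "P" or a dangling "T" carries no components and is rejected.
    if match is None or text == "P" or text.endswith("T"):
        raise ValueError(f"invalid ISO 8601 duration: {text!r}")
    parts = match.groupdict()
    # Years and months have no fixed length, so timedelta cannot hold them.
    if parts["years"] or parts["months"]:
        raise ValueError(f"years and months are not supported: {text!r}")
    return timedelta(
        weeks=int(parts["weeks"] or 0),
        days=int(parts["days"] or 0),
        hours=int(parts["hours"] or 0),
        minutes=int(parts["minutes"] or 0),
        seconds=float((parts["seconds"] or "0").replace(",", ".")),
    )


print(parse_iso_duration("P1DT12H"))
print(parse_iso_duration("PT0.5S"))
```

Note that `PT1M` (one minute) parses fine while `P1M` (one month) raises, because the designator `M` means different things before and after the `T`.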

https://pypi.org/project/isodate/ implements a parse_duration() method that could be used for inspiration.

926c91cc-ad75-44d8-8753-2e15a016bea4 commented 4 years ago

Among other things, ISO 8601 duration strings are commonly used to communicate offset values in timezone definitions.

vadmium commented 4 years ago

There is related discussion in bpo-41254, about duration formats more generally.

pganssle commented 3 years ago

This is probably more feasible than the proposal in bpo-41254 since it's a well-defined spec (mostly — it includes an optional alternative format and the number of digits allowed is defined "by agreement", thus defeating the purpose of using a spec in the first place) that's not even particularly difficult to implement, but there are still a few problems (and one reason I've never implemented this, despite desperately wanting a better string representation for time deltas). Two minor problems first:

  1. Unlike ISO 8601 datetimes, these are not especially "human-friendly" formats, so I don't think they're especially useful for displaying timedeltas.

  2. Also unlike ISO 8601 datetimes, I don't think these are in particularly wide use, or widely supported. That's not a major strike against it, but if it's not useful as something to show to humans and it's not especially useful as something to show to / read from other computers, that weighs against its inclusion in the standard library.

The biggest problem, however, is that timedelta does not and cannot represent "Year" or "Month", which means that P1Y or P1M would always need to be invalid to parse. We could eliminate this format, but it means that we would never at any point in the future be able to implement a parser for the full spec. Since the concept of a year and a month are ambiguous and at least the 2016 version of ISO 8601 doesn't seem to define what it means for a duration to last 1 year or 1 month, you can't even really count on such a thing as an interchange format, because different implementations might give you different results! What does 20200131T00:00:00/P1M represent? The interval (2020-01-31, 2020-02-29)? (2020-01-31, 2020-03-02)? Something else?
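To make the ambiguity concrete, here is a small sketch of two plausible readings of "one month after 2020-01-31". The helper names `add_month_clamped` and `add_month_as_days` are made up for this illustration and do not correspond to any particular library's semantics:

```python
import calendar
from datetime import date, timedelta


def add_month_clamped(d: date) -> date:
    """Same day of the next month, clamped to that month's last day."""
    year = d.year + d.month // 12
    month = d.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return d.replace(year=year, month=month, day=min(d.day, last_day))


def add_month_as_days(d: date) -> date:
    """Treat "one month" as the number of days in the starting month."""
    return d + timedelta(days=calendar.monthrange(d.year, d.month)[1])


start = date(2020, 1, 31)
print(add_month_clamped(start))   # 2020-02-29
print(add_month_as_days(start))   # 2020-03-02
```

Both results are defensible, which is why two implementations exchanging `P1M` cannot be assumed to agree on the end of the interval.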

A better target for parsing ISO 8601 durations would be something like dateutil.relativedelta, which does have defined semantics for years and months (though as I mentioned above, those are not necessarily consistent with the semantics of other libraries parsing or writing out this format).

I am also not entirely clear on whether "weeks" is just an alias for "7 days" or if it means something related to weeks in the ISO calendar (and if that makes a difference for durations).

I imagine that generating these formats is a bit more forgiving, because you would simply never generate the forbidden formats, and we can offer configuration options in the formatter method to allow the user to tweak the various ambiguities in the spec.
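As a rough sketch of the generating direction, assuming a hypothetical function `format_iso_duration` (not an existing API): it emits only the D, H, M and S designators, so the year, month and week forms never appear in its output. Negative values are rejected here purely to keep the example short:

```python
from datetime import timedelta


def format_iso_duration(delta: timedelta) -> str:
    """Hypothetical formatter that emits only D, H, M and S designators."""
    if delta < timedelta(0):
        raise ValueError("negative durations are not handled in this sketch")
    minutes, seconds = divmod(delta.seconds, 60)
    hours, minutes = divmod(minutes, 60)
    date_part = f"{delta.days}D" if delta.days else ""
    time_part = ""
    if hours:
        time_part += f"{hours}H"
    if minutes:
        time_part += f"{minutes}M"
    if delta.microseconds:
        # Drop trailing zeros from the fractional seconds.
        fraction = f"{delta.microseconds:06d}".rstrip("0")
        time_part += f"{seconds}.{fraction}S"
    elif seconds:
        time_part += f"{seconds}S"
    if not date_part and not time_part:
        return "PT0S"  # a zero-length duration still needs one component
    return "P" + date_part + ("T" + time_part if time_part else "")


print(format_iso_duration(timedelta(hours=36)))          # P1DT12H
print(format_iso_duration(timedelta(milliseconds=500)))  # PT0.5S
```

Choices such as whether 36 hours is written as `P1DT12H` or `PT36H` are exactly the kind of thing a configuration option on the formatter could control.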

926c91cc-ad75-44d8-8753-2e15a016bea4 commented 3 years ago

There are two conflicting interests: ISO 8601 that allows non-precise durations, and timedelta that assumes precise durations.

For me, the non-precise durations only make sense in date arithmetic: to a human, it's pretty clear what adding 3 months or a year will do to the date. There may be edge cases when crossing DST, but normal arithmetic with timezones also has those cases.

Regarding ISO weeks, I'm pretty sure that they are only special in regards to calculating week numbers and the weekday they start. They still have a duration of 7 days.

Apart from being able to parse ISO durations coming from other systems, the non-precise durations would be useful e.g. when implementing recurring events. Calculating a series of dates for something that happens on the 12th day of every 2nd month is doable in Python, but not with the aid of timedelta.
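For illustration, a sketch of how such a recurrence can be computed today without timedelta, by counting months directly. The generator name `every_n_months` is invented for this example. It assumes the chosen day exists in every month (true for the 12th); for days after the 28th, `date()` would raise `ValueError` in short months:

```python
from datetime import date
from itertools import islice
from typing import Iterator


def every_n_months(start: date, step_months: int) -> Iterator[date]:
    """Yield start, then the same day-of-month every step_months months."""
    # Count months since year 0 so that year rollover is plain arithmetic.
    index = start.year * 12 + (start.month - 1)
    while True:
        year, month0 = divmod(index, 12)
        yield date(year, month0 + 1, start.day)
        index += step_months


for occurrence in islice(every_n_months(date(2024, 9, 12), 2), 4):
    print(occurrence)
```

This prints 2024-09-12, 2024-11-12, 2025-01-12 and 2025-03-12. The step between consecutive dates is 61, 61 and 59 days, so no single timedelta can express it.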

I see four options here:

1) expand timedelta to allow month and year, with the implication that e.g. total_seconds() would fail or be ambiguous for these timedeltas

2) implement only the parts of ISO 8601 that can safely be represented by the current timedelta

3) add a new relativetimedelta class that allows representing non-precise durations

4) do nothing and leave it to 3rd party packages to implement this

jayaddison commented 2 years ago
  2) implement only the parts of ISO 8601 that can safely be represented by the current timedelta

After learning about this ticket, I've attempted an implementation of timedelta.fromisoformat and timedelta.isoformat in a library called timestamp-iso8601 (published on PyPI and GitHub).

It's freshly-prepared and unreviewed so far and I'd welcome any feedback on it.

The library provides a subclass of datetime.timedelta that can be used as a drop-in replacement to parse and serialize ISO 8601 durations.

The library has no external dependencies and has been developed with performance in mind, albeit not as the primary goal. Test coverage is included in the source repository.

  1) expand timedelta to allow month and year, with the implication that e.g. total_seconds() would fail or be ambiguous for these timedeltas

The library has some limitations, and absence of support for representation of months and years in datetime.timedelta objects certainly affects it. The code is designed to be forwards-compatible so that construction of year-aware and month-aware durations would activate if-and-when supported by datetime.timedelta.

simon04 commented 2 years ago

There are two conflicting interests: ISO 8601 that allows non-precise durations, and timedelta that assumes precise durations.

Go's time.ParseDuration supports units from ns to h, and strings such as "300ms", "-1.5h" or "2h45m".
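A rough Python approximation of that syntax, for comparison. This is a sketch, not a port of Go's implementation: the function name `parse_go_style_duration` is made up, and the `ns` unit is left out because timedelta has a resolution of one microsecond:

```python
import re
from datetime import timedelta

# Unit sizes in microseconds; "ns" is omitted (below timedelta's resolution).
_UNIT_MICROSECONDS = {
    "us": 1,
    "µs": 1,
    "ms": 1_000,
    "s": 1_000_000,
    "m": 60_000_000,
    "h": 3_600_000_000,
}
# "ms" must come before "m" and "s" so the longer unit is matched first.
_TOKEN_RE = re.compile(r"([0-9]+(?:\.[0-9]*)?|\.[0-9]+)(us|µs|ms|s|m|h)")


def parse_go_style_duration(text: str) -> timedelta:
    """Parse strings such as "300ms", "-1.5h" or "2h45m" (hypothetical helper)."""
    sign, body = 1, text
    if body[:1] in ("+", "-"):
        sign = -1 if body[0] == "-" else 1
        body = body[1:]
    if body == "0":
        return timedelta(0)
    total = timedelta(0)
    pos = 0
    while pos < len(body):
        match = _TOKEN_RE.match(body, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        number, unit = match.groups()
        total += timedelta(microseconds=float(number) * _UNIT_MICROSECONDS[unit])
        pos = match.end()
    if pos == 0:
        raise ValueError(f"invalid duration: {text!r}")
    return sign * total


print(parse_go_style_duration("2h45m"))
print(parse_go_style_duration("-1.5h"))
```

Because every unit has a fixed length, any string in this syntax maps onto a timedelta without the year and month problem.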

Java differentiates between time-durations implemented as java.time.Duration and date-durations implemented as java.time.Period. The former stores durations in terms of seconds and nanoseconds, and parses from units ns to h; in addition, days can be parsed as standard 24 hour days. Duration.parse(...) is implemented using a regular expression defined in https://github.com/openjdk/jdk/blob/1bfcc2790adbc273864c74faab0bd43613c75982/src/java.base/share/classes/java/time/Duration.java#L154-L157

.NET uses a similar concept as TimeSpan, but parses from a different syntax.

jayaddison commented 2 years ago

After learning about this ticket, I've attempted an implementation of timedelta.fromisoformat and timedelta.isoformat in a library called timestamp-iso8601 (published on PyPI and GitHub).

My apologies here: the license terms that this library is currently under may have caused a license violation, and so I plan to yank the PyPI releases and make the GitHub repository private until questions about those can be resolved.

benkehoe commented 2 years ago

For reference in the absence of @jayaddison's code, here is an implementation of fromisoformat and isoformat that I wrote: https://gist.github.com/benkehoe/5b03c308b038b29e42106f602e554010

I believe strongly that timedelta deserves parse/format methods, but I can see the problems with not supporting the full ISO spec (in my code, I parse years and months and raise an exception about lack of support, and in the docs clarify that e.g. P1DT12H is treated identically to PT36H). The counterbalance is that while, say, Go's solution is a good alternative in isolation, it means its parse/format methods are named and work differently from the rest of the datetime classes.
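The point about P1DT12H and PT36H follows from how timedelta normalizes its arguments. A short illustration, using the constructor directly rather than any parser:

```python
from datetime import timedelta

one_day_twelve_hours = timedelta(days=1, hours=12)  # what P1DT12H would map to
thirty_six_hours = timedelta(hours=36)              # what PT36H would map to

# Both are stored as days=1, seconds=43200, so they compare equal.
print(one_day_twelve_hours == thirty_six_hours)                # True
print(thirty_six_hours.days, thirty_six_hours.seconds)         # 1 43200
```

A round trip through timedelta therefore cannot preserve which of the two spellings was originally parsed.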

jayaddison commented 1 year ago

Apologies for what might be a slightly repetitive message here, but: given some concerns about the timedelta-iso8601 library, which repurposed a couple of method signatures and docstrings from cpython.git, and because I wanted to get this functionality back out there, I've re-implemented the same functionality from nowt in a clean environment without looking at or copying any code from cpython.git.

The updated library is available under an AGPLv3 license in source form as timedelta-isoformat on GitHub and packaged as a wheel named timedelta-isoformat on PyPI.

jayaddison commented 1 year ago

@benkehoe any chance you could re-run your benchmark comparison against timedelta-isoformat v0.4.1?

jayaddison commented 1 year ago

As a heads-up for anyone following along (and please speak up if this is noise - I'll adjust and find a better way to communicate): timedelta-isoformat v0.4.1 remains available on GitHub, but is deprecated.

In particular, two important bugs have been addressed since that version:

Additionally, an intentional decision was made to parse all numeric values using the float type. Although in practice many duration strings are likely to contain short values (<= 10 digits), there is apparently no known linear-time algorithm to transform decimal strings into base-ten integers, so let's be on the safe side and use float parsing (until such time as a radical rethink of that policy is required, or someone knows better and can share that).

simon04 commented 1 year ago

Java differentiates between time-durations implemented as java.time.Duration and date-durations implemented as java.time.Period

Here's a Python implementation for fromisoformat in 25 LOC plus some unit tests: https://gist.github.com/simon04/90ad63486022fd110e5aea58e8ecb411

pganssle commented 1 year ago

I don't think we need any more implementations. The implementation here was never the problem. The big unaddressed issues are about who wants this thing and why.

If people want a human-friendly way to print timedelta, this isn't it. If people want to be able to parse arbitrary ISO8601 durations, timedelta is not the right output type. Is there a real use case for this? If not, we should work on solving the problems people have rather than creating something that almost works.

simon04 commented 1 year ago

Is there a real use case for this?

There definitely is! To parse delays, timeouts, lifetimes. The project https://github.com/caddyserver/caddy (not Python) has 35 usages of ParseDuration. A codebase of mine has various ad-hoc implementations and would benefit from a datetime.timedelta.fromisoformat:

datetime.timedelta(days=int(match_days.group("days")))
datetime.timedelta(hours=int(match_hours.group("hours")))
datetime.timedelta(seconds=ini.getint(section, "min_mod_diff"))
datetime.timedelta(days=float(ini.get(section, "days")))
datetime.timedelta(hours=float(ini.get(section, "hours")))
datetime.timedelta(days=int(ini.get(section, "DAYS")))
datetime.timedelta(seconds=ini.getint(section, "MaxAgeSeconds"))
datetime.timedelta(seconds=ini.getint(section, "MinAgeSeconds"))

Thanks!

pganssle commented 1 year ago

There definitely is! To parse delays, timeouts, lifetimes.

Why do these need to be ISO 8601 durations rather than some other, better format?

simon04 commented 1 year ago

ISO 8601 is not a strict necessity here, but a handy standard that can be used. It would also provide symmetry with datetime.datetime.fromisoformat.

Is Go's syntax preferable?

Go's time.ParseDuration supports units from ns to h, and strings such as "300ms", "-1.5h" or "2h45m".

davetapley commented 11 months ago

See also:

samypr100 commented 8 months ago

Note that some other notable libraries, such as pandas, also try to do this. See https://pandas.pydata.org/docs/reference/api/pandas.Timedelta.isoformat.html.

I've also used isodate quite extensively in lieu of dateutil.relativedelta.

This would be a welcome addition to the standard library.

samypr100 commented 8 months ago

The big unaddressed issues are about who wants this thing and why.

@pganssle I think the number of libraries trying to accomplish the same thing (albeit with different trade-offs) is a signal of that desire.

In my experience, I usually see a mix of these libraries being used to parse durations in business/data-science applications (e.g. ML-generated input/output or, more recently, LLM input/output). For example, LLM applications can be better at generating these types of durations from natural language input than full datetimes, hence its usefulness.