Open 926c91cc-ad75-44d8-8753-2e15a016bea4 opened 4 years ago
Python 3.7 gained support for parsing ISO 8601 formatted time, date and datetime strings via the fromisoformat() methods. Python has seen improved support for ISO 8601 in general; ISO calendar format codes were added in Python 3.6, and fromisocalendar() was added in Python 3.8.
ISO 8601 also has a standard for durations: https://en.wikipedia.org/wiki/ISO_8601#Durations
For consistency with the other objects in the datetime module, I suggest adding isoformat()/fromisoformat() methods for datetime.timedelta that implement ISO 8601 durations.
ISO 8601 durations support years and months, which are not valid timedelta arguments because they are non-precise durations (their length varies). I suggest raising an exception if the conversion to or from timedelta cannot be done safely.
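A minimal sketch of what such a parser could look like, assuming the hypothetical name `duration_fromisoformat` (not an existing stdlib function), an optional leading `-` sign (a common extension, not part of the basic ISO 8601 duration syntax), and weeks being exactly 7 days. Weeks may be mixed with other components here, which is more lenient than the strict `PnW` form:

```python
import re
from datetime import timedelta

# Hypothetical helper (not a stdlib API). Date components come before "T", time
# components after it; the (?=\d) lookahead rejects a dangling "T" as in "PT".
_DURATION_RE = re.compile(
    r"(?P<sign>-?)P"
    r"(?:(?P<years>\d+)Y)?"
    r"(?:(?P<months>\d+)M)?"
    r"(?:(?P<weeks>\d+)W)?"
    r"(?:(?P<days>\d+)D)?"
    r"(?:T(?=\d)"
    r"(?:(?P<hours>\d+)H)?"
    r"(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:[.,]\d+)?)S)?"
    r")?"
)
_COMPONENTS = ("years", "months", "weeks", "days", "hours", "minutes", "seconds")


def duration_fromisoformat(text: str) -> timedelta:
    match = _DURATION_RE.fullmatch(text)
    if match is None or not any(match[name] for name in _COMPONENTS):
        raise ValueError(f"Invalid ISO 8601 duration: {text!r}")
    if match["years"] or match["months"]:
        # Years and months have no fixed length, so there is no safe timedelta equivalent.
        raise ValueError(f"Years and months cannot be represented as a timedelta: {text!r}")
    delta = timedelta(
        weeks=int(match["weeks"] or 0),  # a week is taken to be exactly 7 days
        days=int(match["days"] or 0),
        hours=int(match["hours"] or 0),
        minutes=int(match["minutes"] or 0),
        # ISO 8601 allows a comma as the decimal separator; accept both.
        seconds=float((match["seconds"] or "0").replace(",", ".")),
    )
    return -delta if match["sign"] else delta
```

With this sketch, `duration_fromisoformat("P1DT12H")` equals `timedelta(hours=36)`, while `"P1M"` raises `ValueError` instead of guessing a month length.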
https://pypi.org/project/isodate/ implements a parse_duration() function that could be used for inspiration.
Among other things, ISO 8601 duration strings are commonly used to communicate offset values in timezone definitions.
There is related discussion in bpo-41254, about duration formats more generally.
This is probably more feasible than the proposal in bpo-41254 since it's a well-defined spec (mostly — it includes an optional alternative format and the number of digits allowed is defined "by agreement", thus defeating the purpose of using a spec in the first place) that's not even particularly difficult to implement, but there are still a few problems (and one reason I've never implemented this, despite desperately wanting a better string representation for time deltas). Two minor problems first:
Unlike ISO 8601 datetimes, these are not especially "human-friendly" formats, so I don't think they're especially useful for displaying timedeltas.
Also unlike ISO 8601 datetimes, I don't think these are in particularly wide use, or widely supported. That's not a major strike against it, but if it's not useful as something to show to humans and it's not especially useful as something to show to / read from other computers, that weighs against its inclusion in the standard library.
The biggest problem, however, is that `timedelta` does not and cannot represent "Year" or "Month", which means that `P1Y` or `P1M` would always need to be invalid to parse. We could eliminate this format, but it means that we would never at any point in the future be able to implement a parser for the full spec. Since the concepts of a year and a month are ambiguous, and at least the 2016 version of ISO 8601 doesn't seem to define what it means for a duration to last 1 year or 1 month, you can't even really count on such a thing as an interchange format, because different implementations might give you different results! What does `2020-01-31T00:00:00/P1M` represent? The interval (2020-01-31, 2020-02-29)? (2020-01-31, 2020-03-02)? Something else?
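To make the ambiguity concrete, here is a small stdlib-only sketch of two plausible readings of "one month later" (the helper names are hypothetical, and neither reading is claimed to be what ISO 8601 prescribes). The clamping reading is the one `dateutil.relativedelta(months=1)` applies to this date:

```python
import calendar
from datetime import date, timedelta


def _next_month(start: date) -> date:
    # First day of the month after start's month.
    if start.month == 12:
        return date(start.year + 1, 1, 1)
    return date(start.year, start.month + 1, 1)


def add_month_clamped(start: date) -> date:
    # Reading A: same day-of-month next month, clamped to that month's last day.
    first = _next_month(start)
    last_day = calendar.monthrange(first.year, first.month)[1]
    return first.replace(day=min(start.day, last_day))


def add_month_overflow(start: date) -> date:
    # Reading B: same day-of-month next month, letting excess days spill into the following month.
    return _next_month(start) + timedelta(days=start.day - 1)


start = date(2020, 1, 31)
print(add_month_clamped(start))   # 2020-02-29
print(add_month_overflow(start))  # 2020-03-02
```

Both results are internally consistent, which is exactly why `P1M` alone does not pin down an interval.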
A better target for parsing ISO 8601 durations would be something like `dateutil.relativedelta`, which does have defined semantics for years and months (though as I mentioned above, those are not necessarily consistent with the semantics of other libraries parsing or writing out this format).
I am also not entirely clear on whether "weeks" is just an alias for "7 days" or if it means something related to weeks in the ISO calendar (and if that makes a difference for durations).
I imagine that generating these formats is a bit more forgiving, because you would simply never generate the forbidden formats, and we can offer configuration options in the formatter method to allow the user to tweak the various ambiguities in the spec.
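A minimal sketch of the generating side, assuming a hypothetical `duration_isoformat` function with fixed choices where a real formatter might offer options: days are always 24 hours, weeks are never emitted, negative values get a leading `-`, and zero is written as `PT0S`:

```python
from datetime import timedelta


def duration_isoformat(delta: timedelta) -> str:
    # Hypothetical formatter: only emits D, H, M and S, never the non-precise Y or M date components.
    sign = "-" if delta < timedelta(0) else ""
    delta = abs(delta)
    minutes, seconds = divmod(delta.seconds, 60)
    hours, minutes = divmod(minutes, 60)

    date_part = f"{delta.days}D" if delta.days else ""
    time_part = ""
    if hours:
        time_part += f"{hours}H"
    if minutes:
        time_part += f"{minutes}M"
    if seconds or delta.microseconds:
        # Fixed-point formatting avoids scientific notation; trailing zeros are trimmed.
        frac = f".{delta.microseconds:06d}".rstrip("0").rstrip(".")
        time_part += f"{seconds}{frac}S"
    if not date_part and not time_part:
        return "PT0S"
    return f"{sign}P{date_part}" + (f"T{time_part}" if time_part else "")
```

For example, `duration_isoformat(timedelta(hours=36))` produces `P1DT12H`; because the output never contains `Y` or `M` date components, the forbidden forms simply cannot be generated.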
There are two conflicting interests: ISO 8601 that allows non-precise durations, and timedelta that assumes precise durations.
For me, the non-precise durations only make sense in date arithmetic: to a human, it's pretty clear what adding 3 months or a year will do to the date. There may be edge cases when crossing DST, but normal arithmetic with timezones also has those cases.
Regarding ISO weeks, I'm pretty sure that they are only special with regard to calculating week numbers and the weekday they start on. They still have a duration of 7 days.
Apart from being able to parse ISO durations coming from other systems, the non-precise durations would be useful e.g. when implementing recurring events. Calculating a series of dates for something that happens on the 12th day of every 2nd month is doable in Python, but not with the aid of timedelta.
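A short stdlib-only sketch of that recurring-event case (the generator name `every_nth_month` is hypothetical). It does the month arithmetic on (year, month) pairs, since `timedelta` has no notion of a month, and it assumes the requested day exists in every month, which holds for the 12th:

```python
from datetime import date
from itertools import islice


def every_nth_month(start: date, day: int, interval: int):
    # Yields `day` of every `interval`-th month, beginning with start's month,
    # skipping a first occurrence that falls before `start`.
    year, month = start.year, start.month
    while True:
        candidate = date(year, month, day)
        if candidate >= start:
            yield candidate
        month += interval
        year += (month - 1) // 12       # carry whole years
        month = (month - 1) % 12 + 1    # keep month in 1..12


dates = list(islice(every_nth_month(date(2021, 11, 1), 12, 2), 4))
print(dates)
```

The four dates produced are 2021-11-12, 2022-01-12, 2022-03-12 and 2022-05-12.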
I see four options here:
1) expand timedelta to allow month and year, with the implication that e.g. total_seconds() would fail or be ambiguous for these timedeltas
2) implement only the parts of ISO 8601 that can safely be represented by the current timedelta
3) add a new relativetimedelta class that allows representing non-precise durations
4) do nothing and leave it to 3rd party packages to implement this
> 2) implement only the parts of ISO 8601 that can safely be represented by the current timedelta
After learning about this ticket, I've attempted an implementation of `timedelta.fromisoformat` and `timedelta.isoformat` in a library called `timestamp-iso8601` (published on PyPI and GitHub).
It's freshly-prepared and unreviewed so far and I'd welcome any feedback on it.
The library provides a subclass of `datetime.timedelta` that can be used as a drop-in replacement to parse and serialize ISO 8601 durations.
The library has no external dependencies and has been developed with performance in mind, albeit not as the primary goal. Test coverage is included in the source repository.
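The library's code isn't reproduced here; as a generic sketch (the class name `ISODuration` and its `isoformat` method are hypothetical), this is how a `timedelta` subclass behaves as a drop-in replacement, including one caveat: arithmetic on instances returns plain `timedelta` objects, so the extra methods do not survive it.

```python
from datetime import timedelta


class ISODuration(timedelta):
    """Hypothetical timedelta subclass carrying an extra formatting method."""

    def isoformat(self) -> str:
        # Deliberately minimal stand-in: whole seconds only, non-negative values only.
        return f"PT{int(self.total_seconds())}S"


d = ISODuration(minutes=1)
print(isinstance(d, timedelta))     # True: accepted wherever a timedelta is expected
print(d.isoformat())                # PT60S
print(type(d + d) is timedelta)     # True: arithmetic returns the base class...
print(hasattr(d + d, "isoformat"))  # False: ...so the extra method is gone
```

A subclass that wants results to keep its methods would need to override the arithmetic operators or convert results back explicitly.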
> 1) expand timedelta to allow month and year, with the implication that e.g. total_seconds() would fail or be ambiguous for these timedeltas
The library has some limitations, and the absence of support for representing `months` and `years` in `datetime.timedelta` objects certainly affects it. The code is designed to be forwards-compatible, so that construction of year-aware and month-aware durations would activate if and when `datetime.timedelta` supports them.
> There are two conflicting interests: ISO 8601 that allows non-precise durations, and timedelta that assumes precise durations.
Go's `time.ParseDuration` supports units from ns to h, and strings such as "300ms", "-1.5h" or "2h45m".
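For comparison, a rough Python approximation of the Go syntax (the function name `parse_go_duration` is hypothetical; it covers the units listed above, omits Go's `µs` spelling and overflow checks, and rounds to `timedelta`'s microsecond resolution):

```python
import re
from datetime import timedelta

# Microseconds per unit; Go's micro-sign spellings and overflow checks are omitted.
_UNIT_MICROSECONDS = {
    "ns": 1e-3,
    "us": 1.0,
    "ms": 1e3,
    "s": 1e6,
    "m": 60e6,
    "h": 3600e6,
}
# One "<number><unit>" token; "ms" is listed before "m" so "5ms" is not read as "5m" + "s".
_TOKEN_RE = re.compile(r"(\d+(?:\.\d*)?|\.\d+)(ns|us|ms|s|m|h)")


def parse_go_duration(text: str) -> timedelta:
    sign = -1 if text.startswith("-") else 1
    body = text[1:] if text[:1] in ("+", "-") else text
    if body == "0":
        return timedelta(0)  # Go accepts a bare "0" without a unit
    total_us = 0.0
    pos = 0
    while pos < len(body):
        match = _TOKEN_RE.match(body, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        value, unit = match.groups()
        total_us += float(value) * _UNIT_MICROSECONDS[unit]
        pos = match.end()
    if pos == 0:
        raise ValueError(f"invalid duration: {text!r}")  # empty input
    # timedelta rounds to whole microseconds, so sub-microsecond values are lost.
    return timedelta(microseconds=sign * total_us)
```

With this sketch, `parse_go_duration("2h45m")` equals `timedelta(hours=2, minutes=45)`. The syntax is compact and readable, but it shares nothing with the `fromisoformat` naming used elsewhere in `datetime`.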
Java differentiates between time-based durations, implemented as `java.time.Duration`, and date-based durations, implemented as `java.time.Period`. The former stores durations in terms of seconds and nanoseconds, and parses units from (fractional, nanosecond-precision) seconds up to hours; in addition, days can be parsed as standard 24-hour days. `Duration.parse(...)` is implemented using a regular expression defined in https://github.com/openjdk/jdk/blob/1bfcc2790adbc273864c74faab0bd43613c75982/src/java.base/share/classes/java/time/Duration.java#L154-L157
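A Python sketch of the same split, assuming hypothetical names (`Period`, `split_duration`): Y, M and D go into a calendar-based part, as with `java.time.Period`, and H, M and S into a fixed-length `timedelta`, similar in spirit to `java.time.Duration`. Unlike Java's `Duration.parse`, this sketch does not treat D as 24 hours:

```python
import re
from dataclasses import dataclass
from datetime import timedelta
from typing import Tuple


@dataclass(frozen=True)
class Period:
    """Hypothetical analogue of java.time.Period: calendar units without a fixed length."""
    years: int = 0
    months: int = 0
    days: int = 0


_SPLIT_RE = re.compile(
    r"P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?=\d)(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?"
)


def split_duration(text: str) -> Tuple[Period, timedelta]:
    """Split an ISO 8601 duration into a date-based part and a time-based part."""
    match = _SPLIT_RE.fullmatch(text)
    if match is None or not any(match.groups()):
        raise ValueError(f"Invalid ISO 8601 duration: {text!r}")

    def component(name: str) -> int:
        return int(match[name] or 0)

    period = Period(component("years"), component("months"), component("days"))
    # Only the time part has a fixed length, so only it becomes a timedelta.
    time_part = timedelta(
        hours=component("hours"),
        minutes=component("minutes"),
        seconds=float(match["seconds"] or 0),
    )
    return period, time_part
```

Keeping days in the calendar-based part means `P1D` stays "one calendar day", which can be 23 or 25 hours across a DST change; Java's `Duration` makes the opposite choice.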
.NET uses a similar concept, `TimeSpan`, but parses a different syntax.
> After learning about this ticket, I've attempted an implementation of `timedelta.fromisoformat` and `timedelta.isoformat` in a library called `timestamp-iso8601` (published on PyPI and GitHub).
My apologies here: the license terms that this library is currently under may have caused a license violation, and so I plan to yank the PyPI releases and make the GitHub repository private until questions about those can be resolved.
For reference in the absence of @jayaddison's code, here is an implementation of `fromisoformat` and `isoformat` that I wrote: https://gist.github.com/benkehoe/5b03c308b038b29e42106f602e554010
I believe strongly that `timedelta` deserves parse/format methods, but I can see the problems with not supporting the full ISO spec (in my code, I parse years and months and raise an exception about lack of support, and in the docs clarify that e.g. `P1DT12H` is treated identically to `PT36H`). The counterbalance is that while, say, Go's solution is a good alternative in isolation, adopting it would mean that `timedelta`'s parse/format methods are named differently and work differently from those of the rest of the `datetime` classes.
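The `P1DT12H`/`PT36H` equivalence follows from how `timedelta` normalizes its fields: it stores only days, seconds and microseconds, so whichever spelling was parsed, the original component split is not recoverable and a formatter has to pick one canonical output.

```python
from datetime import timedelta

one_and_a_half_days = timedelta(days=1, hours=12)
thirty_six_hours = timedelta(hours=36)

# Both spellings collapse to the same normalized (days, seconds, microseconds) value.
print(one_and_a_half_days == thirty_six_hours)          # True
print(thirty_six_hours.days, thirty_six_hours.seconds)  # 1 43200
```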
Apologies for what might be a slightly repetitive message here, but: given some concerns about the `timedelta-iso8601` library, which repurposed a couple of method signatures and docstrings from `cpython.git`, and because I wanted to get this functionality back out there, I've re-implemented the same functionality from nowt in a clean environment without looking at or copying any code from `cpython.git`.
The updated library is available under an AGPLv3 license in source form as `timedelta-isoformat` on GitHub and packaged as a wheel named `timedelta-isoformat` on PyPI.
@benkehoe any chance you could re-run your benchmark comparison against `timedelta-isoformat` v0.4.1?
As a heads-up for anyone following along (and please speak up if this is noise; I'll adjust and find a better way to communicate): `timedelta-isoformat` v0.4.1 remains available on GitHub, but is deprecated.
In particular, two important bugs have been addressed since that version:
1) A limit of `366` days was configured within date-segment elements, and `59` seconds within time-segment elements, but the same logic was used to evaluate both. That's incorrect: in the date-segment context, `366` is an inclusive range limit, whereas in the time-segment context, anything up to the exclusive range limit of `60` is acceptable.
2) The seconds (`S`) component of serialized results incorrectly relied on some string-formatting defaults, allowing the value contained within the corresponding element to be presented in scientific notation; that's not valid within ISO 8601 designator-separated fields, as far as I'm aware.
Additionally: an intentional decision was made to handle all parsing of values-to-numbers using the `float` type. Although in practice many communicated duration strings are likely to contain short values (<= 10 digits), there is apparently no known linear-time algorithm to convert decimal strings into integers, so let's be on the safe side and use `float` parsing (until such time as a radical rethink of that policy is required, or someone knows better and can share that).
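Both number-handling points can be illustrated briefly (the helper `format_seconds` is hypothetical, not the library's code): `str()` on a small float switches to scientific notation, while fixed-point formatting does not; and parsing through `float` keeps only about 15 to 17 significant digits, which is the trade-off the float-only policy accepts.

```python
def format_seconds(value: float) -> str:
    # Fixed-point at microsecond precision (timedelta's resolution), trailing zeros trimmed.
    return f"{value:.6f}".rstrip("0").rstrip(".")


value = 0.00001                            # 10 microseconds
print(str(value))                          # 1e-05 (scientific notation)
print(format_seconds(value))               # 0.00001

digits = "123456789012345678"              # 18 significant digits
print(int(float(digits)) == int(digits))   # False: precision lost in the float round-trip
```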
> Java differentiates between time-durations implemented as `java.time.Duration` and date-durations implemented as `java.time.Period`
Here's a Python implementation of `fromisoformat` in 25 LOC plus some unit tests: https://gist.github.com/simon04/90ad63486022fd110e5aea58e8ecb411
I don't think we need any more implementations. The implementation here was never the problem. The big unaddressed issues are about who wants this thing and why.
If people want a human-friendly way to print `timedelta`, this isn't it. If people want to be able to parse arbitrary ISO 8601 durations, `timedelta` is not the right output type. Is there a real use case for this? If not, we should work on solving the problems people have rather than creating something that almost works.
> Is there a real use case for this?
There definitely is! To parse delays, timeouts, lifetimes. The project https://github.com/caddyserver/caddy (not Python) has 35 usages of `ParseDuration`. A codebase of mine has various ad-hoc implementations and would benefit from a `datetime.timedelta.fromisoformat`:
datetime.timedelta(days=int(match_days.group("days")))
datetime.timedelta(hours=int(match_hours.group("hours")))
datetime.timedelta(seconds=ini.getint(section, "min_mod_diff"))
datetime.timedelta(days=float(ini.get(section, "days")))
datetime.timedelta(hours=float(ini.get(section, "hours")))
datetime.timedelta(days=int(ini.get(section, "DAYS")))
datetime.timedelta(seconds=ini.getint(section, "MaxAgeSeconds"))
datetime.timedelta(seconds=ini.getint(section, "MinAgeSeconds"))
Thanks!
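As a sketch of how such ad-hoc parsing could be consolidated (the parser `parse_iso_duration` and the section/option names are hypothetical), `configparser.ConfigParser` accepts a `converters` mapping that adds a matching `get<name>()` method, here `getduration()`:

```python
import configparser
import re
from datetime import timedelta

# Hypothetical minimal parser for the precise subset PnDTnHnMnS (no Y/M/W, no sign).
_ISO_RE = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?=\d)(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?"
)


def parse_iso_duration(text: str) -> timedelta:
    match = _ISO_RE.fullmatch(text.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"Invalid ISO 8601 duration: {text!r}")
    return timedelta(
        days=int(match["days"] or 0),
        hours=int(match["hours"] or 0),
        minutes=int(match["minutes"] or 0),
        seconds=float(match["seconds"] or 0),
    )


# The "duration" key makes ConfigParser expose getduration() on the parser and its sections.
ini = configparser.ConfigParser(converters={"duration": parse_iso_duration})
ini.read_string("""
[cleanup]
MaxAge = P30D
MinModDiff = PT90S
""")
max_age = ini.getduration("cleanup", "MaxAge")
min_mod_diff = ini.getduration("cleanup", "MinModDiff")
```

One string format per option would then replace the mix of `days=`, `hours=` and `seconds=` conversions in the snippets above.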
> There definitely is! To parse delays, timeouts, lifetimes.
Why do these need to be ISO 8601 durations rather than some other, better format?
ISO 8601 is not a strict necessity here, but a handy standard that can be used. Also for symmetry with `datetime.datetime.fromisoformat`.
Is Go's syntax preferable?
> Go's `time.ParseDuration` supports units from ns to h, and strings such as "300ms", "-1.5h" or "2h45m".
Note that some other notable libraries, such as pandas, try to do this too. See https://pandas.pydata.org/docs/reference/api/pandas.Timedelta.isoformat.html.
I've also used `isodate` quite extensively in lieu of `dateutil.relativedelta`.
This would be a welcome addition to the standard library.
> The big unaddressed issues are about who wants this thing and why.
@pganssle I think the number of libraries trying to accomplish the same thing (albeit with different trade-offs) is a signal to the desire.
From my experience, I usually see a mix of these libraries being used to parse durations in business/data-science applications (e.g. ML-generated input/output or, more recently, LLM input/output). For example, LLM applications can be better at generating this type of duration from natural language input than full datetimes, hence its usefulness.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['library', '3.10']
title = 'isoformat() / fromisoformat() for datetime.timedelta'
updated_at =
user = 'https://bugs.python.org/ErikCederstrand'
```
bugs.python.org fields:
```python
activity =
actor = 'Erik Cederstrand'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'Erik Cederstrand'
dependencies = []
files = []
hgrepos = []
issue_num = 42094
keywords = []
message_count = 5.0
messages = ['379091', '379096', '379097', '381273', '381314']
nosy_count = 3.0
nosy_names = ['martin.panter', 'Erik Cederstrand', 'p-ganssle']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue42094'
versions = ['Python 3.10']
```