Closed 11ecd6cf-9a95-40f1-8f6c-6275193baf6b closed 6 years ago
The datetime module has support for output to a string of dates and times in ISO 8601 format ("2012-09-09T18:00:00-07:00"), with the object method "isoformat([sep])". But there's no support for parsing such strings. A string to datetime class method should be provided, one capable of parsing at least the RFC 3339 subset of ISO 8601.
The problem is parsing time zone information correctly. The allowed formats for time zone are:
empty - no TZ, date/time is "naive" in the datetime sense
Z - zero, or Zulu time, i.e. UTC
[+-]nn.nn - offset from UTC
"strptime" does not understand timezone offsets. The "datetime" documentation suggests that the "z" format directive handles time zone info, but that's not actually implemented for input.
PyPI has four modules for parsing ISO 8601 dates. Each has at least one major problem in time zone handling:
iso8601 0.1.4
Abandonware. Mishandles time zone when time zone is "Z" and
the default time zone is specified.
iso8601.py 0.1dev
Always returns a "naive" datetime object, even if zone specified.
iso8601plus 0.1.6
Fork of abandonware version above. Same bug.
zc.iso8601 0.2.0
Zope version. Imports the pytz module with the full Olson time zone
database, but doesn't actually use that database.
Thus, nothing in PyPI provides a good alternative.
It would be appropriate to handle this in the datetime module. One small, correct, tested function would be better than the existing five bad alternatives.
%z format is supported, but it cannot accept colon in TZ offset. It can parse offsets like -0600 just fine. What OP is looking for is the GNU date %:z format which datetime does not support.
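To make the distinction concrete, here is a minimal sketch, assuming Python 3.2 or later. It only exercises the colon-free offset form; whether a colon is also accepted depends on the Python version, so nothing is asserted about it here.

```python
from datetime import datetime, timedelta

# Offset written without a colon, the form %z is described as accepting.
stamp = "2012-09-09T18:00:00-0600"
parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S%z")

# The result is an "aware" datetime carrying a fixed UTC offset.
offset = parsed.utcoffset()
print(parsed.isoformat())
print(offset == timedelta(hours=-6))
```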
For ISO 8601 compliance, however, I think we need a way to specify a parser that will accept any valid 8601 format: with T or space separator, with or without : in time and timezone, and with or without dashes in date.
I would very much like such promiscuous parser to be implemented in datetime.__new__. So that we can create datetime objects from strings the way we do it with numbers.
Re: "%z format is supported".
That's platform-specific; the actual parsing is delegated to the C library. It's not in Python 2.7 / Win32:
ValueError: 'z' is a bad directive in format '%Y-%m-%dT%H:%M:%S%z'
It really shouldn't be platform-specific; the underlying platform is irrelevant to this task. That's more of a documentation error; the features not common to all supported Python platforms should not be mentioned in the documentation.
Re: "I would very much like such promiscuous parser to be implemented in datetime.__new__. "
For string input, it's probably better to do this conversion in a specific class-level function. Full ISO 8601 dates/times generally come from computer-generated data via a file or API. If invalid text shows up, it should be detected as an error, not be heuristically interpreted as a date. There's already "fromtimestamp" and "fromordinal", and "isoformat" as an instance method, so "fromisoformat" seems reasonable.
I'd also suggest providing a standard subclass of tzinfo in datetime for fixed offsets. That's needed to express the time zone information in an ISO 8601 date. With it, the new "fromisoformat" could convert an ISO 8601 date/time to a time-zone "aware" datetime object. If converted back to an ISO 8601 string with .isoformat(), the round trip should preserve the original data, including time zone offset.
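The round trip described here can be sketched as follows, assuming Python 3.7 or later, where a `datetime.fromisoformat` class method and the fixed-offset `datetime.timezone` class are both available. The variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# A fixed-offset tzinfo: seven hours behind UTC.
fixed_offset = timezone(timedelta(hours=-7))

original = datetime(2012, 9, 9, 18, 0, 0, tzinfo=fixed_offset)
text = original.isoformat()

# datetime.fromisoformat exists in Python 3.7+; earlier versions lack it.
restored = datetime.fromisoformat(text)

print(text)
print(restored == original, restored.utcoffset() == original.utcoffset())
```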
(Several more implementations of this conversion have turned up. In addition to the four already mentioned, there was one in xml.util, and one in feedparser. There are probably more yet to be found.)
On Thu, Sep 6, 2012 at 9:51 PM, John Nagle <report@bugs.python.org> wrote:
It's not in Python 2.7 / Win32.
Python 2.x series is closed and cannot accept new features. Both %z and fixed offset tzinfo subclass are implemented in 3.2.
I am attaching a quick python only prototype for the proposed feature. My goal is to make date/time objects behave like numeric types for which constructors accept strings produced by str(). Since str() format is ISO 8601, it is natural to accept ISO 8601 formats in constructors.
We need to define the scope of what input strings will be accepted. ISO-8601 defines a lot of stuff which we may not wish to accept.
Do we want to accept both basic format (YYYYMMDD) and extended format (YYYY-MM-DD)?
Do we want to accept things like "1985-W15-5", which is (if I understand this correctly) the 5th day of the 15th week of 1985 [section 4.1.4.2]?
Do we want to accept [section 4.2.2.4], "23:20,8", which is 23 hours, 20 minutes, 8 tenths of a minute?
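To make the three notations concrete, here is a small sketch of what each one denotes, using only existing `datetime` features. It illustrates the notations and is not a proposal for how a parser should treat them; 12 April 1985 is the calendar date that corresponds to the week-date example.

```python
from datetime import datetime, timedelta

# Basic format (YYYYMMDD) versus extended format (YYYY-MM-DD): same date.
basic = datetime.strptime("19850412", "%Y%m%d").date()
extended = datetime.strptime("1985-04-12", "%Y-%m-%d").date()

# Week date "1985-W15-5": ISO year 1985, week 15, weekday 5 (Friday).
iso_year, iso_week, iso_weekday = tuple(basic.isocalendar())

# "23:20,8": 8 tenths of a minute is 8 * 6 = 48 seconds past 23:20.
decimal_minutes = timedelta(hours=23, minutes=20, seconds=8 * 6)

print(basic == extended)
print(iso_year, iso_week, iso_weekday)
print(decimal_minutes)
```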
I suspect most people who have been following the recent thread (https://groups.google.com/d/topic/comp.lang.python/Q2w4R89Nq1w/discussion) would say none of the above are needed. All that's needed is if you have an existing datetime object, d1, you can do:
s = str(d1)
d2 = datetime.datetime(s)
assert d1 == d2
for all values of d1.
But, let's at least agree on that. Or, in the alternative, agree on something else. Then we know what we're shooting for.
On Sep 9, 2012, at 8:15 AM, Roy Smith <report@bugs.python.org> wrote:
We need to define the scope of what input strings will be accepted.
Since it is easier to widen the domain of acceptable arguments than to narrow it in the future, I would say let's start by accepting str(x) only where x is date, time, timezone or datetime. I would leave out timedelta for now because its str(x) does not resemble ISO at all.
Either that or full ISO 8601. Anything in between is just too hard to explain.
I see I mis-stated my example. When I wrote:
s = str(d1)
d2 = datetime.datetime(s)
assert d1 == d2
what I really meant was:
s = d1.isoformat()
d2 = datetime.datetime(s)
assert d1 == d2
But, now I realize that while that is certainly an absolute lower bound, it's almost certainly not sufficient. The most common use case I see on a daily basis is parsing strings that look like "2012-09-07T23:59:59+00:00". This is also John Nagle's original use case from the cited mailing list thread:
I want to parse standard ISO date/time strings such as 2012-09-09T18:00:00-07:00
Datetime.isoformat() returns something that matches the beginning of that, but doesn't have the time zone offset. And it's the offset that makes strptime() not usable as a solution, because "%z" isn't portable.
If we don't satisfy the "2012-09-07T23:59:59+00:00" case, then we won't have really done anything useful.
For what parts of ISO 8601 to accept, there's a standard: RFC3339, "Date and Time on the Internet: Timestamps". See section 5.6:
date-fullyear   = 4DIGIT
date-month      = 2DIGIT  ; 01-12
date-mday       = 2DIGIT  ; 01-28, 01-29, 01-30, 01-31 based on
                          ; month/year
time-hour       = 2DIGIT  ; 00-23
time-minute     = 2DIGIT  ; 00-59
time-second     = 2DIGIT  ; 00-58, 00-59, 00-60 based on leap second
                          ; rules
time-secfrac    = "." 1*DIGIT
time-numoffset  = ("+" / "-") time-hour ":" time-minute
time-offset     = "Z" / time-numoffset

partial-time    = time-hour ":" time-minute ":" time-second
                  [time-secfrac]
full-date       = date-fullyear "-" date-month "-" date-mday
full-time       = partial-time time-offset

date-time       = full-date "T" full-time
NOTE: Per [ABNF] and ISO8601, the "T" and "Z" characters in this syntax may alternatively be lower case "t" or "z" respectively.
ISO 8601 defines date and time separated by "T".
Applications using this syntax may choose, for the sake of
readability, to specify a full-date and full-time separated by
(say) a space character.
That's straightforward, and can be expressed as a regular expression.
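As a rough illustration of that claim, the grammar can be transcribed into a regular expression along these lines. This is a sketch only: `RFC3339_RE` and `parse_rfc3339` are hypothetical names, fractional digits beyond the sixth are simply dropped, and a seconds value of 60 is rejected because the datetime constructor does not accept it.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical pattern name; a direct transcription of the grammar above,
# restricted to ASCII digits and allowing "T", "t" or a space as separator.
RFC3339_RE = re.compile(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
    r"[Tt ]"
    r"(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})"
    r"(?:\.(?P<frac>\d+))?"
    r"(?P<offset>[Zz]|[+-]\d{2}:\d{2})",
    re.ASCII,
)


def parse_rfc3339(text):
    """Hypothetical helper: return an aware datetime or raise ValueError."""
    match = RFC3339_RE.fullmatch(text)
    if match is None:
        raise ValueError("not an RFC 3339 date-time: %r" % (text,))
    parts = match.groupdict()
    # Keep at most six fractional digits (truncation, not rounding).
    micro = int((parts["frac"] or "")[:6].ljust(6, "0"))
    if parts["offset"] in ("Z", "z"):
        tz = timezone.utc
    else:
        sign = -1 if parts["offset"][0] == "-" else 1
        hours, minutes = int(parts["offset"][1:3]), int(parts["offset"][4:6])
        tz = timezone(sign * timedelta(hours=hours, minutes=minutes))
    return datetime(
        int(parts["year"]), int(parts["month"]), int(parts["day"]),
        int(parts["hour"]), int(parts["minute"]), int(parts["second"]),
        micro, tzinfo=tz,
    )


print(parse_rfc3339("2012-09-09T18:00:00-07:00").isoformat())
```

Using `fullmatch` means that leading or trailing junk is reported as an error rather than being silently ignored.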
I realize that while that is certainly an absolute lower bound, it's almost certainly not sufficient. The most common use case I see on a daily basis is parsing strings that look like "2012-09-07T23:59:59+00:00".
This is exactly what isoformat() of an aware datetime looks like:
>>> datetime.now(timezone.utc).isoformat()
'2012-09-09T16:09:46.165886+00:00'
str() is the same up to T replaced by space:
>>> print(datetime.now(timezone.utc))
2012-09-09 15:19:12.567692+00:00
For what parts of ISO 8601 to accept, there's a standard: RFC3339
This is almost indistinguishable from the idea of accepting .isoformat() and str() results. From what I see the only difference is that 't' is accepted for date/time separator and 'z' is accepted as a timezone.
Let's start with this.
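The relationship between the two output forms can be checked directly. This minimal sketch assumes Python 3.2 or later for `timezone.utc`, and the sample value is the one shown above.

```python
from datetime import datetime, timezone

moment = datetime(2012, 9, 9, 16, 9, 46, 165886, tzinfo=timezone.utc)

# str() of a datetime is the ISO form with a space instead of "T".
print(str(moment) == moment.isoformat(sep=" "))
print(moment.isoformat().replace("T", " ") == str(moment))
```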
As an ultimate solution, I would like to see something like codec registry so that we can do things like datetime(.., format='rfc3339') or date(.., format='gnu') for GNU parse_datetime. I think this will look more pythonic than strptime(). Of course, strptime format can also be accepted as the value for the format keyword.
I've started collecting some test cases. I'll keep adding to the collection. I'm going to start trolling ISO 8601:2004(E) for more. Let me know if there are other sources I should be considering.
Ooops, clicked the wrong button.
there is a module that parses those strings pretty nicely, it’s called pyiso8601: http://code.google.com/p/pyiso8601/
in the context of writing a better plistlib, i also needed the capability to parse those strings, and decided not to use the sucky incomplete implementation of plistlib, but the one mentioned above.
i py3ified it, eliminating quite some code, and the result is pretty terse, check it out: https://github.com/flying-sheep/plist/blob/master/iso8601.py
note that that implementation returns utc-datetimes for timezoneless strings, instead of naive ones. (l.30)
I've written a parser for ISO 8601: https://github.com/boxed/iso8601
Some basic tests are included and it supports most of the standard. Haven't gotten around to the more obscure parts like durations and intervals, but those are trivial to add...
Are you offering the module for inclusion in the stdlib?
Éric Araujo: absolutely. Although I think my code can be improved (speed wise, elegance, etc) since I just wrote it quickly over a weekend :)
John listed four modules with issues in the first message, and now we have proposals for two more modules. Could you work together to make a unified patch?
Alexander, do you think there is a need to check python-ideas or python-dev before working on this?
(I changed the title to clarify scope: ISO 8601 is huge and not easily accessible whereas W3CDTF/RFC 3339 is narrower in scope and freely accessible.)
Éric> do you think there is a need to check python-ideas or python-dev before working on this?
Yes, I think this is python-ideas material. IMHO, what should be added to datetime module in 3.4 is ability to construct date/time objects from their str() representation:
assert time(str(t)) == t
assert date(str(d)) == d
assert datetime(str(dt)) == dt
I am not sure the same is needed for timedelta, but this can be discussed.
An implementation of any standard external to Python should be vetted on PyPI first. There may be a reason why there is no rfc3339.py module on PyPI.
I had the issue today. I needed to parse a date with the following format.
2014-04-04T23:59:00+09:00
and could not with strptime.
I see a discussion in March 2014 http://code.activestate.com/lists/python-ideas/26883/ but no followup.
For references: http://www.w3.org/TR/NOTE-datetime http://tools.ietf.org/html/rfc3339
On closer inspection, Anders Hovmöller's proposal (https://github.com/boxed/iso8601) doesn't work.
At least not for the microseconds part.
In http://tools.ietf.org/html/rfc3339#section-5.6, the microsecond part is defined as:
time-secfrac = "." 1*DIGIT
In http://www.w3.org/TR/NOTE-datetime, same thing: s = one or more digits representing a decimal fraction of a second
Anders considers it to be only six digits. It can be more or it can be less. :)
Will comment on github too.
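One way to cope with "one or more digits" is to normalise the fraction to exactly six digits before building the datetime. The helper below is a hypothetical sketch, not code from any of the libraries mentioned, and it truncates rather than rounds.

```python
def fraction_to_microseconds(digits):
    """Hypothetical helper: convert the digits after "." to microseconds.

    Fewer than six digits are right-padded with zeros; digits beyond the
    sixth are dropped (truncation), since datetime stores microseconds.
    """
    if not digits or any(ch not in "0123456789" for ch in digits):
        raise ValueError("expected one or more ASCII digits: %r" % (digits,))
    return int(digits[:6].ljust(6, "0"))


for sample in ("5", "165886", "16588638674"):
    print(sample, fraction_to_microseconds(sample))
```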
Noticed some people doing the same thing
https://github.com/tonyg/python-rfc3339 http://home.blarg.net/~steveha/pyfeed.html https://wiki.python.org/moin/WorkingWithTime
After inspections, the best library for parsing RFC3339 style date is definitely: https://github.com/tonyg/python-rfc3339/
Main code at https://github.com/tonyg/python-rfc3339/blob/master/rfc3339.py
So, shall we include it? Otherwise, py8601 (https://bitbucket.org/micktwomey/pyiso8601/) looks pretty popular and well maintained (various committers, started in 2012, last commit in 2016). I think we should hurry: it's a great shame that, for so long, Python has been able to generate an ISO 8601 datetime but not parse it back.
I'm working on the OpenStack project and iso8601 is heavily used.
Otherwise, py8601 (https://bitbucket.org/micktwomey/pyiso8601/) looks pretty popular and well maintained (various committers, started in 2012, last commit in 2016).
I don't think that we should add the iso8601 module to the stdlib, but merge iso8601 "features" into the datetime module.
The iso8601 module supports Python 2.7 and so has to implement its own timezone classes. The datetime module now has datetime.timezone since Python 3.2 for fixed timezone.
The iso8601 module provides functions. I would prefer datetime.datetime *methods*.
Would you mind trying to implement that? It would be courteous to contact the iso8601 author first.
The important part is also unit tests.
See also bpo-12006 for ISO 8601: "The datetime.strftime() and date.strftime() methods now support ISO 8601 date directives %G, %u and %V. (Contributed by Ashley Anderson in bpo-12006.)".
bpo-12006 will unfortunately be of no use for this one.
Actually, I realized that the best implementation of parsing rfc 3339 is in django dateparse utils. To me it's the finest, the most elegant, and no other one can claim to be more robust since it's probably the #1 iso parsing functions used in python. Have a look at https://docs.djangoproject.com/en/1.9/_modules/django/utils/dateparse/#parse_datetime
Alexander, I won't start before I have your opinion. What do you think ?
Here is the PoC with code taken from django.utils.dateparse.parse_datetime and adapted for the datetime module (I haven't asked for their agreement yet). Of course tests pass. For me it's the most elegant solution.
(I think date and time also need their "fromisotimestamp" counterpart).
(slightly improved version, better use of timedelta)
Is the django license compatible with the Python license?
I don't know. The code taken is really small and modified; it is not much more than an implementation you might have seen a while ago and recoded from memory, without remembering where you saw it in the first place. Do you think that's really an issue?
From https://www.djangoproject.com/foundation/cla/faq/
Am I giving away the copyright to my contributions?
No. This is a pure license agreement, not a copyright assignment. You still maintain the full copyright for your contributions. You are only providing a license to the DSF to distribute your code without further restrictions. This is not the case for all CLA's, but it is the case for the one we are using.
About
Actually, I realized that the best implementation of parsing rfc 3339 is in django dateparse utils. To me it's the finest, the most elegant, and no other one can claim to be more robust since it's probably the #1 iso parsing functions used in python. Have a look at https://docs.djangoproject.com/en/1.9/_modules/django/utils/dateparse/#parse_datetime
How does it parse this date:
2016-02-15T11:59:46.16588638674+09:00
discarding the microseconds digits after the 6th.
slightly improved + addresses issues stated here : https://bugs.python.org/review/15873/diff/16581/Lib/datetime.py#newcode1418Lib/datetime.py:1418
How does it parse this date: 2016-02-15T11:59:46.16588638674+09:00
Mathieu Dupuy added the comment:
discarding the microseconds digits after the 6th.
Hum, you should use the same rounding method as datetime.datetime.fromtimestamp(): ROUND_HALF_UP, as round().
In practice, you can for example pass a floating point number as microseconds to datetime.datetime constructor.
Since datetime is implemented in C, I'm not sure that using the re is the best choice. Since the regex looks simple enough, we may parse the string without the re module. Well, maybe only for the C implementation.
What is the behaviour if there are spaces before/after the string? What if there are other characters like letters before/after? You should add a unit test for that. I expect an error when parsing "t=2012-04-23T09:15:00" for example.
Your regex ends with $ but doesn't start with ^. Using re.match(), ^ and $ are probably not needed, but I'm not confident when I use regex :-)
Hum, you should use the same rounding method as datetime.datetime.fromtimestamp(): ROUND_HALF_UP, as round(). In practice, you can for example pass a floating point number as microseconds to datetime.datetime constructor.
Unfortunately, you're confusing it with the timedelta constructor. Datetime's only takes int :( But I figured out what is, in my opinion, an elegant way to cope with that.
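The difference between the two constructors can be seen in a few lines. This is only an illustration of the behaviour being discussed, and adding a timedelta is one possible workaround, not necessarily the approach taken in the patch.

```python
from datetime import datetime, timedelta

# The datetime constructor rejects a non-integer microsecond value...
try:
    datetime(2016, 2, 15, 11, 59, 46, 165886.38674)
    constructor_accepts_float = True
except TypeError:
    constructor_accepts_float = False

# ...whereas timedelta accepts floats and rounds to whole microseconds,
# so the fraction can be added on afterwards.
base = datetime(2016, 2, 15, 11, 59, 46)
combined = base + timedelta(seconds=0.25)

print(constructor_accepts_float)
print(combined.isoformat())
```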
Since datetime is implemented in C, I'm not sure that using the re is the best choice. Since the regex looks simple enough, we may parse the string without the re module. Well, maybe only for the C implementation.
No regex available at all in CPython ? Otherwise, yeah, if I have to, I can do it with strptime.
What is the behaviour if there are spaces before/after the string? What if there are other characters like letters before/after? You should add a unit test for that. I expect an error when parsing "t=2012-04-23T09:15:00" for example. Your regex ends with $ but doesn't start with ^. Using re.match(), ^ and $ are probably not needed, but I'm not confident when I use regex :-)
re.match only matches at the beginning of the string, so there is no need for '^'. And therefore, the case you mention is already handled :)
Attached to this mail is the latest revision of the feature, with correct rounding, more tests and one useless line removed. Maybe the good one :) ?
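The anchoring behaviour can be checked with a simplified, hypothetical pattern (not the one from the patch). One detail worth knowing is that "$" also matches just before a single trailing newline, which `re.fullmatch` does not allow.

```python
import re

# Hypothetical, simplified pattern ending in "$" but without a leading "^".
pattern = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}$")

# re.match anchors at the start, so a leading "t=" is rejected...
leading_junk = pattern.match("t=2012-04-23T09:15:00")
# ...and "$" rejects trailing letters or spaces...
trailing_junk = pattern.match("2012-04-23T09:15:00 ")
# ...but "$" still matches just before a single trailing newline.
trailing_newline = pattern.match("2012-04-23T09:15:00\n")

# fullmatch on a pattern without "$" rejects the trailing newline as well.
strict = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")
strict_newline = strict.fullmatch("2012-04-23T09:15:00\n")

print(leading_junk, trailing_junk, trailing_newline is not None, strict_newline)
```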
No regex available at all in CPython?
It's not really convenient to use the re module in C.
Otherwise, yeah, if I have to, I can do it with strptime.
I suggest to parse directly the string with C code, since the format looks quite simple (numbers and a few separators).
I suggest to parse directly the string with C code, since the format looks quite simple (numbers and a few separators).
But some of them are optional. And I would really like to mimic the same implementation logic in C.
Now I think the python version is fairly ready. What next ?
It looks to me like you copied a lot of code, doc strings, tests, etc from <https://github.com/django/django/commit/9b1cb75#diff-4db1d116f25f482278090b122e3b0028> and <https://github.com/django/django/commit/2f59e94>. I wouldn’t call it trivial. There is a BSD license for Django. Or do we have to get the relevant authors to do the Python CLA thing?
The current patch seems to allow a timezone without a colon, or even without minutes (+1100 and +11 as well as the RFC’s +11:00). Is this needed? The colon was made optional in Django in <https://code.djangoproject.com/ticket/18728>; the argument given for this just seems to be ISO 8601 alignment, nothing practical. According to <https://code.djangoproject.com/ticket/22814> PostgreSQL outputs time zones without the minutes field, but I don’t know if Python should go out of its way to support this obscure format.
RFC 3339 does not specify single digits in many places (e.g. 2016-2-1 1:0:0 is not specified). Should we also be stricter, at least for the minutes and seconds fields?
Also, is it necessary to allow the seconds field to be omitted, as in "2016-02-01 01:21"?
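For reference, the lenient offset forms in question could be handled roughly as below. `LENIENT_OFFSET_RE` and `parse_offset` are hypothetical names used for illustration; a strict RFC 3339 parser would instead require the colon and the minutes.

```python
import re
from datetime import timedelta, timezone

# Hypothetical pattern: hours are required, the colon and minutes optional.
LENIENT_OFFSET_RE = re.compile(r"([+-])(\d{2})(?::?(\d{2}))?", re.ASCII)


def parse_offset(text):
    """Hypothetical helper: turn "+11:00", "+1100" or "+11" into a timezone."""
    match = LENIENT_OFFSET_RE.fullmatch(text)
    if match is None:
        raise ValueError("unsupported UTC offset: %r" % (text,))
    sign, hours, minutes = match.groups()
    delta = timedelta(hours=int(hours), minutes=int(minutes or 0))
    return timezone(-delta if sign == "-" else delta)


offsets = [parse_offset(s) for s in ("+11:00", "+1100", "+11")]
print(offsets[0] == offsets[1] == offsets[2])
```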
It seems that the “datetime” module does not support leap seconds, so if we mention RFC 3339 we should point out this inconsistency.
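The inconsistency is easy to demonstrate: a seconds value of 60, which the RFC grammar permits, is rejected by the constructor.

```python
from datetime import datetime

# RFC 3339 allows a seconds value of 60 during a leap second,
# but the datetime constructor only accepts 0..59.
try:
    datetime(2016, 12, 31, 23, 59, 60)
    leap_second_accepted = True
except ValueError:
    leap_second_accepted = False

print(leap_second_accepted)
```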
Victor: From my limited experiments, datetime.fromtimestamp() seems to use the round-to-even rule (not always rounding half up). Can you confirm? Maybe we should use that for consistency if it is practical. Otherwise, truncation towards zero would be the simplest.
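To compare the candidate rules on a parsed digit string, exact decimal arithmetic avoids floating point surprises. The helper name below is hypothetical, and this sketch says nothing about which rule `fromtimestamp()` actually uses.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP


def fraction_to_micros(digits, rounding):
    """Hypothetical helper: scale ".digits" seconds to whole microseconds.

    The result may be 1000000 when rounding carries into the next second;
    a caller would have to handle that carry.
    """
    exact = Decimal("0." + digits) * 1000000
    return int(exact.quantize(Decimal(1), rounding=rounding))


# ".0000025" seconds is exactly 2.5 microseconds, a tie case.
for name, mode in [("truncate", ROUND_DOWN),
                   ("half-even", ROUND_HALF_EVEN),
                   ("half-up", ROUND_HALF_UP)]:
    print(name, fraction_to_micros("0000025", mode))
```

Truncation never carries into the seconds field, which is part of why it is the simplest option.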
As well as adding datetime.fromisoformat(), I think we should add similar methods to the separate date and time classes. One can parse the RFC’s full-date format fairly easily with strptime(), but not so for partial-time because of the fractional seconds.
The real question is: should we accept whatever ISO 8601 formats are commonly found on the internet, or just be able to consume back the string issued by isoformat? The answers to the questions you're asking follow from that: don't accept single digits, nor second-less datetimes, ... I don't really mind what the answer is. I'm OK with a stricter acceptance. I would like us to ask ourselves: does a simpler, stricter implementation fulfill people's needs? If it's OK for you, it's OK for me.
By taking the Django version, I deviated a bit from the author's original need, which was just being able to parse back datetime isoformat. The limitations he raised for not using strptime are gone now (strptime understands timezones), but it still can't understand microseconds nor optional parts (T or space for separator, optional microseconds). Even for a much simpler, stricter implementation, I'd like to stick with regex.
I'll do a date & time version; I'm just waiting until we agree on the whole datetime thing.
Whether we change to simpler code or keep it this way, I can rewrite the tests & docstring.
The regular expression r"\d" matches any digit in Unicode I think, not just ASCII digits 0-9. Perhaps we should limit it to ASCII digits. Or is it intended to allow non-ASCII digits like in "२०१६-०२-१६ ०१:२१:१४"?
Oh my god you're right. Thankfully there is the re.ASCII flag.
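The effect of the flag can be seen with the example string from the comment above. This is a minimal sketch with a simplified date-only pattern.

```python
import re

devanagari = "२०१६-०२-१६"

# By default \d matches any Unicode decimal digit in a str pattern.
unicode_match = re.fullmatch(r"\d{4}-\d{2}-\d{2}", devanagari)
# With re.ASCII, \d is restricted to [0-9].
ascii_match = re.fullmatch(r"\d{4}-\d{2}-\d{2}", devanagari, re.ASCII)

# int() also accepts such digits, so the non-ASCII input would otherwise
# be converted silently.
print(unicode_match is not None, ascii_match is None, int("२०१६"))
```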
simpler version using a simpler, stricter regex
(OK, I said something stupid: datetime's strptime handles microseconds. But time's doesn't.)
OK, I know I post a lot, but this one should be the good one:
I'm looking forward to your feedback.
Crap, I just checked my spam folder today and almost all the mails from the review board had landed there! So I made a new patch addressing all concerns:
bonus:
I still have a doubt though about the place I moved the regex. Tell me.
Hi Aymeric Augustin. I am guessing you are the original author of the code and tests in Django for parsing datetime strings (https://bugs.python.org/issue15873#msg260342). If so, would you be happy for it to be incorporated into Python?
Mathieu: I left a couple quick review comments. (Normally I leave a message in the main bug thread, but I forgot the other time.)
Doc strings: Generally I think we keep the doc strings to a minimum, and leave the main documentation for the RST files. For the RST documentation, I would suggest including a rough summary of the format. E.g. for time.fromisoformat(), something like “The string should be in the RFC’s ‘full-time’ format, which looks like HH:MM:SS[.mmmmmm][Z|±HH:MM].”
Now that you added the two new regex strings, I can see that it might be useful to keep them together, rather than next to each class. Or you could even make them class attributes. No strong opinions either way; whatever works for you I think.
New patch with all your concerns addressed (martin.panter + SilentGhost) EXCEPT the single dispatch dictionary thing.
SilentGhost: the dictionary single dispatch thing is attached (apply on top of the last, fromisoformat_new3). I mind the performance penalty for date-only parsing users, but the code is definitely shorter and more elegant.
But we have a major problem: tests fail because what is used in tests is a subclass of the datetime classes (Subclass[Date|Time|DateTime]) and thus the dispatch breaks with a KeyError: class.SubDate[...]. I have no idea how to mitigate that. Do you?
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = 'https://github.com/abalkin'
closed_at =
created_at =
labels = ['3.7', 'type-feature', 'library']
title = 'datetime: add ability to parse RFC 3339 dates and times'
updated_at =
user = 'https://bugs.python.org/nagle'
```
bugs.python.org fields:
```python
activity =
actor = 'belopolsky'
assignee = 'belopolsky'
closed = True
closed_date =
closer = 'belopolsky'
components = ['Library (Lib)']
creation =
creator = 'nagle'
dependencies = []
files = ['27141', '27165', '41922', '41923', '41926', '41927', '41934', '41935', '41940', '41945', '41951', '44015', '44016', '44019']
hgrepos = []
issue_num = 15873
keywords = ['patch']
message_count = 90.0
messages = ['169941', '169952', '169966', '169968', '169970', '170098', '170104', '170109', '170112', '170114', '170116', '170180', '170181', '174339', '183672', '183743', '183809', '183921', '183931', '221829', '221830', '221831', '221903', '260099', '260100', '260150', '260266', '260276', '260280', '260282', '260292', '260293', '260294', '260295', '260298', '260303', '260309', '260318', '260337', '260342', '260343', '260344', '260345', '260347', '260350', '260356', '260382', '260420', '260426', '260427', '260440', '260441', '260442', '260445', '260449', '260989', '260990', '260991', '263867', '269714', '269722', '270529', '270828', '270829', '270831', '270899', '272021', '272026', '273609', '291822', '291831', '304950', '307603', '307604', '307605', '307606', '307607', '307610', '307616', '308214', '308505', '308507', '308510', '308569', '308637', '308851', '309168', '309175', '311703', '313105']
nosy_count = 36.0
nosy_names = ['barry', 'jcea', 'cben', 'roysmith', 'ncoghlan', 'belopolsky', 'nagle', 'vstinner', 'jwilk', 'mcepl', 'eric.araujo', 'Arfrever', 'r.david.murray', 'davydov', 'cvrebert', 'karlcow', 'SilentGhost', 'Elvis.Pranskevichus', 'perey', 'flying sheep', 'mihaic', 'aymeric.augustin', 'Roman.Evstifeev', 'berker.peksag', 'martin.panter', 'piotr.dobrogost', 'kirpit', 'Anders.Hovm\xc3\xb6ller', 'jstasiak', 'Eric.Hanchrow', 'deronnax', 'pbryan', 'p-ganssle', 'sirex', 'larsonreever', 'jaitaiwan']
pr_nums = ['4699', '4841', '5559', '5559', '5939']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue15873'
versions = ['Python 3.7']
```