python / cpython

The Python programming language
https://www.python.org

datetime.fromisoformat(): Omitted colon in timezone suffix raises ValueError #86537

Open abb777a6-d978-446c-b962-eba54ff1b655 opened 3 years ago

abb777a6-d978-446c-b962-eba54ff1b655 commented 3 years ago
BPO 42371
Nosy @abalkin, @pganssle

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['type-feature', 'library', '3.10']
title = 'datetime.fromisoformat(): Omitted colon in timezone suffix raises ValueError'
updated_at =
user = 'https://bugs.python.org/BengtLers'
```

bugs.python.org fields:
```python
activity =
actor = 'p-ganssle'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'Bengt.Lüers'
dependencies = []
files = []
hgrepos = []
issue_num = 42371
keywords = []
message_count = 2.0
messages = ['381102', '381127']
nosy_count = 3.0
nosy_names = ['belopolsky', 'Bengt.Lüers', 'p-ganssle']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue42371'
versions = ['Python 3.10']
```

abb777a6-d978-446c-b962-eba54ff1b655 commented 3 years ago

I am trying to parse ISO8601-formatted datetime strings with timezones.

This works fine when there is a colon separating the hour and minute digits:

>>> import datetime
>>> datetime.datetime.fromisoformat('2020-11-16T11:00:00+00:00')
datetime.datetime(2020, 11, 16, 11, 0, tzinfo=datetime.timezone.utc)

However, this fails when there is no colon between the hour and the minute digits:

>>> import datetime
>>> datetime.datetime.fromisoformat('2020-11-16T11:00:00+0000')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Invalid isoformat string: '2020-11-16T11:00:00+0000'

This behavior is unexpected, as the ISO8601 standard allows omitting the colon and writing the UTC offset as "<time>±hhmm":

https://en.wikipedia.org/wiki/ISO_8601#Time_offsets_from_UTC

As a workaround, I normalized the timezone suffixes before parsing:

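def normalize_utc_suffix(iso8601_string):
    # Hypothetical function name; the original snippet showed only these checks.
    # Rewrites the UTC suffixes +0000, +00, -0000 and -00 as +00:00.
    if iso8601_string.endswith('+0000'):
        return iso8601_string[:-len('+0000')] + '+00:00'
    if iso8601_string.endswith('+00'):
        return iso8601_string[:-len('+00')] + '+00:00'
    if iso8601_string.endswith('-0000'):
        return iso8601_string[:-len('-0000')] + '+00:00'
    if iso8601_string.endswith('-00'):
        return iso8601_string[:-len('-00')] + '+00:00'
    # Any other suffix (including non-UTC offsets) is returned unchanged.
    return iso8601_string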

This only works for the UTC timezone. It would be nice to have a more general solution that can handle any timezone.
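On Python 3.7 to 3.10, a more general variant of the workaround above could rewrite any trailing basic-format offset before calling `.fromisoformat()`. The following is a minimal sketch, assuming the offset directly follows an extended-format time (`hh:mm`, optionally `:ss` and a fraction); `normalize_offset` and its regular expression are hypothetical, not part of the standard library.

```python
import re
from datetime import datetime

# Hypothetical pattern: an extended-format time (hh:mm, optionally :ss and a
# fraction) followed by a basic-format offset (+hhmm / -hhmm) or an hour-only
# offset (+hh / -hh) at the very end of the string.
_OFFSET_RE = re.compile(
    r"(\d{2}:\d{2}(?::\d{2}(?:\.\d{1,6})?)?)"  # time part, kept as-is
    r"([+-])(\d{2})(\d{2})?$"                  # offset sign, hours, optional minutes
)


def normalize_offset(iso8601_string: str) -> str:
    """Rewrite a trailing +hhmm/-hhmm or +hh/-hh offset as +hh:mm/-hh:mm.

    Strings whose offset already has a colon, or that have no offset,
    are returned unchanged.
    """

    def _insert_colon(match):
        time_part, sign, hours, minutes = match.groups()
        # A missing minutes field (hour-only offset) becomes ':00'.
        return f"{time_part}{sign}{hours}:{minutes or '00'}"

    return _OFFSET_RE.sub(_insert_colon, iso8601_string)


# Usage: the string from the report parses after normalization.
dt = datetime.fromisoformat(normalize_offset("2020-11-16T11:00:00+0000"))
print(dt)  # 2020-11-16 11:00:00+00:00
```

Anchoring the pattern on the preceding `hh:mm` keeps date-only strings such as `2020-11-16` from being mistaken for an offset; it does not attempt to handle the `Z` suffix.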

I tested this with CPython 3.8. Since .fromisoformat() was added in 3.7, versions before 3.7 are not affected (the method does not exist there):

https://docs.python.org/3/library/datetime.html#datetime.date.fromisoformat

pganssle commented 3 years ago

This is the expected behavior of .fromisoformat(). A similar issue is https://bugs.python.org/issue35829, which asks for the "Z" suffix to be supported.

There is a note about this in the documentation: https://docs.python.org/3/library/datetime.html#datetime.datetime.fromisoformat

"Caution: This does not support parsing arbitrary ISO 8601 strings - it is only intended as the inverse operation of datetime.isoformat(). A more full-featured ISO 8601 parser, dateutil.parser.isoparse is available in the third-party package dateutil."

At some point we will work out the kinks in offering as full an ISO 8601 datetime parser as possible, but the ISO 8601 datetime spec is very complicated and includes many optional features. We deliberately chose to keep the scope of .fromisoformat() minimal at first, whereas dateutil.parser.isoparse attempts to be a full-featured ISO8601 parser.
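The contract described in the caution can be checked directly: whatever `datetime.isoformat()` emits, `.fromisoformat()` should read back to an equal value. Below is a minimal sketch with arbitrary sample datetimes. Note that `isoformat()` writes whole-minute offsets with a colon (`±hh:mm`), which is why the colon-less form from the report fell outside this contract on 3.7 to 3.10.

```python
from datetime import datetime, timedelta, timezone

# Arbitrary aware datetimes with different UTC offsets.
samples = [
    datetime(2020, 11, 16, 11, 0, tzinfo=timezone.utc),
    datetime(2020, 11, 16, 11, 0, 30, 123456,
             tzinfo=timezone(timedelta(hours=5, minutes=30))),
    datetime(2020, 11, 16, 11, 0, tzinfo=timezone(-timedelta(hours=3))),
]

for dt in samples:
    text = dt.isoformat()  # offset is written as +hh:mm / -hh:mm
    # Round trip: parsing isoformat() output must give back an equal datetime.
    assert datetime.fromisoformat(text) == dt
    print(text)
```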

Changing the version affected to 3.10, since this is a feature request.

Paebbels commented 7 months ago

Omitting the colon is not allowed by the standard! The Wikipedia article is not very precise, as it shows individual aspects of a date and time but not the combined rule. Please also note that many ISO 8601 regular expressions are not strict enough to fully match the EBNF rules of the standard.

This wrong behavior was introduced in Python 3.11, is still present in 3.12 and 3.13, and allows invalid datetime strings to be parsed as ISO 8601.
See #115783.
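Whether the reported string is accepted therefore depends on the interpreter: Python 3.11 changed `.fromisoformat()` to accept most ISO 8601 formats rather than only `isoformat()` output. A small sketch that probes the running interpreter with the string from the original report:

```python
import sys
from datetime import datetime

# The string from the original report: extended-format date and time,
# basic-format (colon-less) UTC offset.
mixed = "2020-11-16T11:00:00+0000"

try:
    parsed = datetime.fromisoformat(mixed)
except ValueError as exc:
    # Python 3.7-3.10 only accept what isoformat() emits, so this is rejected.
    print(f"Python {sys.version_info[:2]}: rejected ({exc})")
else:
    # Python 3.11+ accepts the mixed form and returns an aware datetime.
    print(f"Python {sys.version_info[:2]}: parsed as {parsed!r}")
```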

According to the EBNF rules of ISO 8601, either all elements of a datetime are in basic format (in short: no separators) or all elements are in extended format (in short: with separators). Mixing basic and extended format is not allowed.
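The "all basic or all extended" rule can be illustrated with a deliberately small validator. This sketch covers only a simplified subset (complete calendar date, time to whole seconds with an optional decimal fraction, and an optional offset of `Z`, `±hh:mm` or `±hhmm`), does not check value ranges such as month 13, and is not a transcription of the standard's EBNF; `is_consistent_iso8601` is a hypothetical name.

```python
import re

# Simplified subset (assumption): each pattern uses ONE format for every component.
EXTENDED = re.compile(
    r"\d{4}-\d{2}-\d{2}"        # YYYY-MM-DD
    r"T\d{2}:\d{2}:\d{2}"       # Thh:mm:ss
    r"(?:[.,]\d+)?"             # optional decimal fraction of a second
    r"(?:Z|[+-]\d{2}:\d{2})?"   # optional offset: Z or +hh:mm / -hh:mm
)
BASIC = re.compile(
    r"\d{8}"                    # YYYYMMDD
    r"T\d{6}"                   # Thhmmss
    r"(?:[.,]\d+)?"             # optional decimal fraction of a second
    r"(?:Z|[+-]\d{4})?"         # optional offset: Z or +hhmm / -hhmm
)


def is_consistent_iso8601(text: str) -> bool:
    """Return True if `text` is entirely basic or entirely extended format."""
    return bool(EXTENDED.fullmatch(text) or BASIC.fullmatch(text))


for s in ("2020-11-16T11:00:00+00:00",   # all extended
          "20201116T110000+0000",         # all basic
          "2020-11-16T11:00:00+0000"):    # mixed: extended time, basic offset
    print(s, is_consistent_iso8601(s))
```

With these patterns, the string from the original report is rejected because its offset uses basic format while its date and time use extended format.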