scrapinghub / dateparser

python parser for human readable dates
BSD 3-Clause "New" or "Revised" License

Parsing ISO 8601 datetimes without hyphens, colons #867

Open dmalan opened 3 years ago

dmalan commented 3 years ago

We noticed that ISO 8601 datetimes without hyphens or colons don't seem to be parsable, even though I don't think they're required? Cf. https://en.wikipedia.org/wiki/ISO_8601#Dates, https://en.wikipedia.org/wiki/ISO_8601#Times.

E.g.:

>>> import dateparser
>>> str(dateparser.parse("2021-01-01T00:00:00Z"))
'2021-01-01 00:00:00+00:00'
>>> str(dateparser.parse("20210101T000000Z"))
'None'

We currently work around this using isodate, with

import dateparser
import isodate

try:
    dt = isodate.parse_datetime(s)
except (isodate.isoerror.ISO8601Error, ValueError):
    dt = dateparser.parse(s)

but weren't sure whether that was intended?
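For anyone who would rather not add a dependency, the same try-then-fall-back idea can be sketched with the standard library alone. This is only a rough illustration: it assumes the input uses the compact `YYYYMMDDTHHMMSS` form with an optional `Z` or numeric UTC offset, and the names `parse_basic_iso` and `parse_with_fallback` are hypothetical helpers, not part of dateparser or isodate.

```python
from datetime import datetime

# Hypothetical helpers for illustration; not part of dateparser or isodate.
# Only the compact date-time form is covered: with an offset ("Z", "+0100")
# or without one. Other ISO 8601 variants are deliberately left out.
_BASIC_FORMATS = ("%Y%m%dT%H%M%S%z", "%Y%m%dT%H%M%S")


def parse_basic_iso(s):
    """Return a datetime for a compact ISO 8601 string, or None if no format matches."""
    for fmt in _BASIC_FORMATS:
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            continue
    return None


def parse_with_fallback(s, fallback=None):
    """Try the compact form first, then hand the string to `fallback` if given."""
    dt = parse_basic_iso(s)
    if dt is None and fallback is not None:
        dt = fallback(s)
    return dt
```

Passing `fallback=dateparser.parse` would mirror the isodate workaround above: compact ISO strings are handled first and everything else goes to dateparser as usual.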

That question aside, many thanks for the wonderfully handy library!

noviluni commented 3 years ago

Hi @dmalan, thank you for opening this issue. This is definitely not intended.

There's a known bug when trying to parse ISO dates in other languages (https://github.com/scrapinghub/dateparser/issues/360, https://github.com/scrapinghub/dateparser/issues/765), and an open PR to fix it: https://github.com/scrapinghub/dateparser/pull/790/. Even though it's not the same issue, we could (maybe) include the solution for this in that patch.

I will take a closer look when I have the time.

cc: @Gallaecio

dmalan commented 3 years ago

Thanks so much, @noviluni!