scrapinghub / dateparser

python parser for human readable dates
BSD 3-Clause "New" or "Revised" License

Unable to parse date with UTC offset #375

Open thernstig opened 6 years ago

thernstig commented 6 years ago

Dateparser is unable to parse this string: Fri Jan 26 16:32:21 +0000 2018

>>> import dateparser
>>> foo = dateparser.parse('Fri Jan 26 16:32:21 +0000 2018')
>>> print(foo)
None

alertedsnake commented 6 years ago

It can if you write it as Fri Jan 26 16:32:21 2018 +0000. It seems weird to not have the TZ offset at the end.
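
A minimal sketch of that reordering, for anyone who needs a workaround in the meantime. The helper name `move_offset_to_end` is hypothetical (it is not part of dateparser), and it assumes the input has exactly the six space-separated tokens shown in the report:

```python
def move_offset_to_end(twitter_date: str) -> str:
    """Move the UTC offset token from before the year to after it."""
    parts = twitter_date.split()
    # Expected tokens: weekday, month, day, time, offset, year.
    if len(parts) != 6 or parts[4][0] not in "+-":
        return twitter_date  # not the expected layout; leave unchanged
    weekday, month, day, time, offset, year = parts
    return " ".join([weekday, month, day, time, year, offset])


reordered = move_offset_to_end("Fri Jan 26 16:32:21 +0000 2018")
print(reordered)  # Fri Jan 26 16:32:21 2018 +0000
```

The reordered string is the form reported above as parseable, so it could then be passed to dateparser.parse.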

thernstig commented 6 years ago

@alertedsnake I agree completely, and so do many others. But this is how Twitter formats it. See created_at on this page: https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object.html

alertedsnake commented 6 years ago

@thernstig wow, how bizarre!

noviluni commented 3 years ago

This should work with date_formats once we merge this: https://github.com/scrapinghub/dateparser/pull/840/files

>>> dateparser.parse('Fri Jan 26 16:32:21 +0000 2018', date_formats=['%a %b %d %H:%M:%S %z %Y'])
datetime.datetime(2018, 1, 26, 16, 32, 21, tzinfo=datetime.timezone.utc)

thernstig commented 3 years ago

@noviluni That is a good approach; many time libraries likewise let you specify the format. In addition, it seems Twitter's API v2 will have a better created_at from the start.
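
For comparison, here is a sketch of the same explicit-format idea using only Python's standard library, with the format string from the comment above. It assumes the default C locale, since %a and %b match locale-dependent day and month names:

```python
from datetime import datetime, timezone

# Same directives as in the date_formats example: the offset (%z) precedes the year (%Y).
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"

parsed = datetime.strptime("Fri Jan 26 16:32:21 +0000 2018", TWITTER_FORMAT)
print(parsed.isoformat())  # 2018-01-26T16:32:21+00:00
```

This only covers the one fixed layout, so it is not a substitute for dateparser's free-form parsing.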