scrapinghub / dateparser

python parser for human readable dates
BSD 3-Clause "New" or "Revised" License

One case where dateparser fails to parse correctly when there's extra text #518

Open starrify opened 5 years ago

starrify commented 5 years ago
>>> dateparser.parse(u'Actualisé le 17 avril 2019', languages=['fr'])
>>> dateparser.parse(u'le 17 avril 2019', languages=['fr'])
datetime.datetime(2019, 4, 17, 0, 0)
>>> dateparser.parse(u'17 avril 2019', languages=['fr'])
datetime.datetime(2019, 4, 17, 0, 0)
>>> dateparser.__version__
'0.7.1'

The behavior above occurs with Python 2.7, 3.6, and 3.7: the first call returns None, while the two shorter strings parse fine. This is somewhat unexpected, since the extra text "Actualisé" ("updated" in French) should be harmless to the parsing.
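
Until this is fixed, one possible workaround is to retry the parse with leading words removed one at a time. The sketch below is only an illustration: `parse_dropping_leading_words` is a hypothetical helper name, not part of dateparser, and it takes the parser as a callable so it does not assume anything about dateparser's internals.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def parse_dropping_leading_words(
    text: str, parse: Callable[[str], Optional[T]]
) -> Optional[T]:
    """Hypothetical helper: call `parse` on `text`, then on `text` with
    leading whitespace-separated words removed one by one, and return
    the first result that is not None."""
    words = text.split()
    for start in range(len(words)):
        # Longest suffix first, so as much of the input as possible is kept.
        result = parse(" ".join(words[start:]))
        if result is not None:
            return result
    return None


# Intended usage (assumes dateparser is installed):
#
#   import dateparser
#   parse_dropping_leading_words(
#       u'Actualisé le 17 avril 2019',
#       lambda s: dateparser.parse(s, languages=['fr']),
#   )
```

Given the session above, the second attempt ('le 17 avril 2019') would be expected to succeed on 0.7.1. Note that this can also hide real failures, because any trailing fragment that happens to parse will be accepted.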

starrify commented 5 years ago

Yet another example:

>>> dateparser.parse(u'Publié le 16 avril 2019', languages=['fr'])
>>> dateparser.parse(u'le 16 avril 2019', languages=['fr'])
datetime.datetime(2019, 4, 16, 0, 0)
>>> dateparser.parse(u'16 avril 2019', languages=['fr'])
datetime.datetime(2019, 4, 16, 0, 0)

noviluni commented 5 years ago

Related to: https://github.com/scrapinghub/dateparser/issues/521