scrapinghub / dateparser

python parser for human readable dates
BSD 3-Clause "New" or "Revised" License

Bug with language FR #962

Closed Julienh closed 3 years ago

Julienh commented 3 years ago

Hello,

Since version 0.7.5, I have had a problem with this example:

import dateparser

text = '18 sept. 2018'
dateparser.parse(text, languages=['fr'])

I get the date 2018-07-18, but the correct date is 2018-09-18.

It works fine with version 0.7.4.

Thank you !

Edit: I forgot to mention, the version of Python is 3.9.6

noviluni commented 3 years ago

Yes, this is a known issue. In French, "sept" can mean either "September" (as an abbreviation) or "seven". In this case it is wrongly recognized as the number, so the input is read as 18 7 2018, which is why you get that result.

There's an open issue here: https://github.com/scrapinghub/dateparser/issues/676 to discuss how to handle this, and the same issue was reported here: https://github.com/scrapinghub/dateparser/issues/819.

I don't know of any useful workaround apart from doing something like text.replace('sept.', 'sep.') or text.replace('sept.', 'septembre'), which is not desirable.
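As an illustration of the replacement workaround mentioned above, here is a minimal sketch. The helper names expand_sept and parse_french_date are hypothetical and not part of dateparser; the only library call used is dateparser.parse(text, languages=[...]) as in the original report. The sketch assumes that spelling out the full month name avoids the ambiguity, and it only touches "sept." written with a trailing dot, leaving the bare word "sept" alone.

```python
import re

# Hypothetical helper, not part of dateparser: matches the abbreviation
# "sept." (with the dot) only when it stands as its own token.
_SEPT_ABBREV = re.compile(r"\bsept\.(?=\s|$)", re.IGNORECASE)


def expand_sept(text: str) -> str:
    """Replace the abbreviation 'sept.' with the full month name 'septembre'."""
    return _SEPT_ABBREV.sub("septembre", text)


def parse_french_date(text: str):
    """Normalize the input, then hand it to dateparser (assumed installed)."""
    import dateparser  # third-party; imported lazily so expand_sept works without it

    return dateparser.parse(expand_sept(text), languages=["fr"])
```

For the text from the report, expand_sept('18 sept. 2018') yields '18 septembre 2018', which can then be passed to the parser. This is still a preprocessing hack rather than a fix in the library itself.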

I will close this as a duplicate of https://github.com/scrapinghub/dateparser/issues/819. Thank you for reporting.

Julienh commented 3 years ago

Hi noviluni,

Thanks for the answer. It is not exactly the same bug: "sept." with the dot is the abbreviation of "septembre", not the number "sept" (seven).