scrapinghub / dateparser

python parser for human readable dates
BSD 3-Clause "New" or "Revised" License

French lundi prochain à midi not parsed #459

Open ElisaHW opened 5 years ago

ElisaHW commented 5 years ago

I tried to parse "lundi prochain à midi" ("next Monday at noon") and it returned None: dateparser.parse("lundi prochain à midi", languages=['fr'])

anarcat commented 5 years ago

I can confirm this implementation is incomplete...

In [1]: import dateparser

In [2]: dateparser.parse('lundi prochain à midi')

In [3]: dateparser.parse('lundi prochain ')

In [4]: dateparser.parse('lundi')
Out[4]: datetime.datetime(2019, 1, 14, 0, 0)

In [5]: dateparser.parse('mardi')
Out[5]: datetime.datetime(2019, 1, 8, 0, 0)

In [6]: dateparser.parse('mardi prochain')

In [7]: dateparser.parse('mardi midi')

In [8]: dateparser.parse('mardi à midi')

In [9]: dateparser.parse('mercredi')
Out[9]: datetime.datetime(2019, 1, 9, 0, 0)

In [10]: dateparser.parse('dimanche')
Out[10]: datetime.datetime(2019, 1, 13, 0, 0)

"prochain" ("next") and "midi" ("noon") throw the parser off: every call that includes either word returns None. But even passing only the weekday confuses the parser: for most days it returns the most recent past occurrence instead of the next one...
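For comparison, here is a minimal stdlib-only sketch of the behaviour one would expect from "<weekday> prochain". It is not dateparser code: the FR_WEEKDAYS table and the next_weekday helper are hypothetical names made up for illustration, and the reference date (Monday 2019-01-14) is an assumption inferred from the outputs above.

```python
from datetime import datetime, timedelta

# Hypothetical lookup table: French weekday names mapped to
# Python's weekday() numbering (Monday == 0).
FR_WEEKDAYS = {
    "lundi": 0, "mardi": 1, "mercredi": 2, "jeudi": 3,
    "vendredi": 4, "samedi": 5, "dimanche": 6,
}

def next_weekday(name, now):
    """Return midnight of the next occurrence of `name`, strictly after `now`'s date."""
    target = FR_WEEKDAYS[name.lower()]
    # Always between 1 and 7, so "today" is never returned.
    days_ahead = (target - now.weekday() - 1) % 7 + 1
    day = now + timedelta(days=days_ahead)
    return day.replace(hour=0, minute=0, second=0, microsecond=0)

now = datetime(2019, 1, 14, 9, 30)  # assumed reference date, a Monday
print(next_weekday("mardi", now))   # 2019-01-15 00:00:00
print(next_weekday("lundi", now))   # 2019-01-21 00:00:00
```

With this reading, "mardi" on that date would resolve to 2019-01-15 rather than the 2019-01-08 shown in Out[5].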

noviluni commented 4 years ago

This is partially blocked by https://github.com/scrapinghub/dateparser/issues/573: currently, we don't support "next Monday" even in English.

The "midi" part can be implemented by adding something like (?:12\s+)?midi: '12:00' in the fr.yaml file, but maybe it should also check that "midi" is not preceded by 'après' or 'avant' (as in "après-midi", "afternoon")...
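A rough sketch of that check, written with Python's re module rather than as a fr.yaml entry: the MIDI pattern and the normalize_midi helper are hypothetical names, and how dateparser itself would apply such a rule is not shown here. The negative lookbehinds only cover 'après' or 'avant' followed by exactly one space or hyphen.

```python
import re

# Hypothetical pattern: match "midi" (optionally written "12 midi") unless the
# word directly before it is "après" or "avant", joined by one space or hyphen.
# Python lookbehinds must be fixed-width, hence one lookbehind per word.
MIDI = re.compile(
    r"(?<!après[\s-])(?<!avant[\s-])\b(?:12\s+)?midi\b",
    re.IGNORECASE,
)

def normalize_midi(text):
    """Replace a standalone 'midi' with '12:00', leaving 'après-midi' alone."""
    return MIDI.sub("12:00", text)

print(normalize_midi("lundi prochain à midi"))  # lundi prochain à 12:00
print(normalize_midi("mardi après-midi"))       # mardi après-midi
```

The output still contains "prochain", so this only addresses the "midi" half of the problem; the "next Monday" half depends on the linked issue.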