scrapinghub / dateparser

python parser for human readable dates
BSD 3-Clause "New" or "Revised" License

Weekdays are handled inconsistently #1196

Open susca opened 1 year ago

susca commented 1 year ago

I checked other issues, but this exact problem was not mentioned.

For context: I am in time zone CST, so in UTC it is already 2023-11-07, which may be the reason for the issue (just a wild guess).
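
The date gap described above can be checked with the standard library alone. This is a minimal sketch that assumes "CST" means US Central Standard Time (a fixed UTC-6 offset) and reuses the timestamp that parse("now") returned in this report; it does not call dateparser.

```python
from datetime import datetime, timedelta, timezone

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Assumption: "CST" is US Central Standard Time, a fixed UTC-6 offset.
CST = timezone(timedelta(hours=-6), "CST")

# The local timestamp that parse("now") returned in the report.
local_now = datetime(2023, 11, 6, 20, 15, 31, 568142, tzinfo=CST)
utc_now = local_now.astimezone(timezone.utc)

print(local_now.date(), WEEKDAYS[local_now.weekday()])  # 2023-11-06 Monday
print(utc_now.date(), WEEKDAYS[utc_now.weekday()])      # 2023-11-07 Tuesday
```

So at the time of the report, "today" is Monday locally but Tuesday in UTC.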

Problem: The weekday names for today and tomorrow return today's and tomorrow's dates. Any other weekday name returns a date in the past:

>>> parse("now")
datetime.datetime(2023, 11, 6, 20, 15, 31, 568142)
>>> parse("Monday")
datetime.datetime(2023, 11, 6, 0, 0)
>>> parse("Tuesday")
datetime.datetime(2023, 11, 7, 0, 0)
>>> parse("Wednesday")
datetime.datetime(2023, 11, 1, 0, 0)
>>> parse("Thursday")
datetime.datetime(2023, 11, 2, 0, 0)
>>> parse("Friday")
datetime.datetime(2023, 11, 3, 0, 0)
>>> parse("Saturday")
datetime.datetime(2023, 11, 4, 0, 0)
>>> parse("Sunday")
datetime.datetime(2023, 11, 5, 0, 0)
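
One way to test the time zone guess against the output above is a small standard-library model. The sketch below assumes a hypothetical rule, "return the most recent date with that weekday, counting the base date itself", and compares it with the reported dates for two candidate base dates. The helper names are made up for illustration and this is not dateparser's implementation.

```python
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Dates reported above for parse(<weekday>) with default settings.
OBSERVED = dict(zip(WEEKDAYS, [
    date(2023, 11, 6), date(2023, 11, 7), date(2023, 11, 1),
    date(2023, 11, 2), date(2023, 11, 3), date(2023, 11, 4),
    date(2023, 11, 5),
]))


def latest_on_or_before(name, base):
    """Hypothetical model: most recent date with this weekday, counting base."""
    days_back = (base.weekday() - WEEKDAYS.index(name)) % 7
    return base - timedelta(days=days_back)


def mismatches(base):
    """Weekday names for which the model disagrees with the reported output."""
    return [n for n in WEEKDAYS if latest_on_or_before(n, base) != OBSERVED[n]]


print(mismatches(date(2023, 11, 6)))  # local Monday as base -> ['Tuesday']
print(mismatches(date(2023, 11, 7)))  # UTC Tuesday as base  -> []
```

Under this assumed rule, the reported dates fit a base date of 2023-11-07 (the UTC date) and not the local date, which is consistent with the guess but does not prove it.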

With settings={"PREFER_DATES_FROM": "future"}, Tuesday (i.e., tomorrow) is not recognised as being in the future; it resolves to 2023-11-14 instead of 2023-11-07:

>>> parse("Monday", settings={"PREFER_DATES_FROM": "future"})
datetime.datetime(2023, 11, 13, 0, 0)
>>> parse("Tuesday", settings={"PREFER_DATES_FROM": "future"})
datetime.datetime(2023, 11, 14, 0, 0)
>>> parse("Wednesday", settings={"PREFER_DATES_FROM": "future"})
datetime.datetime(2023, 11, 8, 0, 0)
>>> parse("Thursday", settings={"PREFER_DATES_FROM": "future"})
datetime.datetime(2023, 11, 9, 0, 0)
>>> parse("Friday", settings={"PREFER_DATES_FROM": "future"})
datetime.datetime(2023, 11, 10, 0, 0)
>>> parse("Saturday", settings={"PREFER_DATES_FROM": "future"})
datetime.datetime(2023, 11, 11, 0, 0)
>>> parse("Sunday", settings={"PREFER_DATES_FROM": "future"})
datetime.datetime(2023, 11, 12, 0, 0)
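
The "future" output can be compared with a similar standard-library model. The sketch assumes a hypothetical rule, "return the next date with that weekday strictly after the base date", and checks it against the reported dates for the local date and the UTC date. The names are illustrative only and this is not taken from dateparser's source.

```python
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Dates reported above with PREFER_DATES_FROM set to "future".
OBSERVED_FUTURE = dict(zip(WEEKDAYS, [
    date(2023, 11, 13), date(2023, 11, 14), date(2023, 11, 8),
    date(2023, 11, 9), date(2023, 11, 10), date(2023, 11, 11),
    date(2023, 11, 12),
]))


def next_strictly_after(name, base):
    """Hypothetical model: next date with this weekday, never base itself."""
    days_ahead = (WEEKDAYS.index(name) - base.weekday() - 1) % 7 + 1
    return base + timedelta(days=days_ahead)


def mismatches(base):
    """Weekday names for which the model disagrees with the reported output."""
    return [n for n in WEEKDAYS
            if next_strictly_after(n, base) != OBSERVED_FUTURE[n]]


print(mismatches(date(2023, 11, 6)))  # local Monday as base -> ['Tuesday']
print(mismatches(date(2023, 11, 7)))  # UTC Tuesday as base  -> []
```

With the local date as base, this model would give 2023-11-07 for Tuesday; the reported 2023-11-14 only fits if the base date is already Tuesday.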

With settings={"PREFER_DATES_FROM": "past"}, today (Monday) is not treated as "not past" and resolves to today's date. I am not sure whether this is the intended behaviour; I mention it only for completeness:

>>> parse("Monday", settings={"PREFER_DATES_FROM": "past"})
datetime.datetime(2023, 11, 6, 0, 0)
>>> parse("Tuesday", settings={"PREFER_DATES_FROM": "past"})
datetime.datetime(2023, 10, 31, 0, 0)
>>> parse("Wednesday", settings={"PREFER_DATES_FROM": "past"})
datetime.datetime(2023, 11, 1, 0, 0)
>>> parse("Thursday", settings={"PREFER_DATES_FROM": "past"})
datetime.datetime(2023, 11, 2, 0, 0)
>>> parse("Friday", settings={"PREFER_DATES_FROM": "past"})
datetime.datetime(2023, 11, 3, 0, 0)
>>> parse("Saturday", settings={"PREFER_DATES_FROM": "past"})
datetime.datetime(2023, 11, 4, 0, 0)
>>> parse("Sunday", settings={"PREFER_DATES_FROM": "past"})
datetime.datetime(2023, 11, 5, 0, 0)
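
The "past" output is harder to interpret, because more than one simple rule fits it. The standard-library sketch below tries two hypothetical rules (previous occurrence counting the base date, or strictly before it) against both candidate base dates. The function names and the inclusive flag are assumptions for illustration, not dateparser's API.

```python
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Dates reported above with PREFER_DATES_FROM set to "past".
OBSERVED_PAST = dict(zip(WEEKDAYS, [
    date(2023, 11, 6), date(2023, 10, 31), date(2023, 11, 1),
    date(2023, 11, 2), date(2023, 11, 3), date(2023, 11, 4),
    date(2023, 11, 5),
]))


def previous(name, base, inclusive):
    """Hypothetical model: previous date with this weekday.

    inclusive=True lets base itself count; inclusive=False requires a
    date strictly before base.
    """
    offset = 0 if inclusive else 1
    days_back = (base.weekday() - WEEKDAYS.index(name) - offset) % 7 + offset
    return base - timedelta(days=days_back)


def mismatches(base, inclusive):
    """Weekday names for which the model disagrees with the reported output."""
    return [n for n in WEEKDAYS
            if previous(n, base, inclusive) != OBSERVED_PAST[n]]


local_today, utc_today = date(2023, 11, 6), date(2023, 11, 7)
print(mismatches(local_today, inclusive=True))   # []
print(mismatches(local_today, inclusive=False))  # ['Monday']
print(mismatches(utc_today, inclusive=False))    # []
print(mismatches(utc_today, inclusive=True))     # ['Tuesday']
```

The reported dates fit either "local date, today counts as past" or "UTC date, strictly before today", so this output alone cannot tell the two explanations apart.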