scrapinghub / dateparser

python parser for human readable dates
BSD 3-Clause "New" or "Revised" License

Weird behavior when adding a trailing space #691

Closed by noviluni 3 years ago

noviluni commented 4 years ago

When using this expression I don't get any result:

>>> dateparser.parse('午前10時')

However, when adding a trailing space I get a value:

>>> dateparser.parse('午前10時 ')
datetime.datetime(2020, 5, 19, 0, 17, 52, 777714)

noviluni commented 4 years ago

I think this is caused by the regex. However, I'm not sure whether the first example should return a value or the second one shouldn't. Better Japanese skills would help here.

BTW, I'm surprised that the "sanitize" function doesn't strip trailing spaces.
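
As a caller-side workaround, the input could be stripped before it reaches the parser. This is only a minimal sketch: `strip_date_string` is a hypothetical helper, not dateparser's own "sanitize" function, and it assumes that trimming surrounding whitespace is all that is needed.

```python
def strip_date_string(date_string):
    """Remove leading and trailing whitespace from a date string.

    Hypothetical helper, not part of dateparser. str.strip() with no
    arguments also removes Unicode whitespace such as the ideographic
    space (U+3000) that is common in Japanese text.
    """
    return date_string.strip()


print(strip_date_string('午前10時 ') == '午前10時')
```

The stripped string could then be passed to `dateparser.parse`, so that both inputs from the report take the same code path.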

Gallaecio commented 4 years ago

Google says it’s “10 am”. If that is accurate, neither result is correct: the first returns nothing, and the second is worse because it returns an incorrect date.
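
To make the expected result concrete, here is a rough sketch of how such a string could be read, assuming 午前 means a.m. and 午後 means p.m. `parse_ja_hour` is a hypothetical function for illustration only; it is not how dateparser handles Japanese, and it only covers the "marker + hour + 時" form.

```python
import re
from datetime import time

# Matches strings such as '午前10時' (a.m.) or '午後3時' (p.m.).
_JA_HOUR = re.compile(r"^(午前|午後)(\d{1,2})時$")


def parse_ja_hour(text):
    """Return a datetime.time for '午前N時' / '午後N時', or None.

    Hypothetical helper. Surrounding whitespace is ignored, so a trailing
    space does not change the result. Only hours 0-11 are accepted; 12 is
    rejected here because its meaning is ambiguous.
    """
    match = _JA_HOUR.match(text.strip())
    if match is None:
        return None
    marker, hour = match.group(1), int(match.group(2))
    if not 0 <= hour <= 11:
        return None
    if marker == "午後":
        hour += 12
    return time(hour)


print(parse_ja_hour('午前10時'))
print(parse_ja_hour('午前10時 '))
```

With this reading, both inputs from the report give the same time, with or without the trailing space.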

noviluni commented 3 years ago

Since we merged https://github.com/scrapinghub/dateparser/pull/841, the original issue won't happen again and we can close this ticket.

We could open a new ticket regarding the language issue, but I don't have enough Japanese knowledge to even know whether the expression is correctly formed, so I will just close this one. Feel free to open a new language issue.