Open noviluni opened 4 years ago
There is a new case: in French, "sept" means "September" but also "seven", and the two meanings get confused.
Original issue: https://github.com/scrapinghub/dateparser/issues/819
One crude approach I can think of is to translate the string in every possible way: create one copy of the translated string for each combination of possible interpretations, and yield a result for the first copy that parses.
However, instead of passing the first translation through all parsers and then trying the second translation, it may make sense to pass all translations through the first parser, then all translations through the second parser, etc.
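To make the idea concrete, here is a minimal sketch, not dateparser's actual code. The `MEANINGS` table, the two toy parsers and the function names are all hypothetical and only illustrate the "every translation through the first parser, then every translation through the second parser" order.

```python
from datetime import date
from itertools import product

# Hypothetical ambiguity table; not dateparser's real language data.
MEANINGS = {"sept": ["september", "7"]}
MONTHS = {"march": 3, "september": 9}


def candidate_translations(text):
    """Yield one translated string per combination of interpretations."""
    options = [MEANINGS.get(tok, [tok]) for tok in text.lower().split()]
    for combo in product(*options):
        yield " ".join(combo)


def _build(day, month, year):
    # Return None instead of raising when the parts do not form a date.
    if day.isdigit() and year.isdigit() and month in MONTHS:
        try:
            return date(int(year), MONTHS[month], int(day))
        except ValueError:
            return None
    return None


def parse_day_month_year(s):
    parts = s.split()
    return _build(parts[0], parts[1], parts[2]) if len(parts) == 3 else None


def parse_month_day_year(s):
    parts = s.split()
    return _build(parts[1], parts[0], parts[2]) if len(parts) == 3 else None


# Toy stand-ins for the real parser chain, in priority order.
PARSERS = [parse_day_month_year, parse_month_day_year]


def parse_parser_first(text):
    """Try every translation with parser 1, then every one with parser 2."""
    candidates = list(candidate_translations(text))
    for parser in PARSERS:
        for candidate in candidates:
            result = parser(candidate)
            if result is not None:
                return result
    return None
```

With this ordering the preferred parser gets a chance at every interpretation before a lower-priority parser is tried at all, so the parser priority decides the result rather than the order in which the translations happen to be generated.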
Hopefully, setting a language and a date order will give the expected result in most cases.
In some languages, the same word has several meanings. This causes issues like this one: https://github.com/scrapinghub/dateparser/issues/337
I'm creating this issue to track this and try to find a solution.
As far as I have seen, the translation is performed before the date is parsed, so we can't use the rest of the string as context to select the valid meaning. Apart from that, the words are inserted into a Python dictionary of "word: meaning" entries, so when a word has several meanings, the last one inserted overrides the others.
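The overriding effect is easy to reproduce in isolation. The pairs below are hypothetical and do not come from dateparser's language data; the second half shows one possible alternative (a word mapped to a list of meanings), which is only an assumption about how the ambiguity could be preserved.

```python
from collections import defaultdict

# Hypothetical (word, meaning) pairs for French "sept".
pairs = [("sept", "september"), ("sept", "7")]

# A plain "word: meaning" dict keeps only the last meaning inserted.
flat = dict(pairs)

# Mapping each word to a list of meanings keeps all of them.
multi = defaultdict(list)
for word, meaning in pairs:
    multi[word].append(meaning)

print(flat)         # {'sept': '7'}
print(dict(multi))  # {'sept': ['september', '7']}
```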
Using regex simplifications could fix some use cases, but it is not a valid approach for most cases.
In some cases (like the word "mar" in Italian), we could detect when an element appears twice (for example, two months), check whether any of them has a duplicated key, and try to find the real meaning, but this would probably be hard to implement.
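A rough sketch of that "double element" check, under the assumption that every meaning is tagged with a category. The `AMBIGUOUS` table and `resolve` function are hypothetical and not part of dateparser. The idea is to enumerate the combinations of meanings and discard those in which the same category (two months, two weekdays) appears more than once.

```python
from itertools import product

# Hypothetical table: each ambiguous word maps to (category, meaning) pairs.
AMBIGUOUS = {"mar": [("weekday", "tuesday"), ("month", "march")]}


def resolve(tokens):
    """Return the interpretations in which no category is repeated."""
    options = [AMBIGUOUS.get(tok, [("other", tok)]) for tok in tokens]
    valid = []
    for combo in product(*options):
        categories = [cat for cat, _ in combo if cat != "other"]
        # Reject combinations with, e.g., two months or two weekdays.
        if len(categories) == len(set(categories)):
            valid.append([meaning for _, meaning in combo])
    return valid


print(resolve(["mar", "3", "mar", "2020"]))
# [['tuesday', '3', 'march', '2020'], ['march', '3', 'tuesday', '2020']]
```

The check removes the "two weekdays" and "two months" readings, but it does not say which position holds the month, and a string with a single "mar" stays fully ambiguous. Some extra signal, such as the date order or the language's usual word order, would still be needed.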
Please, if you have any ideas, don't hesitate to comment on this issue.
List of cases:
And there are a lot of other cases where the same word is used with different meanings: