scrapinghub / dateparser

python parser for human readable dates
BSD 3-Clause "New" or "Revised" License

Fix ago bug #1129

Closed serhii73 closed 1 year ago

serhii73 commented 1 year ago

Close #340

Gallaecio commented 1 year ago

"(\\d+)\\s*год\\b": "\\1", as originally suggested by @noviluni, works for the new test cases but fails for existing ones.
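To make the trade-off concrete, here is a minimal sketch of what that substitution does to a few inputs. It assumes the pattern is applied as a plain `re.sub`; the `simplify` helper and `YEAR_SIMPLIFICATION` name are hypothetical and are not dateparser's actual code path.

```python
import re

# Hypothetical stand-in for the suggested simplification:
# strip the word "год" when it directly follows a number.
YEAR_SIMPLIFICATION = re.compile(r"(\d+)\s*год\b")


def simplify(text: str) -> str:
    """Apply the substitution the way re.sub would (an assumption)."""
    return YEAR_SIMPLIFICATION.sub(r"\1", text)


# A bare year keeps only its number.
print(simplify("2001 год"))
# The year unit is dropped here too, so "1 year" is no longer visible
# to whatever parses the relative expression afterwards.
print(simplify("1 год 2 месяца"))
# "года" is untouched because \b does not match between "д" and "а".
print(simplify("2 года"))
```

The second call shows why a single blanket rule can conflict with relative-date inputs such as the existing tests.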

Because I do not speak Russian, I am not sure how to best address those other issues.

For example, according to tests, 1 год 2 месяца means 1 year and 2 months ago. Is that correct? If so, does 1 год mean 1 year ago or year 1? How can you tell?

serhii73 commented 1 year ago

Hi @Gallaecio

> 1 год 2 месяца means 1 year and 2 months ago. Is that correct?

Yes, you're right.

> If so, does 1 год mean 1 year ago or year 1?

It means year 1.

I think I just overcomplicated the tests and most of the web pages have a simpler format.

Gallaecio commented 1 year ago

How can you tell 1 год means year 1 but 1 год 2 месяца means 1 year and 2 months ago? How would you say 1 year ago in line with 1 год 2 месяца?

serhii73 commented 1 year ago

1 year ago - один год назад / 1 год назад
2001 years ago - 2001 год назад
year 2001 - 2001 год

In Russian, you say one year (1 год), two years (2 года), three years (3 года), and four years (4 года), but five years (5 лет), six years (6 лет), seven years (7 лет), eight years (8 лет), and nine years (9 лет).
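The pattern above can be sketched as a small function. The comment only lists 1 through 9; the sketch extends it with the standard Russian rule that numbers ending in 11 to 14 take лет, which is consistent with 2001 год and 2000 лет used elsewhere in this thread. The name `year_word` is hypothetical and is not part of dateparser.

```python
def year_word(n: int) -> str:
    """Return the form of "year" that follows the number n in Russian."""
    last_two = n % 100
    last_one = n % 10
    # 1, 21, 2001 ... take "год", but 11, 111 ... do not.
    if last_one == 1 and last_two != 11:
        return "год"
    # 2-4, 22-24 ... take "года", but 12-14 do not.
    if last_one in (2, 3, 4) and last_two not in (12, 13, 14):
        return "года"
    # Everything else (5-20, 25-30, 2000 ...) takes "лет".
    return "лет"


for n in (1, 2, 5, 11, 21, 2000, 2001):
    print(n, year_word(n))
```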

That is, to count the years, you need two words: год and лет.

In general:
2000 год - the year 2000
2000 лет назад - 2000 years ago
2001 год - the year 2001
2001 год назад - 2001 years ago
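As a rough illustration of the rule in these examples, the sketch below treats "number + год" as a calendar year and "number + год/года/лет + назад" as a relative offset. The `classify` function, its return labels, and both patterns are assumptions made for this example; they are not the regular expressions used in the pull request.

```python
import re
from typing import Optional, Tuple

# Number + any form of "year" + "назад" (ago): a relative offset.
AGO = re.compile(r"(\d+)\s*(?:года|год|лет)\s+назад")
# Number + "год" with nothing after it: a calendar year.
YEAR = re.compile(r"(\d+)\s*год")


def classify(text: str) -> Optional[Tuple[str, int]]:
    """Return ("years_ago", n), ("year", n), or None if neither form matches."""
    text = text.strip().lower()
    match = AGO.fullmatch(text)
    if match:
        return ("years_ago", int(match.group(1)))
    match = YEAR.fullmatch(text)
    if match:
        return ("year", int(match.group(1)))
    return None


for sample in ("2000 год", "2000 лет назад", "2001 год", "2001 год назад"):
    print(sample, "->", classify(sample))
```

Inputs that fit neither shape, such as 2000 лет without назад, return `None` here rather than guessing a direction.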

serhii73 commented 1 year ago

> 1 год 2 месяца means 1 year and 2 months ago. Is that correct?

Yes, you're right.

That is because the word месяц (month) can be used with words that point either backward or forward in time.

serhii73 commented 1 year ago

> If so, does 1 год mean 1 year ago or year 1?

It means year 1.

That is because "ago" requires the word назад.

Gallaecio commented 1 year ago

OK, so I have gone with 2 regular expressions:

I have added a few more test cases that I hope are accurate. Please take a good look.

serhii73 commented 1 year ago

LGTM @Gallaecio @wRAR