scrapinghub / dateparser

Python parser for human-readable dates
BSD 3-Clause "New" or "Revised" License

String crashes parser #687

Closed: holtzhau closed this issue 4 years ago

holtzhau commented 4 years ago

The following string crashes the parser:

import dateparser
result_date = dateparser.parse("9000 taon")

While this is not something one would normally try to parse, it also crashes the search_dates functionality when such a string is encountered:

from dateparser.search import search_dates
results = search_dates("9000 taon")
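
Until a fixed release is available, one possible caller-side workaround is to catch the exception so that a single bad input does not abort a larger batch. The sketch below is a hypothetical helper (call_or_none is not part of dateparser); it assumes the crash surfaces as a ValueError, as in the trace below, and also catches OverflowError, which datetime arithmetic raises on overflow.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_or_none(func: Callable[[str], T], text: str) -> Optional[T]:
    """Call a date-parsing function, returning None if it raises.

    Hypothetical helper: wraps any parser (e.g. dateparser.parse or
    dateparser.search.search_dates) so an out-of-range input such as
    "9000 taon" yields None instead of an exception.
    """
    try:
        return func(text)
    except (ValueError, OverflowError):
        # datetime raises these when the computed date falls outside
        # the range it can represent (years 1..9999).
        return None
```

Usage would look like call_or_none(dateparser.parse, "9000 taon") or call_or_none(search_dates, "9000 taon"). Note that this only hides the symptom; it does not recover a date from the input.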

I get the following trace:

  File "bug_dataparser.py", line 4, in <module>
    result_date = dateparser.parse("9000 taon")
  File "/usr/local/lib/python3.6/dist-packages/dateparser/conf.py", line 84, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/dateparser/__init__.py", line 53, in parse
    data = parser.get_date_data(date_string, date_formats)
  File "/usr/local/lib/python3.6/dist-packages/dateparser/date.py", line 418, in get_date_data
    locale, date_string, date_formats, settings=self._settings)
  File "/usr/local/lib/python3.6/dist-packages/dateparser/date.py", line 196, in parse
    return instance._parse()
  File "/usr/local/lib/python3.6/dist-packages/dateparser/date.py", line 200, in _parse
    date_obj = self._parsers[parser_name]()
  File "/usr/local/lib/python3.6/dist-packages/dateparser/date.py", line 213, in _try_freshness_parser
    return freshness_date_parser.get_date_data(self._get_translated_date(), self._settings)
  File "/usr/local/lib/python3.6/dist-packages/dateparser/freshness_date_parser.py", line 151, in get_date_data
    date, period = self.parse(date_string, settings)
  File "/usr/local/lib/python3.6/dist-packages/dateparser/freshness_date_parser.py", line 96, in parse
    date, period = self._parse_date(date_string, settings.PREFER_DATES_FROM)
  File "/usr/local/lib/python3.6/dist-packages/dateparser/freshness_date_parser.py", line 136, in _parse_date
    date = self.now - td
  File "/usr/local/lib/python3.6/dist-packages/dateutil/relativedelta.py", line 399, in __rsub__
    return self.__neg__().__radd__(other)
  File "/usr/local/lib/python3.6/dist-packages/dateutil/relativedelta.py", line 396, in __radd__
    return self.__add__(other)
  File "/usr/local/lib/python3.6/dist-packages/dateutil/relativedelta.py", line 368, in __add__
    day = min(calendar.monthrange(year, month)[1],
  File "/usr/lib/python3.6/calendar.py", line 124, in monthrange
    day1 = weekday(year, month, 1)
  File "/usr/lib/python3.6/calendar.py", line 116, in weekday
    return datetime.date(year, month, day).weekday()
ValueError: year -6980 is out of range
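
The last frames show where this comes from: the freshness parser reads "9000 taon" ("taon" is Tagalog for "year") as "9000 years ago" and computes self.now - td with a dateutil relativedelta. That gives year -6980 (2020 - 9000), which datetime cannot represent because datetime.MINYEAR is 1. The stdlib-only sketch below shows one way such a subtraction could be bounds-checked before building the date. It is a hypothetical illustration (years_ago is not dateparser's code, and not necessarily what the linked PR does).

```python
import datetime
from typing import Optional


def years_ago(now: datetime.datetime, years: int) -> Optional[datetime.datetime]:
    """Return `now` shifted back by `years`, or None if out of range.

    Hypothetical helper illustrating a bounds check; not dateparser's code.
    """
    target_year = now.year - years
    if not datetime.MINYEAR <= target_year <= datetime.MAXYEAR:
        # e.g. 2020 - 9000 = -6980, which datetime cannot represent
        return None
    try:
        return now.replace(year=target_year)
    except ValueError:
        # Only Feb 29 can fail here (non-leap target year): clamp the
        # day to Feb 28, similar to how relativedelta clamps the day
        # to the target month's length.
        return now.replace(year=target_year, day=28)
```

Without the check, datetime.datetime(2020, 1, 1).replace(year=-6980) raises the same "year -6980 is out of range" ValueError seen above.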

noviluni commented 4 years ago

Hi @holtzhau, thank you for raising this.

I've opened a PR that fixes a related problem, and it should fix this one as well: https://github.com/scrapinghub/dateparser/pull/686