Open Buratinator opened 1 year ago
I'm new to open source, but after some debugging I found that the error is caused by the following lines, specifically this portion:
    if days[day_index] == day:
        if self.settings.PREFER_DATES_FROM == "past":
            steps = 7  # Too large if dateobj.month & dateobj.year are at their minimum value of 1
        else:
            steps = 0
    else:
        while days[day_index] != day:
            day_index -= 1
            steps += 1
    delta = timedelta(days=-steps)
    dateobj = dateobj + delta  # This is the offending line
When the if branch sets steps = 7 (which only happens when PREFER_DATES_FROM is "past", hence the setting-dependent behavior), and dateobj.day <= 7 while dateobj.month and dateobj.year are both at their minimum value of 1, the line dateobj = dateobj + delta would give dateobj a year below its allowed minimum. According to the datetime documentation, this raises OverflowError, which explains the message "date value out of range": the result has to stay between dateobj.min and dateobj.max.
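To make the boundary concrete, here is a minimal standalone sketch using only the standard library. It does not call dateparser; `shift_back` is a hypothetical helper that just mirrors the `dateobj + timedelta(days=-steps)` arithmetic from the snippet above.

```python
from datetime import datetime, timedelta


def shift_back(dateobj, steps):
    """Hypothetical helper: move `dateobj` back by `steps` days."""
    return dateobj + timedelta(days=-steps)


# One week back from 8 January of year 1 lands exactly on datetime.min.
on_boundary = shift_back(datetime(1, 1, 8), 7)

# One week back from 7 January of year 1 would land before year 1,
# which datetime cannot represent, so OverflowError is raised.
try:
    shift_back(datetime(1, 1, 7), 7)
    overflowed = False
except OverflowError:
    overflowed = True

print(on_boundary == datetime.min, overflowed)
```

So with steps = 7, any dateobj in the first seven days of year 1 cannot be shifted back.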
Notably, all these conditions (a day value in the first week, month and year both equal to 1, and days[day_index] == day) seem to occur only for the very last instance of "среду" (Wednesday); stepping through the debugger, they don't line up for any of the other parsed items. I'm not familiar enough with the codebase to tell why this is, but I found that the steps = 7 line came in with #559.
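As a rough illustration of how these conditions combine, the sketch below re-implements the step calculation outside dateparser. The names `DAYS`, `steps_back` and `would_overflow` are assumptions made for this example, and the check is only one possible way to detect the problem, not the library's actual code or fix.

```python
from datetime import datetime, timedelta

# Assumed weekday abbreviations, indexed like datetime.weekday() (Monday = 0).
DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def steps_back(dateobj, day, prefer_dates_from):
    """Days to go back from `dateobj` to reach weekday `day`."""
    day_index = dateobj.weekday()
    steps = 0
    if DAYS[day_index] == day:
        # Same weekday: "past" jumps a full week back, otherwise stay put.
        steps = 7 if prefer_dates_from == "past" else 0
    else:
        # Walk backwards; negative indices wrap around the list.
        while DAYS[day_index] != day:
            day_index -= 1
            steps += 1
    return steps


def would_overflow(dateobj, steps):
    """True if going back `steps` days would fall before datetime.min."""
    return steps > (dateobj - datetime.min).days


# 3 January of year 1 is a Wednesday in Python's proleptic Gregorian calendar.
d = datetime(1, 1, 3)
print(steps_back(d, "wed", "past"), would_overflow(d, steps_back(d, "wed", "past")))
print(steps_back(d, "mon", "past"), would_overflow(d, steps_back(d, "mon", "past")))
```

Only the same-weekday case with "past" asks for a full 7 days, which is why a different weekday or a different PREFER_DATES_FROM value stays in range here.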
dateparser version: 1.1.8 Python version: 3.12.0
When searching for dates in a large chunk of Russian text (see example below) with the 'PREFER_DATES_FROM': 'past' setting, dateparser throws the "OverflowError: date value out of range" error.

Additional observations:
- If среду (meaning Wednesday) is removed from the text, the code works fine. Removing other chunks from the string also prevents the error from being thrown.
- Changing the 'PREFER_DATES_FROM' setting to 'future' or removing it altogether also prevents the error from happening.

Code to reproduce (the text makes no sense because I minimized the string as much as I could while still reproducing the error):