Closed benbuzbee closed 7 months ago
Tagging folks here who were involved in the original PR https://github.com/scrapinghub/dateparser/pull/1146
@adnan-awan @Gallaecio @serhii73 could I have your comments and reviews please
Could you have a look at the 2 failing tests?
Certainly @Gallaecio - sorry I missed that locally - tox / all tests do not run well on Windows where I developed this so I just ran select tests and expected to check all of them in CI
In the first case (test_dates_parse_utc_offset_does_not_throw), it is parsing "0:4", French, with settings
"PREFER_DATES_FROM": "past",
"PREFER_DAY_OF_MONTH": "first",
"PREFER_LOCALE_DATE_ORDER": True,
"PREFER_MONTH_OF_YEAR": "current",
"RELATIVE_BASE": datetime(
year=1970, month=1, day=1, hour=0, minute=0, second=0
),
it used to expect to get expected_date=datetime(1969, 12, 31, 14, 4)
but after my change it gets datetime(1969, 1, 31, 14, 4)
I would argue that with PREFER_MONTH_OF_YEAR set to "Current", and "Current" being January 1st 1970, that datetime(1969, 1, 31, 14, 4)
is a better result
However with this particular set of configuration, I am not exactly 100% sure what to expect. These settings were generated by a fuzzer so perhaps they don't really make a ton of sense together anyway; rather than change the settings (and thus deviate from what the parser caught) I have opted to update the test expectation to accept January. CC @ennamarie19 who made that change in case I missed something.
The next one is even more perplexing, it is searching a German string for dates and asserting that when it finds the word "Die" in the string, it should be parsed as datetime.datetime(1999, 12, 28, 0, 0)
Similarly, my change makes this datetime.datetime(1999, 1, 28, 0, 0)
instead. I don't speak German, but as far as I can tell "Die" just means "The" so I have no idea why it is even matching it. In my opinion, this could be a bug with the search identifying a non-date word, and so I can't really guess as to what a sensible result would be. For the sake of simplicity, I also just updated this test to accept January. CC @eszakharova who added this in https://github.com/scrapinghub/dateparser/commit/7e7e22417e71e103225a2fdf05e68a6e67db041c
Fix two problems
Parser would use current month even if prefer_month_of_year was not current when relative_base was not none
Parser would use current month to derive 'what is the last day of this month' - for example, with prefer_month=last and prefer_day=last, but current_month=april, it would return december 30th, because it would use april to find that the last day was the 30th, when it should have used December (#1217)
Additionally, add a test to test_date_parser that uses prefer_month
Fixes #1217