scrapinghub / dateparser

python parser for human readable dates
BSD 3-Clause "New" or "Revised" License

Dateparser hangs for minutes on certain dates #1185

Open manycoding opened 1 year ago

manycoding commented 1 year ago

I don't have many details at the moment, but I noticed that in certain cases the performance is awful. I wouldn't say it's tied to a specific version; it has always been like that, though it happens rarely. I'm keeping an eye on it.

E.g.

188.09s call     manhattan_hub/helpers/tests/test_time_helper.py::test_extract_datetime_from[: 20/05/20-2020-05-20-mmdd]

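import pytest
from freezegun import freeze_time

# time_helper and ocr_to_iso come from the project's own code (not shown here).
@pytest.mark.parametrize("ocr_date, parsed, date_format", ocr_to_iso)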
def test_extract_datetime_from(ocr_date, parsed, date_format):
    with freeze_time("2021-12-02"):
        assert (
            str(
                time_helper.extract_datetime_from(
                    ocr_date,
                    date_format=date_format,
                    ignore_dates_older_than_years=10,
                    max_future_date=time_helper.x_days_in_future(12 * 31),
                )
            ).split(" ")[0]
            == parsed
        )

The tests are run in parallel; the code is deployed in a Django environment with multiple WSGI workers.

Python 3.7.13, dateparser 1.1.8
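To narrow this down, it may help to time a single `dateparser.parse` call outside the test suite and outside the parallel workers. The following is only a minimal sketch: `time_call` and `reproduce` are hypothetical helper names, the input string is taken from the test id above, and there is no guarantee that this call alone reproduces the slowdown, since `extract_datetime_from` is project code that may do more than call `dateparser.parse`.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def reproduce(text=": 20/05/20"):
    """Time one dateparser.parse call on the string from the slow test id.

    Assumes dateparser is installed; it is imported lazily so that
    time_call stays usable without it.
    """
    import dateparser  # third-party package

    return time_call(dateparser.parse, text)
```

Calling `reproduce()` in a plain interpreter session returns the parsed value and the elapsed time in seconds. If that call is fast, the delay is more likely in the surrounding helper or in the parallel test setup than in the parser itself.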