Closed lopuhin closed 4 years ago
Hi @lopuhin, I'm trying to reproduce this with the last version (dateparser==0.7.4
) and I can't (furthermore, it takes less time to finish than older versions). It seems that it was fixed when we added some performance improvements in the last version.
Could you confirm that it's also fixed for you?
Thanks in advance :slightly_smiling_face:
Thanks @noviluni I can confirm that growth is linear now and constant is much smaller:
In [1]: from dateparser.search import search_dates
In [2]: %time _ = search_dates(' '.join(str(2000 + i) for i in range(20)))
CPU times: user 55.7 ms, sys: 3.99 ms, total: 59.7 ms
Wall time: 59.4 ms
In [3]: %time _ = search_dates(' '.join(str(2000 + i) for i in range(20)))
CPU times: user 31.1 ms, sys: 3.94 ms, total: 35 ms
Wall time: 34.6 ms
In [4]: %time _ = search_dates(' '.join(str(2000 + i) for i in range(40)))
CPU times: user 67.1 ms, sys: 122 µs, total: 67.2 ms
Wall time: 66.8 ms
In [5]: %time _ = search_dates(' '.join(str(2000 + i) for i in range(80)))
CPU times: user 135 ms, sys: 18 µs, total: 135 ms
Wall time: 135 ms
In [6]: %time _ = search_dates(' '.join(str(2000 + i) for i in range(160)))
CPU times: user 246 ms, sys: 16.1 ms, total: 262 ms
Wall time: 262 ms
Thank you! Closing
Using current master (289fcc4ca936069b2c75bb5378b4136633b3edf9) we see quadratic complexity with respect to the number of date-like numbers in the input:
Timings with
languages=['en'])
are the same.https://github.com/scrapinghub/dateparser/issues/439 might be related