threatpatrols / hibp-downloader

Efficiently download new pwned password hashes from api.pwnedpasswords.com fast
https://hibp-downloader.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Weird behaviour (on Windows): excessive download time, issues with continuing download #7

Open rwiesbach opened 2 months ago

rwiesbach commented 2 months ago

Hey there, while trying out this tool I ran into massive issues. After about 30 hours the NTLM download had only reached the 49-25 subfolder, so less than about 5/16 of the download was done (the official downloader took little more than 1 hour for a complete download). I had to restart the system, so I restarted the downloader later. Now I get a stack trace similar to https://github.com/python/cpython/issues/107078 in QueueItemStatsCompute, at line 347 of hibp_download.py.

Since I assumed that was only about statistics, I removed the datetime.fromtimestamp(t).astimezone() call in stats.py.
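Rather than removing the conversion entirely, a more defensive variant might look like the sketch below. It assumes the failure comes from the operating system's local-time lookup, as in the linked CPython issue. The helper name `safe_local_datetime` is hypothetical and not part of hibp-downloader. It builds the UTC value with plain arithmetic and falls back to UTC if the local conversion raises.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helper, not part of hibp-downloader: a guarded replacement
# for datetime.fromtimestamp(t).astimezone().
_EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def safe_local_datetime(t: float) -> datetime:
    """Return an aware datetime for POSIX timestamp t.

    The UTC value is computed arithmetically, so no OS time function is
    needed for it. The local-time conversion is attempted afterwards and
    skipped if the platform rejects the timestamp.
    """
    utc_dt = _EPOCH_UTC + timedelta(seconds=t)
    try:
        # Convert to the system's local timezone when the platform allows it.
        return utc_dt.astimezone()
    except (OSError, OverflowError, ValueError):
        # Assumed failure mode: the local-time lookup fails for this value.
        return utc_dt


if __name__ == "__main__":
    print(safe_local_datetime(0))
    print(safe_local_datetime(1_700_000_000))
```

Either way the result is timezone-aware, so statistics code that only subtracts or formats these values should behave the same whether or not the fallback was taken.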

After that, the hibp-downloader workers seem to be stuck in an infinite loop. According to Process Monitor, they keep accessing %python%\Lib\site-packages\certifi\cacert.pem

The system time is ntp-checked and fine.

ndejong commented 2 months ago

Thanks for the report - are you able to provide a way to reproduce the issue so it can be tracked down?

I don't have access to a Windows host and have not seen anything like what you're describing before.

My guess (and it's a guess) is that reducing the default --processes and --chunk-size values is likely to be a band-aid fix for you until the root cause can be tracked down.