Closed yoquankara closed 5 years ago
Thanks for the PR. Does such a short sleep really help?
Yes, it did help. Before this PR, my download was very intermittent with frequent HTTP error 503 (Service Temporarily Unavailable) and retries.
However, I think it depends on the specific network environment, so this is unlikely to be the single best value. I can change it to a larger value, like 5 ms, if you prefer. But the larger it is, the longer the download takes.
Btw, I didn't touch download_list.py. If we agree on the value, I will fix that file too.
OK! It has only a small side effect, so I'll merge it as a trial. Thank you!
In fact, I had already added a sleep inside the loop in download_list.py:
https://github.com/soskek/bookcorpus/blob/973edec568f14e5eba2ea57a595d703708696ad9/download_list.py#L60
I only forgot to add it in download_files.py, even though I had already set SLEEP_SEC there ...
https://github.com/soskek/bookcorpus/blob/973edec568f14e5eba2ea57a595d703708696ad9/download_files.py#L28
Ah, I see :-) Thank you for merging! I agree to treat it as a trial for later refactoring.
This helps reduce HTTP 503 errors, which are likely caused by a service limit on the server side.
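
For readers following along, the idea discussed above (a short pause after each request, plus a retry when the server answers 503) can be sketched roughly as below. This is only an illustration, not the code in download_files.py: the `SLEEP_SEC` value, the `fetch` and `download_all` helpers, and the doubling back-off on retry are all assumptions made for the example.

```python
import time
import urllib.error
import urllib.request

# Placeholder value; the thread does not state the value actually used.
SLEEP_SEC = 0.001


def fetch(url, timeout=30):
    """Download one URL and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_all(urls, fetch_fn=fetch, sleep_sec=SLEEP_SEC,
                 max_retries=3, sleep_fn=time.sleep):
    """Fetch each URL in turn, pausing briefly after every download.

    On HTTP 503 the request is retried up to max_retries times, waiting
    a little longer before each retry. Any other HTTP error is re-raised.
    """
    results = {}
    for url in urls:
        for attempt in range(max_retries + 1):
            try:
                results[url] = fetch_fn(url)
                break
            except urllib.error.HTTPError as err:
                if err.code != 503 or attempt == max_retries:
                    raise
                # Hypothetical back-off: double the wait on each retry.
                sleep_fn(sleep_sec * 2 ** (attempt + 1))
        # The short pause between downloads that the PR is about.
        sleep_fn(sleep_sec)
    return results
```

The trade-off mentioned in the thread shows up directly here: the total added delay is roughly `sleep_sec` times the number of URLs, so a larger value is gentler on the server but lengthens the whole run.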