nvanva closed this 1 year ago
I very recently (5e32799e4033bc08726a62935eabfe399fc023b5) added a little tweak that makes the script return a non-zero exit code if it finishes but did not complete all downloads. I now run it like this:
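# Assuming bash: without pipefail the pipeline's exit status is tee's (normally 0),
# so the loop would stop after the first pass even if downloads failed.
set -o pipefail
until ~/ia-download.py --cache ~/cache.db -j 256 < ~/wide00012.txt | tee ~/ia-download.log.$(date +'%Y%m%d%H%M%S'); do
  echo "Not finished yet"
  sleep 1800
done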
This seems simpler to me, since you don't have to maintain/read/write a queue file. The state is just the downloaded files on disk. The query cache will make sure you're not hitting IA's API repeatedly. Download attempts you can still easily get from the log files. It assumes that checking whether a file exists on disk is cheap, but I do hope it is.
Would this work for you?
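To make the idea concrete, here is a minimal sketch of "the state is just the files on disk". It is not the actual ia-download.py code: `download_missing`, `exit_code`, and the `fetch` callback are hypothetical names, and the real script may structure this differently.

```python
import os
import sys


def download_missing(targets, fetch):
    """Try every (url, path) pair whose path is not on disk yet.

    `fetch(url, path)` is a hypothetical callback that writes the file
    or raises OSError. Returns the number of downloads that failed.
    """
    failures = 0
    for url, path in targets:
        if os.path.exists(path):
            continue  # already downloaded by an earlier run
        try:
            fetch(url, path)
        except OSError as err:
            print(f"failed: {url}: {err}", file=sys.stderr)
            failures += 1
    return failures


def exit_code(failures):
    """Non-zero when anything is still missing, so a shell loop can retry."""
    return 0 if failures == 0 else 1
```

A wrapper would end with something like `sys.exit(exit_code(download_missing(targets, fetch)))`. Each rerun then only touches the files that are still missing.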
This solution assumes that the item-to-URL mapping does not change during downloading, which may take several weeks. Are we OK with that?
I don't think they ever change.
This issue is "solved" with 4eb06d14159fa88ee6c2b8f68287d67a8561c5be.
Currently ia-download.py goes through all input items only once. Download errors are reported, but you need to restart ia-download.py manually to retry the files that failed during the previous attempt. Also, the script stops after 100 consecutive errors. I have seen this twice already: once for an unclear reason (maybe temporary network problems), and once because we ran out of free inodes (a "no space left on device" error). It is reasonable to signal consecutive errors, but wouldn't it be better to keep trying, so that no time is wasted before we restart the script manually? May I suggest the following modification: we create a queue and populate it with all the files (or items, if we want to request URLs lazily for some reason). Then we download as before, but if a file fails to download we put it back at the end of the queue together with an integer counting the attempts. This way we retry it again and again, but only after attempting to (re-)download the other files. We stop only when the queue is empty, i.e. when all files have been downloaded successfully. We also print the number of attempts to the log file, so the user can see it and interrupt the script manually if they want to stop trying. @jelmervdl What do you think about this modification? It seems to me that it would make ia-download.py more autonomous and easier to use for future downloads. If you agree, I can try to implement it.
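A rough sketch of the proposed queue, to show what is meant. This is not existing ia-download.py code: `download_all` and the `fetch` callback are hypothetical names, and a real version would probably want a pause between rounds so that a persistent failure (such as a full disk) does not turn into a busy loop.

```python
from collections import deque


def download_all(files, fetch, log=print):
    """Retry failed downloads until the queue is empty.

    `fetch(name)` is a hypothetical callback that raises OSError on
    failure. Returns a dict mapping each file to the attempts it needed.
    """
    queue = deque((name, 0) for name in files)
    attempts_needed = {}
    while queue:
        name, attempts = queue.popleft()
        attempts += 1
        try:
            fetch(name)
        except OSError as err:
            # Log the attempt count so the user can decide to interrupt.
            log(f"attempt {attempts} failed for {name}: {err}")
            # Back of the queue: retried only after the other files.
            queue.append((name, attempts))
        else:
            attempts_needed[name] = attempts
    return attempts_needed
```

Because a failed file goes to the back of the queue, one bad file never blocks the rest, and the loop ends only once every file has succeeded or the user interrupts it.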