Open caiot5 opened 10 months ago
Please take a look at this thread: https://github.com/hartator/wayback-machine-downloader/issues/273#issuecomment-1886612201
Thanks for that. I'm using this workaround right now and it works great! I think it needs to go mainstream 'cause (for now) wayback-machine-downloader is useless without this 'mod'.
It would be really nice if the workaround could skip the 'sleep 3' when the file already exists.
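Something along these lines might do it. This is only a rough sketch, not the actual workaround code: `fetch_with_delay`, its `delay:` and `sleeper:` parameters, and the block that does the download are all hypothetical names I made up for illustration. The idea is just to check for the file first and sleep only after a real request.

```ruby
# Hypothetical helper, NOT part of wayback-machine-downloader.
# The block is expected to perform the actual download to `path`.
def fetch_with_delay(path, delay: 3, sleeper: Kernel.method(:sleep))
  # File is already on disk: no request is made, so no delay is needed.
  return :skipped if File.exist?(path)

  yield path             # caller downloads the file
  sleeper.call(delay)    # pause only after a real request
  :downloaded
end
```

Usage would look like `fetch_with_delay(local_path) { |p| download_to(p) }`, where `download_to` stands in for whatever the tool really does. Resuming an interrupted run would then go through existing files without waiting 3 seconds on each one.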
I used to use wayback-machine-downloader quite a lot, but it no longer seems to work properly. I think the reason it can no longer download content is a connection throttling mechanism that archive.org seems to have implemented, as the 'connection refused' error in the log below suggests:
http://www.ig.com.br:80/home/editorial/stories/editorial_body/0,1205,254060,00.html # Failed to open TCP connection to web.archive.org:443 (Connection refused - connect(2) for "web.archive.org" port 443)
websites/www.ig.com.br/home/editorial/stories/editorial_body/0,1205,254060,00.html was empty and was removed.
To me it looks like the individual TCP connections need to be established more slowly to avoid triggering the throttling mechanism. Is there anything we can do to delay those connections?
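To illustrate what I mean by delaying connections, here is a minimal sketch of a retry-with-backoff wrapper. It assumes the refusal surfaces as `Errno::ECONNREFUSED`, which is what the "Connection refused - connect(2)" text in the log suggests. `with_backoff` and its parameters are hypothetical and do not exist in the gem, and I have not confirmed what delays archive.org actually tolerates.

```ruby
# Hypothetical wrapper, NOT part of wayback-machine-downloader.
# Runs the block; on "connection refused" it waits and tries again,
# doubling the wait after each failure (3s, 6s, 12s, ... by default).
def with_backoff(max_attempts: 5, base_delay: 3, sleeper: Kernel.method(:sleep))
  attempt = 0
  begin
    attempt += 1
    yield
  rescue Errno::ECONNREFUSED
    raise if attempt >= max_attempts               # give up, re-raise the error
    sleeper.call(base_delay * 2**(attempt - 1))    # wait before reconnecting
    retry
  end
end
```

Each file download could be wrapped as `with_backoff { download_file(url) }`, with `download_file` standing in for the real download call. A fixed pause between requests would be simpler, but backing off only on failure keeps things fast when the server is not refusing connections.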