kasramp / InternetWayBackMachine

Submit URLs to archive.org easily
https://madadipouya.com/portfolio/the-internet-wayback-machine/
MIT License

IA now 429s out. Don't know if the IA is planning to do this permanently. #7

Closed GhbSmwc closed 4 years ago

GhbSmwc commented 4 years ago

I'm seeing failures more frequently because the IA now responds with 429 Too Many Requests (I found this out by having the command prompt run through a list of URLs, waiting until it started failing frequently, then checking the error in a browser).

I was thinking of adding a delay to each save (like 10 seconds), with syntax like: java -jar InternetWaybackMachine.jar <URL_QuotesOptional> <Time_In_Seconds>

I would like to use it like this [1]:

java -jar InternetWaybackMachine.jar "http://www.example.com" 10 >>OutputLog.txt
java -jar InternetWaybackMachine.jar "http://www.example.com/1" 10 >>OutputLog.txt
java -jar InternetWaybackMachine.jar "http://www.example.com/2" 10 >>OutputLog.txt
java -jar InternetWaybackMachine.jar "http://www.example.com/3" 10 >>OutputLog.txt

and it would output in this format:

<outputline>
<outputline>
<outputline>
<outputline>
where each `<outputline>` is ``Your page submitted sucessfully!`` or ``Page submission failed :-( ``.

Although the command prompt does have ``timeout /t [/nobreak] ``, the problem is that it would reformat the list, causing the output text to be misaligned with the URL save list. Not only that, I use browser extensions that paste each URL on its own line, one after another, so adding a delay command between every save command is tedious. For example:

```
java -jar InternetWaybackMachine.jar google.com >>OutputLog.txt
timeout /t 5
java -jar InternetWaybackMachine.jar google.com >>OutputLog.txt
timeout /t 5
java -jar InternetWaybackMachine.jar google.com >>OutputLog.txt
timeout /t 5
java -jar InternetWaybackMachine.jar google.com >>OutputLog.txt
timeout /t 5
```

Outputs:

```
Your page submitted sucessfully!
Your page submitted sucessfully!
Your page submitted sucessfully!
Your page submitted sucessfully!
```

While adding “>>OutputLog.txt” to all lines:

```
java -jar InternetWaybackMachine.jar google.com >>OutputLog.txt
timeout /t 5 >>OutputLog.txt
java -jar InternetWaybackMachine.jar google.com >>OutputLog.txt
timeout /t 5 >>OutputLog.txt
java -jar InternetWaybackMachine.jar google.com >>OutputLog.txt
timeout /t 5 >>OutputLog.txt
java -jar InternetWaybackMachine.jar google.com >>OutputLog.txt
timeout /t 5 >>OutputLog.txt
```

Results in this:

```
Your page submitted sucessfully!

Waiting for 5 seconds, press a key to continue ...43210
Your page submitted sucessfully!

Waiting for 5 seconds, press a key to continue ...43210
Your page submitted sucessfully!

Waiting for 5 seconds, press a key to continue ...43210
Your page submitted sucessfully!

Waiting for 5 seconds, press a key to continue ...43210
```

(ignore the weird characters between the countdown numbers displayed)

Notice that if you print the output of a timeout, it prints:

```
[linebreak]
Waiting for 5 seconds, press a key to continue ...43210
```

This makes it harder to know which output corresponds to which command, as each timeout occupies 2 lines. I'm using Notepad++ (Notepad plus plus), by the way. Currently, you can column-select and paste the output so that each output line lines up with the URL it belongs to, making it quick to find and extract all the URLs that failed (like a temporary 404 on Twitter) and try again.

--OR--

I believe the tool reads the HTTP response code (looking at the source code, I assume excluded URLs output 403, 404 for file not found or invalid URL, IDK exactly), so if it gets a 429, it could pause internally for a certain amount of time (10 seconds, for example), then try again. If it gets 429 again, it repeats this until it gets any other error (which outputs ``Page submission failed :-( ``) or succeeds (``Your page submitted sucessfully!``). It would output the same line format mentioned at “[1]”.

Thank you for reading. Even if the 429 issue is temporary, websites or pages can vanish at any time, and automated saves are the best option to prevent or reduce this.
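To make the retry-on-429 idea concrete, here is a minimal sketch in Python (not the tool's actual Java code). Everything in it is an assumption for illustration: `submit_with_retry`, the injected `submit` callable that returns an HTTP status code, the `max_retries` cap (the proposal above retries indefinitely), and treating any 2xx status as success. The output strings are copied from the ones quoted in this issue.

```python
import time
from typing import Callable

# Output lines as quoted in this issue (spelling kept as the tool prints it).
SUCCESS_LINE = "Your page submitted sucessfully!"
FAILURE_LINE = "Page submission failed :-("


def submit_with_retry(
    submit: Callable[[str], int],          # hypothetical: sends the save request, returns the HTTP status
    url: str,
    delay_seconds: float = 10.0,
    max_retries: int = 5,                  # assumption: cap retries instead of looping forever
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return exactly one output line per URL, retrying only on HTTP 429."""
    for attempt in range(max_retries + 1):
        status = submit(url)
        if status != 429:
            # Any non-429 answer ends the loop: 2xx counts as success, the rest as failure.
            return SUCCESS_LINE if 200 <= status < 300 else FAILURE_LINE
        if attempt < max_retries:
            sleep(delay_seconds)           # cool down before asking the IA again
    # Still rate-limited after the retry budget is spent.
    return FAILURE_LINE
```

Because the function always returns a single line per URL, the log would keep the one-line-per-save format from “[1]” no matter how many pauses happen in between.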
GhbSmwc commented 4 years ago

SOLVED: I realized you can have multiple commands on the same line using & as a separator. Therefore, you can do this:

java -jar InternetWaybackMachine.jar "http://www.example.com" >>OutputLog.txt & timeout /t 10
java -jar InternetWaybackMachine.jar "http://www.example.com/1" >>OutputLog.txt & timeout /t 10

The line format of OutputLog.txt will match this list, and each save is followed by a cooldown. Should it still 429 out, you can increase the timeout (replace 10 with a larger number) to give the IA some breathing room.
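If the URL list comes from a browser extension, the `&`-joined lines can be generated rather than typed by hand. The following Python sketch is one way to do that; the helper names `build_batch_lines` and `failed_urls` are made up for this example, and `failed_urls` assumes the log holds exactly one output line per URL, in the same order, as described above.

```python
from typing import Iterable, List


def build_batch_lines(urls: Iterable[str], delay_seconds: int = 10,
                      log_file: str = "OutputLog.txt") -> List[str]:
    """Build one cmd line per URL: save, append to the log, then wait."""
    lines = []
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank lines from the pasted list
        lines.append(
            f'java -jar InternetWaybackMachine.jar "{url}" '
            f">>{log_file} & timeout /t {delay_seconds}"
        )
    return lines


def failed_urls(urls: Iterable[str], log_lines: Iterable[str]) -> List[str]:
    """Pair each URL with its log line and keep the ones that failed."""
    return [
        url
        for url, line in zip(urls, log_lines)
        if line.strip().startswith("Page submission failed")
    ]
```

For example, `build_batch_lines(["http://www.example.com"])` yields the first command shown above, and `failed_urls` gives back the URLs worth retrying with a larger timeout.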