kasramp / InternetWayBackMachine

Submit URLs to archive.org easily
https://madadipouya.com/portfolio/the-internet-wayback-machine/
MIT License
7 stars, 0 forks

Even more character issues #9

Closed: GhbSmwc closed this issue 4 years ago

GhbSmwc commented 4 years ago

Still using the “fixed” version you gave me.

Oddly, when I tested saving these URLs, the command prompt treated them differently even though both run under the same code page:


chcp 65001
del OutputLog.txt
java -jar InternetWaybackMachine.jar "https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/077/904/original/鳥と鳥籠キャラシ.png?1450609113" >>OutputLog.txt & timeout /t 3
java -jar InternetWaybackMachine.jar "https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/078/212/original/新ガラテアCS.png?1450710929" >>OutputLog.txt & timeout /t 3
pause

The first URL saved successfully, but the second does not work AT ALL [1], and changing the file encoding in Notepad++ (NP++) doesn't help either. And yes, I tried saving it manually and that worked. I also tried both with and without chcp 65001; it still fails.
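To see what the jar actually receives, a small hypothetical diagnostic class (ArgDump, not part of this project) can be run from the same batch file in place of InternetWaybackMachine.jar; it prints each argument's Unicode code points as plain ASCII, so OutputLog.txt stays readable whatever the code page. One possible explanation for [1], not confirmed in this thread: on Windows the java launcher has long received its command-line arguments converted through the system ANSI code page, so a character missing from that code page can be replaced before main() runs, and chcp 65001 does not change the ANSI code page.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic, not part of InternetWaybackMachine: prints every
// command-line argument and its Unicode code points, so you can check whether
// the Japanese characters reached the JVM intact.
public class ArgDump {

    // Formats each code point of s as "U+XXXX", separated by single spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Default charset of this JVM; it depends on the OS locale and JDK version.
        System.out.println("default charset: " + Charset.defaultCharset());
        for (String arg : args) {
            System.out.println(arg);
            System.out.println("  " + codePoints(arg));
        }
    }
}
```

For example, after `javac ArgDump.java`, run `java ArgDump "https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/078/212/original/新ガラテアCS.png?1450710929" >>OutputLog.txt`. If U+003F (`?`) or U+FFFD shows up where a Japanese character should be, the argument was already damaged before the jar's own code ran.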

I've noticed that chcp 65001 makes a huge difference in how the command prompt handles these characters. I did “pseudo-auto-saving” with a batch file that uses the start command to open each save URL in the default browser (multiple URLs open in new tabs), so the browser actually saves the page, as opposed to a script that merely reads the HTTP status:

start https://web.archive.org/save/https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/078/212/medium/新ガラテアCS.png?1450710929 & timeout /t 5
start https://web.archive.org/save/https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/078/212/original/新ガラテアCS.png?1450710929 & timeout /t 5
pause

Without chcp, this takes me to a different URL (an invalid one, by the way):

https://web.archive.org/web/20191113023522/https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/078/212/medium/%E8%AD%81%EF%BD%B0%E7%B9%A7%EF%BD%AB%E7%B9%A7%E5%90%B6%CE%9B%E7%B9%9D%E3%83%BB%E3%81%84CS.png?1450710929
https://web.archive.org/web/20191113023527/https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/078/212/original/%E8%AD%81%EF%BD%B0%E7%B9%A7%EF%BD%AB%E7%B9%A7%E5%90%B6%CE%9B%E7%B9%9D%E3%83%BB%E3%81%84CS.png?1450710929

The raw (percent-decoded) form of the same URLs, as seen when testing with the chcp command:

https://web.archive.org/web/20191113023522/https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/078/212/medium/譁ー繧ォ繧吶Λ繝・いCS.png?1450710929
https://web.archive.org/web/20191113023527/https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/078/212/original/譁ー繧ォ繧吶Λ繝・いCS.png?1450710929
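The garbled segments above look like classic mojibake: the UTF-8 bytes of the file name read as code page 932 (Shift_JIS, the usual Japanese Windows code page) and then re-encoded as UTF-8. A minimal Java sketch (hypothetical class MojibakeDemo; assumes a JDK that includes the windows-31j charset, as standard builds do) reproduces the first character: 新 (%E6%96%B0) comes out as 譁ｰ, i.e. %E8%AD%81%EF%BD%B0, which is exactly how the garbled path segment starts. Decoding further suggests, although the thread does not confirm it, that the ガ in the S3 key is stored as カ plus a combining dakuten (U+3099) rather than as the single code point U+30AC: only the decomposed form produces the 繧吶Λ run seen above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch reproducing the garbled path above: the UTF-8 bytes of the file name
// are (wrongly) decoded as windows-31j, Java's name for code page 932, and the
// resulting text is then encoded back to UTF-8, as a browser would do.
public class MojibakeDemo {

    static final Charset CP932 = Charset.forName("windows-31j");

    // Decodes the UTF-8 bytes of s as if they were code page 932 text.
    static String misread(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), CP932);
    }

    // Percent-encodes every UTF-8 byte of s, e.g. U+65B0 -> "%E6%96%B0".
    static String percentUtf8(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append('%').append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String shin = "\u65B0"; // first character of the file name
        System.out.println(percentUtf8(shin) + " -> " + percentUtf8(misread(shin)));
        // Prints: %E6%96%B0 -> %E8%AD%81%EF%BD%B0

        // U+30AC (precomposed) vs. U+30AB + U+3099 (decomposed), each followed by U+30E9.
        String precomposed = "\u30AC\u30E9";
        String decomposed = "\u30AB\u3099\u30E9";
        System.out.println(percentUtf8(misread(precomposed)));
        System.out.println(percentUtf8(misread(decomposed)));
    }
}
```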

By contrast, with this in the batch file instead:

chcp 65001
start https://web.archive.org/save/https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/078/212/medium/新ガラテアCS.png?1450710929 & timeout /t 5
start https://web.archive.org/save/https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/078/212/original/新ガラテアCS.png?1450710929 & timeout /t 5
pause

This always works, even for the URLs that failed (see [1]).
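The working batch file sends the browser to https://web.archive.org/save/ followed by the target URL. Below is a minimal sketch of building that same link in Java (hypothetical class SaveUrlSketch, Java 11+, not the project's actual code) that percent-encodes the target as UTF-8 first, so the link is plain ASCII and no console code page can alter it. Whether a plain GET from a program triggers a capture the same way a browser visit does is an assumption; the thread only shows the browser route working.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Sketch, not the project's implementation: builds the same
// https://web.archive.org/save/<url> link the batch file opens, but
// percent-encodes the target URL as UTF-8 first, so the link is pure ASCII
// and no longer depends on the console code page.
public class SaveUrlSketch {

    static final String SAVE_PREFIX = "https://web.archive.org/save/";

    // ASCII characters passed through unchanged. '%' is kept so existing
    // escapes survive; every other byte is percent-encoded.
    static final String KEEP =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
            + "-._~:/?#@!$&'()*+,;=%";

    // Percent-encodes the UTF-8 bytes of url. No Unicode normalization is
    // applied, so a decomposed name (e.g. U+30AB U+3099) keeps its exact bytes.
    static String toAsciiUrl(String url) {
        StringBuilder sb = new StringBuilder();
        for (byte b : url.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (KEEP.indexOf(v) >= 0) {
                sb.append((char) v);
            } else {
                sb.append('%').append(String.format("%02X", v));
            }
        }
        return sb.toString();
    }

    static String saveUrl(String target) {
        return SAVE_PREFIX + toAsciiUrl(target);
    }

    // Requests each save link and prints the HTTP status code.
    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        for (String target : args) {
            String link = saveUrl(target);
            HttpRequest request = HttpRequest.newBuilder(URI.create(link)).GET().build();
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            System.out.println(response.statusCode() + " " + link);
        }
    }
}
```

This still relies on the argument reaching main() intact. Because '%' passes through unchanged, the URL can also be written into the batch file already percent-encoded (for example `.../original/%E6%96%B0...`), which keeps non-ASCII text off the command line entirely. The encoder is hand-written rather than using java.net.URI.toASCIIString(), because OpenJDK's implementation of that method normalizes to NFC before encoding, which would change the bytes of a decomposed file name.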

kasramp commented 4 years ago

Hi @GhbSmwc, well, there's not much I can do to help. For one, I don't have a Windows machine to test special characters on, and I also don't have enough free time to focus on this project; I'm not actively maintaining it anymore. My knowledge may not be up to date, but as far as I know Archive.org does not have a proper REST API, and submitting URLs this way is very error-prone. That said, if you want to change the code, feel free to do so, and if you think it's useful to share, you are always welcome to submit a merge request.