tomnomnom / waybackurls

Fetch all the URLs that the Wayback Machine knows about for a domain

[issue] Urls appended together #28

Open pdelteil opened 3 years ago

pdelteil commented 3 years ago

Some URLs in the output have what seem to be URLs from different domains appended to them.

Test:

 > echo "https://dominoweb.draco.res.ibm.com" | waybackurls | grep TCG

[Screenshot from 2021-06-21 20-14-15]

Notice .TCG.htmlhttp:/msdn.microsoft.com/en-us/library/ms171339.aspxhttp:/www.opensymphony

Ugroon commented 2 years ago

Lmao, dude, it's part of the path :D

201800284 commented 1 year ago

I am facing the same issue too. Is there any way to parse the output to remove such cases of appended URLs?

pdelteil commented 1 year ago

> I am facing the same issue too. Is there any way to parse the output to remove such cases of appended URLs?

Yes, by using http/https as the separator in awk.
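
The exact awk command was not posted, so here is a minimal sketch of one way to read that suggestion. It starts a new line before every `http:/` or `https:/` found in a line of output. The helper name `split_urls` and the `*.example` hosts are made up for illustration. Also note that, as pointed out above, the embedded URL may really be part of the archived path, so splitting changes what the Wayback Machine actually recorded.

```shell
# Hypothetical helper (not part of waybackurls): put each embedded
# http:/ or https:/ occurrence on its own line.
split_urls() {
  awk '{
    gsub(/https?:\//, "\n&")   # insert a newline before every scheme
    sub(/^\n/, "")             # drop the newline added before the first URL
    print
  }'
}

# Sample input shaped like the line in the report (hosts are placeholders).
printf '%s\n' 'https://a.example/x.htmlhttp:/b.example/y.aspx' | split_urls
# prints:
#   https://a.example/x.html
#   http:/b.example/y.aspx
```

With the function defined, something like `echo "https://dominoweb.draco.res.ibm.com" | waybackurls | split_urls` should work the same way. The split-off pieces keep the single slash (`http:/`) exactly as it appears in the output, so they may need further cleanup before being used as URLs.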