irsdl / top10webseclist

Top Ten Web Hacking Techniques List

Change original URLs for archived ones #1

Open n1c4n0n opened 4 years ago

n1c4n0n commented 4 years ago

What do you think of archiving the original URLs and replacing them with their archived versions? I think it'd make this repo more future-proof.

irsdl commented 4 years ago

Absolutely, I agree. The problem is that automating it might be tricky: some of the links are completely dead, some have been redirected, some show a 404, some show irrelevant data, and some are still alive! Unless we automatically take a copy of them all from the Wayback Machine, it can be really hard (perhaps we can save both the Wayback Machine copy and the page itself if it returns a 200 status). We should be able to use a certain algorithm to choose an appropriate snapshot (for example, for 2010 we perhaps need the first snapshot between 2010 and 2015). I'm not sure how the Wayback Machine works with its APIs and whether there is a rate limit, etc.
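A minimal sketch of the snapshot-selection idea above, assuming the Wayback Machine CDX server is queried with its `url`, `from`, `to`, `output`, `filter` and `fl` parameters and returns JSON rows with a header row first. The function names `build_cdx_query` and `pick_first_snapshot` are hypothetical, and the 2010 to 2015 window is only the example from the comment. Rate limits are not handled here.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Wayback Machine CDX server.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url, first_year, last_year):
    """Build a CDX query for captures of `url` with status 200 in a year window."""
    params = {
        "url": url,
        "from": str(first_year),
        "to": str(last_year),
        "output": "json",
        "filter": "statuscode:200",
        "fl": "timestamp,original,statuscode",
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def pick_first_snapshot(rows, first_year, last_year):
    """Return the archive URL of the earliest capture inside the window.

    `rows` is the parsed JSON response: a header row followed by records.
    Returns None when there is no capture in the window.
    """
    if not rows:
        return None
    header, *records = rows
    ts_idx = header.index("timestamp")
    orig_idx = header.index("original")
    # Timestamps look like YYYYMMDDhhmmss, so the first four digits are the year.
    candidates = [
        r for r in records if first_year <= int(r[ts_idx][:4]) <= last_year
    ]
    if not candidates:
        return None
    # Fixed-width digit strings sort chronologically.
    earliest = min(candidates, key=lambda r: r[ts_idx])
    return f"https://web.archive.org/web/{earliest[ts_idx]}/{earliest[orig_idx]}"
```

The selection step is kept separate from the HTTP request so it can be checked offline against saved responses before running it over every link in the list.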

Can you contribute to this perhaps? We can even publish the tool in this repository as well so we can use it in the future too!

irsdl commented 4 years ago

Another solution would be doing this manually, but that can take serious time... I may do it as a hobby, but I will probably need help, as categorising them can be a chore too (perhaps saving them all as PDF if they are not already PDFs?).

n1c4n0n commented 4 years ago

@irsdl I started doing something half manually and half automatically. Here's the first test I made to see how that would work: https://github.com/n1c4n0n/top10webseclist/blob/master/2019.md

I'll start a PR on here asap so we can gradually tweak things as necessary. What do you think?

I'll also change a few things in the tool I've used and upload it so we can work on that too.

n1c4n0n commented 4 years ago

@irsdl We can define which way is best for archiving purposes, but I think it'll be OK if we just archive them in their original format (be it HTML, PDF, etc.). Tell me what you think.

irsdl commented 4 years ago

For 2019 it is easy to do this because they are still live, and we should be able to just save the endpoints if they are not on SlideShare or something like that. I guess the ultimate approach would be to manually hunt them down one by one and save them in an appropriate format. It is a chore but can become very valuable - I may start doing this in my spare time ;)