nrjones8 / robots-dot-txt-archive-bot

A project to collect, archive, and publish robots.txt files from across the internet - with a focus on government websites
https://robots-dot-txt-db.com/

display / link to internet archive for URL prefix that has robots.txt entry #5

Closed by nrjones8 4 years ago

nrjones8 commented 4 years ago

e.g. if a robots.txt has a Disallow: /foia/quarterly/* entry, then link to that URL prefix in the Wayback Machine.

e.g. https://web.archive.org/web/*/https://turbotax.intuit.com/lp/* (see also the Wayback CDX server docs: https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server)
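
Rough sketch of how the link could be built from a Disallow entry (Python, not taken from this repo). The function names and the example.gov host are made up for illustration. The `/web/*/<prefix>*` form is the one from the example URL above, and the CDX parameters (`url`, `matchType=prefix`, `output`, `limit`) are as I understand them from the wayback-cdx-server README. It only handles a trailing `*`; patterns with a wildcard in the middle or a `$` anchor would need more thought.

```python
from urllib.parse import urlencode

WAYBACK_BASE = "https://web.archive.org/web/*/"
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def disallow_prefix(site_root: str, disallow_path: str) -> str:
    """Join a site root and a Disallow path into a plain URL prefix.

    Only a trailing '*' wildcard is dropped; other robots.txt pattern
    syntax is passed through untouched (assumption: good enough for a link).
    """
    path = disallow_path.strip().rstrip("*")
    return site_root.rstrip("/") + "/" + path.lstrip("/")


def wayback_listing_url(site_root: str, disallow_path: str) -> str:
    """Wayback Machine URL that lists captures under the Disallow prefix."""
    return WAYBACK_BASE + disallow_prefix(site_root, disallow_path) + "*"


def cdx_query_url(site_root: str, disallow_path: str, limit: int = 10) -> str:
    """CDX server query URL for the same prefix (builds the URL, no request)."""
    params = {
        "url": disallow_prefix(site_root, disallow_path),
        "matchType": "prefix",
        "output": "json",
        "limit": limit,
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # example.gov is a placeholder host, not a site from the archive
    print(wayback_listing_url("https://example.gov", "/foia/quarterly/*"))
    print(cdx_query_url("https://example.gov", "/foia/quarterly/*"))
```

The listing URL is enough for a plain link on the site; the CDX URL would only matter if we want to check up front whether any captures exist before showing the link.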

nrjones8 commented 4 years ago

this sort of works now already