nrjones8 / robots-dot-txt-archive-bot

A project to collect, archive, and publish robots.txt files from across the internet - with a focus on government websites

add concept of "tags", remove concept of "source" #8

Open nrjones8 opened 4 years ago

nrjones8 commented 4 years ago

Some hostnames are going to appear multiple times (e.g. ones linked to from that also show up in the list of "all" .gov domains). Ideally, each of those would be recorded once and tagged as having come from both "dotgov_domains" and "fed_gov_from_usa_dot_gov", rather than carrying a single "source".
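A rough sketch of what the tagging could look like, in Python. This is not the repo's actual code: the function name `tag_hostnames`, the shape of the `sources` dict, and the example hostnames are all hypothetical. Only the two tag names come from this issue. The idea is to fold every source list into one mapping from hostname to the set of tags it was seen under.

```python
from collections import defaultdict


def tag_hostnames(sources):
    """Map each hostname to the sorted list of tags (source lists) it appears in.

    `sources` is a hypothetical dict of tag name -> iterable of hostnames.
    """
    tags_by_host = defaultdict(set)
    for tag, hostnames in sources.items():
        for hostname in hostnames:
            # Normalize so the same host listed with different casing or
            # stray whitespace is treated as one entry.
            tags_by_host[hostname.strip().lower()].add(tag)
    return {host: sorted(tags) for host, tags in tags_by_host.items()}


# Placeholder hostnames, not real entries from either list.
sources = {
    "dotgov_domains": ["example-agency.gov", "another-example.gov"],
    "fed_gov_from_usa_dot_gov": ["Example-Agency.gov"],
}
tagged = tag_hostnames(sources)
```

With this shape, a hostname that appears in both lists ends up with both tags, and a hostname that appears in only one list keeps a single-element tag list, so nothing needs a separate "source" field.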