nrjones8/robots-dot-txt-archive-bot
A project to collect, archive, and publish robots.txt files from across the internet, with a focus on government websites.
https://robots-dot-txt-db.com/
6 stars, 0 forks
Issues (newest first)
#15 ping datasette on heroku when page loads (nrjones8, opened 4 years ago, 0 comments)
#14 search results should show relevant user-agent (nrjones8, opened 4 years ago, 0 comments)
#13 increase max_returned_rows in datasette as a stopgap for pagination (nrjones8, closed 4 years ago, 1 comment)
#12 streamline the data update process (nrjones8, opened 4 years ago, 0 comments)
#11 reorganize data dirs (nrjones8, opened 4 years ago, 0 comments)
#10 investigate HTML responses, maybe add retries (nrjones8, opened 4 years ago, 1 comment)
#9 add covid websites from covidtracking.com (nrjones8, closed 4 years ago, 1 comment)
#8 add concept of "tags", remove concept of "source" (nrjones8, opened 4 years ago, 0 comments)
#7 make it easier to add new sources (nrjones8, opened 4 years ago, 0 comments)
#6 add a README, document data sources better (nrjones8, opened 4 years ago, 0 comments)
#5 display / link to internet archive for URL prefix that has robots.txt entry (nrjones8, closed 4 years ago, 1 comment)
#4 pull in `title` or something similar from each hostname (nrjones8, opened 4 years ago, 0 comments)
#3 publish data to datasette (nrjones8, opened 4 years ago, 1 comment)
#2 Nick/use better domain sources (nrjones8, closed 4 years ago, 0 comments)
#1 local manual run, include cleaned versions now (nrjones8, closed 4 years ago, 0 comments)