openrightsgroup / blocked-org-uk

Template front-end code, markup, style-sheets, images and other assets for the Censorship Monitoring Project (blocked.org.uk)
https://www.blocked.org.uk/
GNU General Public License v3.0

Prominently display a status message when a site's robots.txt prevents probing #279

Closed by dantheta 6 years ago

JimKillock commented 6 years ago

Thought on this: from our perspective, a "no index" result is the same as a "not blocked" result. Couldn't we display these sites as both "no index" and "not blocked"? (And simply not index the content as requested?)

dantheta commented 6 years ago

Unfortunately not so straightforward. We check the robots.txt from the server once before sending the request to the probes. The probes themselves take the server's word for it and run the test.
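The flow described above (one central robots.txt check, after which the probes run without re-checking) could be sketched roughly as follows. This is only an illustration, not the project's actual code: the names `USER_AGENT`, `allowed_by_robots` and `dispatch`, and the status strings, are all hypothetical, and the robots.txt body is passed in as text rather than fetched so the example stays self-contained.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent string; the real one is not given in this thread.
USER_AGENT = "ExampleProbeBot"


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the given robots.txt text permits fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def dispatch(url: str, robots_txt: str, probes) -> dict:
    """Check robots.txt once, centrally, then hand the URL to every probe.

    The probes (plain callables here, an assumption) do not re-check
    robots.txt; they rely on the central decision and just run the test.
    """
    if not allowed_by_robots(robots_txt, url):
        # Hypothetical status label for "we were asked not to probe this".
        return {"url": url, "status": "robots-disallowed", "results": []}
    return {"url": url, "status": "tested", "results": [probe(url) for probe in probes]}
```

In a sketch like this, a disallowed URL never reaches the probes at all, so there is no per-network result that could be reported as "not blocked" alongside the robots.txt status.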

JimKillock commented 6 years ago

OK. I suppose in any case a robots.txt might be replicated at a blocked URL for some reason and fool us, since we are not comparing page content.