sebbacon closed this issue 4 years ago.
`robots.txt` still includes:
```
# The following adding Jan 2012 to stop robots crawling pages
# generated in error (see
# https://github.com/mysociety/alaveteli/issues/311). Can be removed
# later in 2012 when the error pages have been dropped from the index
Disallow: *.json.j*
```
This is unnecessary now. Remove it?
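To see which URLs the `Disallow: *.json.j*` rule actually blocks, here is a minimal Python sketch of robots.txt wildcard matching. It assumes the wildcard semantics used by major crawlers such as Googlebot and described in RFC 9309: `*` matches any run of characters, a trailing `$` anchors the end, and the pattern is matched from the start of the path plus query string. The function name `robots_pattern_matches` is hypothetical. Python's own `urllib.robotparser` only does prefix matching, so it is not used here.

```python
import re


def robots_pattern_matches(pattern: str, path_and_query: str) -> bool:
    """Return True if a robots.txt Allow/Disallow pattern matches the URL.

    Sketch of the common wildcard rules: '*' matches any characters,
    a trailing '$' anchors the end, and matching starts at the beginning
    of the path (including any query string).
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then turn each escaped '*' into '.*'.
    regex = re.escape(pattern).replace(r"\*", ".*")
    if anchored:
        regex += r"\Z"
    # re.match only anchors at the start, mirroring prefix-style matching.
    return re.match(regex, path_and_query) is not None
```

Under these rules the pattern only catches the recursive URLs (`...?unfold=1.json.json`), not the single-suffix form or the corrected `.json?unfold=1` form, so removing it should not affect crawling of the correct JSON links.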
Yeah, that worked.
Fixed that for you :)
You need three backticks for a code block, rather than one (which gives inline code).
http://www.whatdotheyknow.com/request/information_on_deprivation_of_li_44?unfold=1 now shows:
```
<link rel="alternate" type="application/json" title="JSON version of this page" href="/request/information_on_deprivation_of_li_44.json?unfold=1">
```
…so I think this is fixed.
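One way to confirm the fix on a given page is to parse its HTML and check that every JSON alternate link puts `.json` on the path rather than on the query string. This is a minimal Python sketch using only the standard library; the names `AlternateJsonFinder` and `json_link_is_well_formed` are hypothetical, and fetching the page is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class AlternateJsonFinder(HTMLParser):
    """Collect href values of <link rel="alternate" type="application/json">."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attributes = dict(attrs)
        if (tag == "link" and attributes.get("rel") == "alternate"
                and attributes.get("type") == "application/json"):
            self.hrefs.append(attributes.get("href") or "")


def json_link_is_well_formed(html):
    """True if at least one JSON alternate link exists and each one
    has a path ending in '.json' (query string left untouched)."""
    finder = AlternateJsonFinder()
    finder.feed(html)
    finder.close()
    return bool(finder.hrefs) and all(
        urlsplit(href).path.endswith(".json") for href in finder.hrefs
    )
```

Run against the `<link>` element quoted above, the check passes; against the old `...?unfold=1.json` form it fails, because `urlsplit` treats `unfold=1.json` as part of the query.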
If you visit a request page whose URL contains a query string, such as http://www.whatdotheyknow.com/request/information_on_deprivation_of_li_44?unfold=1, the JSON version linked as an alternate form is incorrectly given as http://www.whatdotheyknow.com/request/information_on_deprivation_of_li_44?unfold=1.json, with `.json` appended to the query string instead of the path. Because this is still a valid URL in Alaveteli, it produces recursive URLs: that page in turn links to http://www.whatdotheyknow.com/request/information_on_deprivation_of_li_44?unfold=1.json.json, and so on. This causes some bots to spider the same page endlessly.
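The root cause is string concatenation on the whole URL. Alaveteli itself is a Rails app, so the following is only a Python sketch of the idea, not the project's actual fix; the function name `json_alternate_url` is hypothetical. It splits the URL, adds `.json` to the path only, and carries the query string over unchanged.

```python
from urllib.parse import urlsplit


def json_alternate_url(page_url: str) -> str:
    """Build the JSON alternate URL for a page, keeping the query intact.

    The buggy behaviour is equivalent to page_url + ".json", which tacks
    the suffix onto the query string. Here '.json' goes on the path.
    """
    parts = urlsplit(page_url)
    path = parts.path
    # Only add the suffix once, so repeated calls can never yield '.json.json'.
    if not path.endswith(".json"):
        path += ".json"
    return parts._replace(path=path).geturl()
```

For the example page, `json_alternate_url("http://www.whatdotheyknow.com/request/information_on_deprivation_of_li_44?unfold=1")` gives `http://www.whatdotheyknow.com/request/information_on_deprivation_of_li_44.json?unfold=1`. Making the function idempotent means that even if it were applied to an already-JSON URL, crawlers would not be fed an ever-growing chain of links.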