JSON versions of URLs which contain query strings generate recursive URLs

mysociety / alaveteli

Provide a Freedom of Information request system for your jurisdiction

https://alaveteli.org

Other

389 stars 196 forks source link

JSON versions of URLs which contain query strings generate recursive URLs #311

Closed sebbacon closed 4 years ago

sebbacon commented 12 years ago

If you visit a request page containing a query string, like http://www.whatdotheyknow.com/request/information_on_deprivation_of_li_44?unfold=1, the JSON version that is linked as an alternate form is incorrectlly given as http://www.whatdotheyknow.com/request/information_on_deprivation_of_li_44?unfold=1.json. As this is a valid URL in Alaveteli, it results in recursive URLs, i.e. on the latter page, there is a link to http://www.whatdotheyknow.com/request/information_on_deprivation_of_li_44?unfold=1.json.json. This causes some bots to spider the same page endlessly.

kingqueen3065 commented 7 years ago

robots.txt still includes:

# The following adding Jan 2012 to stop robots crawling pages
# generated in error (see
# https://github.com/mysociety/alaveteli/issues/311).  Can be removed
# later in 2012 when the error pages have been dropped from the index
Disallow: *.json.j*

unnecessary now - remove it?

kingqueen3065 commented 7 years ago

yeah that worked.

garethrees commented 7 years ago

Fixed that for you :)

You need three backticks for a code block, rather than one (which does inline code)

garethrees commented 5 years ago

http://www.whatdotheyknow.com/request/information_on_deprivation_of_li_44?unfold=1 now shows:

<link rel="alternate" type="application/json" title="JSON version of this page" href="/request/information_on_deprivation_of_li_44.json?unfold=1">

…so I think this is fixed.