Original bug ID: 6766
Reporter: braibant
Status: closed (set by @damiendoligez on 2015-02-24T21:13:03Z)
Resolution: fixed
Priority: normal
Severity: trivial
Category: web site
Bug description
Googling for the OCaml manual sometimes yields results that point to an older version of the manual rather than the most recent one. According to http://www.robotstxt.org/robotstxt.html
it should suffice to put the following robots.txt file at the root of the
caml.inria.fr server (so that it is served as caml.inria.fr/robots.txt).
The Allow directive is described here: http://en.wikipedia.org/wiki/Robots_exclusion_standard#Allow_directive
Alternatively, one could disallow indexing of only some older versions of the manual. (Note that not being indexed does not prevent readers from finding those versions through the website itself.)
$ cat robots.txt
User-agent: *
Disallow: /cgi-bin/viewvc.cgi/

User-agent: *
Disallow: /statistics/
Allow: pub/docs/manual-ocaml/
Allow: pub/docs/manual-caml-light/
Disallow: /pub/docs/
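The effect of rules like these can be checked offline before deploying them. The sketch below uses Python's standard urllib.robotparser on a hypothetical variant of the file above, not the file as proposed: it assumes a single "User-agent: *" group and a leading "/" on the Allow paths, because this particular parser keeps only the first "*" group and matches rules by plain path prefix. The helper names build_parser and is_allowed are made up for illustration. Note also that this parser applies rules in file order (first match wins), whereas some crawlers reportedly pick the most specific matching rule, so the result is only an approximation of what a given search engine would do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical variant of the proposed robots.txt (assumption: one group,
# leading slashes on every path). This is NOT the file from the report.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/viewvc.cgi/
Disallow: /statistics/
Allow: /pub/docs/manual-ocaml/
Allow: /pub/docs/manual-caml-light/
Disallow: /pub/docs/
"""


def build_parser(text):
    """Parse robots.txt content from a string, without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser, url, agent="*"):
    """Return True if `agent` may fetch `url` under the parsed rules."""
    return parser.can_fetch(agent, url)


parser = build_parser(ROBOTS_TXT)

# The two URLs from the "Steps to reproduce" section of this report.
old_url = "http://caml.inria.fr/pub/docs/manual-ocaml-4.00/manual033.html"
new_url = "http://caml.inria.fr/pub/docs/manual-ocaml/intfc.html"

for url in (old_url, new_url):
    print(url, "->", "allowed" if is_allowed(parser, url) else "disallowed")
```

Under these assumptions the versioned manual directory falls under the "Disallow: /pub/docs/" rule, while the unversioned manual-ocaml directory is matched by its Allow rule first.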
Steps to reproduce
For instance, today, googling for
ocaml interfacing with c
yields
http://caml.inria.fr/pub/docs/manual-ocaml-4.00/manual033.html
rather than
http://caml.inria.fr/pub/docs/manual-ocaml/intfc.html