ocaml / ocaml

The core OCaml system: compilers, runtime system, base libraries
https://ocaml.org

Update robots.txt on the caml.inria.fr server #6766

Closed by vicuna 9 years ago

vicuna commented 9 years ago

Original bug ID: 6766
Reporter: braibant
Status: closed (set by @damiendoligez on 2015-02-24T21:13:03Z)
Resolution: fixed
Priority: normal
Severity: trivial
Category: web site

Bug description

Googling for the OCaml manual sometimes yields results that point to outdated versions of the manual rather than the current one. Following what is written at http://www.robotstxt.org/robotstxt.html, it suffices to put the following robots.txt file at the root of the caml.inria.fr server (so that it is served as caml.inria.fr/robots.txt).

The Allow directive is described here: http://en.wikipedia.org/wiki/Robots_exclusion_standard#Allow_directive. Alternatively, one could disallow indexing of only some older versions of the manual. (Note that pages that are not indexed can still be reached by browsing the website.)

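```
$ cat robots.txt
User-agent: *
Disallow: /cgi-bin/viewvc.cgi/
Disallow: /statistics/
# Keep only the current manuals indexed. The Allow lines come before the
# broader Disallow so that crawlers using the first matching rule agree.
Allow: /pub/docs/manual-ocaml/
Allow: /pub/docs/manual-caml-light/
Disallow: /pub/docs/
```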
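To check which manual pages these rules would hide, here is a minimal sketch that evaluates them with the longest-match precedence of RFC 9309 (the precedence Google documents): the longest matching path pattern wins, and on a tie the less restrictive rule is preferred. The names `RULES` and `is_allowed` are hypothetical, all rules are assumed to sit under `User-agent: *`, and the `*` and `$` wildcards are not handled. Because the Allow lines precede `Disallow: /pub/docs/`, crawlers that instead take the first matching rule in file order should reach the same verdicts for these URLs.

```python
# Hypothetical sketch: evaluate robots.txt Allow/Disallow rules using the
# longest-match precedence described in RFC 9309.
# Simplification: plain prefix patterns only; '*' and '$' are not handled.
from urllib.parse import urlsplit

# Rules of the proposed robots.txt, assumed to sit under "User-agent: *".
RULES = [
    ("disallow", "/cgi-bin/viewvc.cgi/"),
    ("disallow", "/statistics/"),
    ("allow", "/pub/docs/manual-ocaml/"),
    ("allow", "/pub/docs/manual-caml-light/"),
    ("disallow", "/pub/docs/"),
]


def is_allowed(url, rules=RULES):
    """Return True if a crawler obeying `rules` may fetch `url`."""
    path = urlsplit(url).path or "/"
    best_len = -1
    allowed = True  # no matching rule means the URL may be crawled
    for kind, pattern in rules:
        # An empty pattern (e.g. "Disallow:") matches nothing.
        if not pattern or not path.startswith(pattern):
            continue
        # The longest (most specific) matching pattern wins; on a tie,
        # prefer the least restrictive rule, i.e. Allow.
        if len(pattern) > best_len or (len(pattern) == best_len and kind == "allow"):
            best_len = len(pattern)
            allowed = kind == "allow"
    return allowed


if __name__ == "__main__":
    for url in [
        "http://caml.inria.fr/pub/docs/manual-ocaml-4.00/manual033.html",
        "http://caml.inria.fr/pub/docs/manual-ocaml/intfc.html",
    ]:
        print(url, "->", "allowed" if is_allowed(url) else "blocked")
```

Run on the two URLs from the report, the versioned `manual-ocaml-4.00` page is blocked, since `/pub/docs/manual-ocaml/` (with its trailing slash) is not a prefix of `/pub/docs/manual-ocaml-4.00/`, while the current `intfc.html` page stays allowed. A pattern written without the leading slash, such as `pub/docs/manual-ocaml/`, never matches, because URL paths always begin with `/`.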

Steps to reproduce

For instance, today, googling for

ocaml interfacing with c

yields

http://caml.inria.fr/pub/docs/manual-ocaml-4.00/manual033.html

rather than

http://caml.inria.fr/pub/docs/manual-ocaml/intfc.html

vicuna commented 9 years ago

Comment author: @damiendoligez

Done. We'll see how long it takes for Google to update.