pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

DOC: Disallow crawlers in development docs #26924

Closed: datapythonista closed this issue 5 years ago

datapythonista commented 5 years ago

(continuing the discussion in https://github.com/pandas-docs/pandas-docs-travis/pull/3#issuecomment-503148283)

Afaik we're currently not telling search engines which copy of the documentation they should be indexing. They seem to be smart enough, and they've been returning the right results so far, but given that we're now moving the dev documentation to dev.pandas.io, it's probably worth telling them not to index that copy.

It should be as easy as adding a robots.txt in the root of the dev docs repo with this content:

User-agent: *
Disallow: /

See: https://en.wikipedia.org/wiki/Robots_exclusion_standard

This can easily be done by adding another line, similar to the one used for the CNAME: https://github.com/pandas-dev/pandas/blob/master/azure-pipelines.yml#L148
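
For illustration, a minimal sketch of what that extra line could look like in the dev docs deployment step. The output directory, step name, and the existing CNAME command are assumptions based on the description above, not the actual contents of azure-pipelines.yml:

    # Hypothetical excerpt of the dev docs deployment step in azure-pipelines.yml;
    # paths and the CNAME value are assumed for illustration only.
    - script: |
        echo "dev.pandas.io" > doc/build/html/CNAME                        # existing CNAME line (assumed)
        printf 'User-agent: *\nDisallow: /\n' > doc/build/html/robots.txt  # new: tell crawlers not to index the dev copy
      displayName: 'Add CNAME and robots.txt to dev docs'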

@TomAugspurger @jorisvandenbossche does this make sense?

TomAugspurger commented 5 years ago

Yeah, I think this makes sense.
