readthedocs / readthedocs.org

The source code that powers readthedocs.org
https://readthedocs.org/
MIT License

Make repositories inaccessible to search engines? #7462

Closed jd41 closed 4 years ago

jd41 commented 4 years ago

The use case I am interested in (which I guess is not unprecedented) is this: I want to test new things in a fork of an OSS project's documentation, but I do not want that fork's documentation to be found online. Otherwise, people searching for the documentation three years from now may land on my unmaintained fork instead of the original and be confused (another such fork already exists and is in the search index). So I am fine with my fork being public on GitHub, but not with its Read the Docs pages being in the Google/Bing/... search index.

I read that there used to be an option to set repositories to "private" so that Google would no longer find them. That option sounds like exactly what I want, but it seems to be gone. Is there an alternative today?

jd41 commented 4 years ago

(Ironically, I was confused for a while, searching for this "private" repository setting and not finding it, after reading about it in a PDF built on RTD from an old fork of the Read the Docs documentation. That fork has not been updated, but it is still in the Google search index.)

humitos (Manuel Kaufmann) commented on 7 Sep 2020

Read the Docs community (readthedocs.org) does not support PRIVATE versions, Read the Docs for Business (readthedocs.com) does.

However, if you just want to prevent search engines from indexing your documentation, you can use a robots.txt file. Take a look at https://docs.readthedocs.io/en/latest/hosting.html#custom-robots-txt-pages
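A minimal sketch of the kind of custom robots.txt the linked page is about, assuming the goal is to ask every crawler to skip the whole site. The file contents, the hypothetical `example-fork.readthedocs.io` URL and the helper `is_crawlable` are illustrations, not something Read the Docs ships. The check uses Python's standard `urllib.robotparser`, so it only tells you what a crawler that honours robots.txt is expected to do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical custom robots.txt: ask every crawler to skip the entire site.
CUSTOM_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if a robots.txt-abiding crawler may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical project URL used only for illustration.
    url = "https://example-fork.readthedocs.io/en/latest/index.html"
    print(is_crawlable(CUSTOM_ROBOTS_TXT, url))  # → False
```

Note that robots.txt is a request to well-behaved crawlers, not access control: the pages stay publicly reachable.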

I think that should be enough for your use case, right?

jd41 commented 4 years ago

Thanks! Looking there, I see that checking "hidden" for all versions should be enough for me as well. Am I correct?

Best, jd41

Forschungszentrum Juelich GmbH, 52425 Juelich. Registered office: Juelich. Registered in the commercial register of the Dueren District Court, No. HR B 3498. Chairman of the Supervisory Board: MinDir Volker Rieke. Management: Prof. Dr.-Ing. Wolfgang Marquardt (Chairman), Karsten Beneke (Deputy Chairman), Prof. Dr.-Ing. Harald Bolt



humitos commented on 8 Sep 2020

Yes. You can mark them all as hidden and then double-check that RTD is generating the robots.txt file for your project properly.
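To double-check the generated file, one option is to parse it and ask whether each version's root URL is still crawlable. This sketch assumes versions are served under `/<language>/<version>/` (as in `/en/latest/`) and that hidden versions appear as `Disallow` lines; the sample file below is hypothetical, and `fetch_robots_txt` and `still_crawlable` are illustrative helper names.

```python
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser


def fetch_robots_txt(project_url: str) -> str:
    """Download <project_url>/robots.txt (requires network access)."""
    with urlopen(project_url.rstrip("/") + "/robots.txt", timeout=10) as resp:
        return resp.read().decode("utf-8")


def still_crawlable(robots_txt, project_url, versions, language="en", user_agent="*"):
    """Return the versions whose root URL is NOT disallowed by robots_txt.

    Assumes the hypothetical URL layout <project_url>/<language>/<version>/.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    base = project_url.rstrip("/")
    return [
        version
        for version in versions
        if parser.can_fetch(user_agent, f"{base}/{language}/{version}/")
    ]


# Hypothetical robots.txt for a project with two hidden versions (format assumed).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /en/latest/ # Hidden version
Disallow: /en/stable/ # Hidden version
Sitemap: https://example-fork.readthedocs.io/sitemap.xml
"""

if __name__ == "__main__":
    print(still_crawlable(SAMPLE_ROBOTS_TXT,
                          "https://example-fork.readthedocs.io",
                          ["latest", "stable", "v1.0"]))  # → ['v1.0']
```

Against a live project you would pass `fetch_robots_txt(project_url)` as the first argument; an empty list means every version you listed is disallowed.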

jd41 commented 4 years ago

The robots.txt looks okay to me now and forbids what I want to forbid (https://nest-simulator-jd41.readthedocs.io/robots.txt), but it points to https://nest-simulator-jd41.readthedocs.io/sitemap.xml, which contains the links I want to disallow. I guess this is okay?
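One way to make the overlap described above visible is to list the URLs that appear in sitemap.xml but are disallowed by robots.txt. This is a sketch only: the sample documents and helper names are hypothetical, and it says nothing about how any particular search engine resolves these conflicting signals. It relies on `RobotFileParser.site_maps()` (Python 3.8+) to read the `Sitemap:` line and on the standard sitemap namespace `http://www.sitemaps.org/schemas/sitemap/0.9`.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def load_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def sitemap_locs(sitemap_xml: str) -> list[str]:
    """Extract every <url><loc> entry from a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


def listed_but_disallowed(robots_txt: str, sitemap_xml: str, user_agent: str = "*") -> list[str]:
    """URLs the sitemap advertises even though robots.txt disallows them."""
    parser = load_robots(robots_txt)
    return [url for url in sitemap_locs(sitemap_xml) if not parser.can_fetch(user_agent, url)]


# Hypothetical sample files for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /en/latest/
Sitemap: https://example-fork.readthedocs.io/sitemap.xml
"""

SAMPLE_SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example-fork.readthedocs.io/en/latest/</loc></url>
  <url><loc>https://example-fork.readthedocs.io/en/v1.0/</loc></url>
</urlset>
"""

if __name__ == "__main__":
    print(load_robots(SAMPLE_ROBOTS_TXT).site_maps())
    # → ['https://example-fork.readthedocs.io/sitemap.xml']
    print(listed_but_disallowed(SAMPLE_ROBOTS_TXT, SAMPLE_SITEMAP_XML))
    # → ['https://example-fork.readthedocs.io/en/latest/']
```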

humitos commented 4 years ago

I guess so. I'm not sure how that affects crawlers. If that's a problem, we should track it in a separate issue. Another user already reported this at https://github.com/readthedocs/readthedocs.org/issues/5391#issuecomment-680108312, so please subscribe there to stay updated. I'm going to close this one since the original question was answered. Thanks!