Closed · bwfiq closed this issue 2 months ago
Hey @bwfiq,
Thank you for bringing this up! An Otter Wiki up to version 2.5.2 sets `<meta name="robots" content="noindex, nofollow"/>` only for page history, page attachments, and page blame. Even for sane defaults this is not enough. I will add it at least to the changelog and to pages displayed with a given `revision`.
To make this configurable in a convenient way, I will follow what you proposed. My first idea is to add a settings option that controls the generated `/robots.txt`.
When indexing is allowed, the robots.txt is generated as

```
User-agent: *
Allow: /
```

otherwise as

```
User-agent: *
Disallow: /
```
For more complex configurations, users should provide a custom robots.txt.
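
A minimal sketch of how such a route could look, assuming a Flask app (An Otter Wiki is built on Flask) and a hypothetical `ROBOTS_TXT` setting; this is an illustration of the idea, not the actual implementation:

```python
from flask import Flask, Response

app = Flask(__name__)
# Hypothetical setting: "allow" lets crawlers index everything,
# anything else disallows the whole site.
app.config["ROBOTS_TXT"] = "disallow"

@app.route("/robots.txt")
def robots_txt():
    if app.config["ROBOTS_TXT"] == "allow":
        body = "User-agent: *\nAllow: /\n"
    else:
        body = "User-agent: *\nDisallow: /\n"
    return Response(body, mimetype="text/plain")
```

The setting name and values are placeholders; the point is that a single toggle can switch between the two generated files, while anything more elaborate falls back to a user-provided robots.txt.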
Currently, Google has indexed all the URLs available on my Otter Wiki instance. Normally this would be fine, but many of the search results point to sub-URLs such as specific revision pages (URLs including `?revision=`) and the source pages.
I've solved this on my own system by modifying my nginx configuration to send noindex headers, but a built-in option would be good to have for less technically inclined users. A settings dropdown could disable indexing for the non-wiki pages, or even for the whole wiki in cases where an instance is only meant for personal documentation.
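
For reference, a minimal sketch of the nginx approach, assuming the wiki is reverse-proxied on 127.0.0.1:8080 and that revision views are identified by a `revision=` query parameter; the hostname, upstream, and URL patterns are assumptions about a typical setup, not this exact configuration:

```nginx
# In the http {} block: map revision views to a noindex value.
# add_header skips the header entirely when $robots_header is empty.
map $args $robots_header {
    default      "";
    "~revision=" "noindex, nofollow";
}

server {
    listen 80;
    server_name wiki.example.com;  # placeholder hostname

    location / {
        add_header X-Robots-Tag $robots_header always;
        proxy_pass http://127.0.0.1:8080;  # assumed Otter Wiki upstream
    }
}
```

The `X-Robots-Tag` response header is honored by major crawlers the same way the `<meta name="robots">` tag is, which is what makes this workable without touching the application.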