redimp / otterwiki

A minimalistic wiki powered by python, markdown and git.
https://otterwiki.com
MIT License

Suggestion: Disable Indexing on Non-Wiki Pages #133

Closed bwfiq closed 2 months ago

bwfiq commented 3 months ago

Currently, Google has indexed all the URLs available on my otterwiki instance. Normally this would be fine, but a lot of the search results point to sub-URLs such as the specific revision pages (URL including ?revision=) and the source pages.

I've solved this on my own system by modifying nginx confs to provide noindex headers, but this might be a good option to have for less technically inclined users. A settings dropdown could disable indexing for the non-wiki pages or even the whole wiki, for cases where an instance is only meant for personal documentation.

redimp commented 3 months ago

Hey @bwfiq,

Thank you for bringing this up! An Otter Wiki, up to version 2.5.2, sets <meta name="robots" content="noindex, nofollow"/> only for page history, page attachments and page blame. That is not enough, even as a sane default. I will add it to at least the changelog and to pages displayed with a given `revision`.
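As a rough illustration of the rule described above, here is a minimal sketch that decides whether a URL should carry the noindex tag. It is not the actual An Otter Wiki code: the function name `robots_meta` and the path suffixes in `NOINDEX_SUFFIXES` are assumptions made for the example, and the real routes may look different.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical path suffixes for non-wiki views (assumption, not the real routes).
NOINDEX_SUFFIXES = ("/history", "/blame", "/attachments", "/source", "/changelog")

NOINDEX_TAG = '<meta name="robots" content="noindex, nofollow"/>'


def robots_meta(url: str) -> str:
    """Return the noindex meta tag for non-wiki views, else an empty string."""
    parts = urlsplit(url)
    # keep_blank_values=True so that a bare "?revision=" still counts.
    query = parse_qs(parts.query, keep_blank_values=True)
    if "revision" in query:
        return NOINDEX_TAG
    if parts.path.rstrip("/").endswith(NOINDEX_SUFFIXES):
        return NOINDEX_TAG
    return ""
```

With these assumed suffixes, a URL such as `https://wiki.example/Home?revision=abc123` gets the tag, while `https://wiki.example/Home` does not.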

To make this configurable in a convenient way, I'll follow what you proposed. My first idea for implementing this is to add a settings option that controls the generated /robots.txt.

When indexing is allowed, the robots.txt is generated as

User-agent: *
Allow: /

otherwise as

User-agent: *
Disallow: /

For more complex configurations users should provide a custom robots.txt.
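The two variants above could be produced by something like the following sketch. The function name `render_robots_txt` and the boolean parameter `allow_indexing` are hypothetical stand-ins for whatever the settings option ends up being called; this is only meant to show the idea, not the final implementation.

```python
def render_robots_txt(allow_indexing: bool) -> str:
    """Build the robots.txt body from a single on/off setting (hypothetical name)."""
    # Allow everything when indexing is enabled, otherwise disallow everything.
    rule = "Allow: /" if allow_indexing else "Disallow: /"
    return f"User-agent: *\n{rule}\n"


if __name__ == "__main__":
    print(render_robots_txt(allow_indexing=False))
```

A single boolean keeps the setting easy to expose in the admin interface; anything finer-grained is left to a custom robots.txt, as noted above.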