sbrl / Pepperminty-Wiki

A wiki in a box
https://peppermint.mooncarrot.space/
Mozilla Public License 2.0

Google Sitemaps #207

Closed SeanFromIT closed 3 years ago

SeanFromIT commented 3 years ago

Google Search Console (formerly Webmaster Tools) allows website owners to suggest pages for indexing to the search engine via the sitemap formats defined at https://www.sitemaps.org/protocol.html. I believe XML or plain text would be the best formats for Pepperminty Wiki to implement.
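For illustration, here is a minimal sketch of what the two suggested formats look like when generated from a list of page URLs. It is written in Python purely as an example and is not how Pepperminty Wiki does it; the function names and the example URLs are assumptions, while the `urlset` / `url` / `loc` elements and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org protocol for XML sitemaps.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_xml_sitemap(urls):
    """Return an XML sitemap (as a string) listing each URL in a <loc> element."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & when serialising.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def build_text_sitemap(urls):
    """Return a plain-text sitemap: one URL per line."""
    return "\n".join(urls) + "\n"


if __name__ == "__main__":
    # Hypothetical page URLs, for demonstration only.
    pages = [
        "https://wiki.example.com/index.php?page=Main%20Page",
        "https://wiki.example.com/index.php?action=view&page=Help",
    ]
    print(build_xml_sitemap(pages))
    print(build_text_sitemap(pages))
```

The text format is simpler to produce, but only the XML format leaves room for optional per-page metadata such as `lastmod`.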

sbrl commented 3 years ago

Thanks for opening this issue. It looks like the sitemap has to be located on disk at that specific URL. While Pepperminty Wiki could, in theory, manage that automatically, I foresee 3 potential issues:

  1. Managing a file on disk is inherently more complicated than just serving it to the user, and it adds complexity to Pepperminty Wiki.
  2. If one were to upload an XML file called sitemap while Pepperminty Wiki was managing sitemap.xml automatically, things would get... awkward.
  3. If one were to create a page called sitemap.xml while Pepperminty Wiki was managing sitemap.xml automatically, things would also get interesting.

To this end, it would be much easier to create a robots.txt file if it doesn't exist already, or append to it if it does. This is a bit problematic though, since robots.txt must be in the top-level directory (according to this page) - and Pepperminty Wiki may be installed in a subdirectory - and we don't want to write anything outside our own directory if we can help it (we don't currently - this makes Pepperminty Wiki both predictable and easy to back up).

Is there any other way to point search engines at a specific URL for the sitemap (e.g. a <meta /> tag, or a header)?

sbrl commented 3 years ago

Initial XML sitemap support is now present! Manual setup is required in order for crawlers to notice it though. You need to add the following line to the robots.txt file at the root of your domain:

Sitemap: https://wiki.example.com/path/to/index.php?action=sitemap

So if I had my wiki located at https://wiki.example.com/subdir/, I would create (or edit) robots.txt at https://wiki.example.com/robots.txt.
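To make the relationship between the two URLs concrete, here is a small Python sketch (not part of Pepperminty Wiki) that derives both the robots.txt location and the `Sitemap:` line from a wiki's base URL. The function names are hypothetical, and it assumes the wiki's entry point is `index.php`, as in the example above.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(wiki_url):
    """Return the robots.txt URL at the root of the domain hosting wiki_url."""
    parts = urlsplit(wiki_url)
    # robots.txt always lives at the top level, whatever the wiki's path is.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def sitemap_directive(wiki_url, entry_point="index.php"):
    """Return the line to add to robots.txt for a wiki installed at wiki_url."""
    base = wiki_url if wiki_url.endswith("/") else wiki_url + "/"
    return f"Sitemap: {base}{entry_point}?action=sitemap"


if __name__ == "__main__":
    wiki = "https://wiki.example.com/subdir/"
    print(robots_txt_url(wiki))
    print(sitemap_directive(wiki))
```

Note that the `Sitemap:` line takes a full absolute URL, which is why the wiki's subdirectory appears in the directive even though robots.txt itself sits at the domain root.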

This manual setup is required because Pepperminty Wiki does not edit anything outside of its own directory (it's a rule by which I develop it).

If anyone knows of a <meta /> tag or an HTTP header we could set instead, that would be greatly appreciated.

You can also manually submit the sitemap URL (which can be found on the credits page) through the web interface of many search engines.

Finally, if you already have a sitemap, you can use a sitemap index file.
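As a rough illustration, a sitemap index is a small XML file that lists the URLs of several sitemaps, so an existing site sitemap and the wiki's sitemap can be advertised together. The Python sketch below builds one; the function name and the example URLs are assumptions, while the `sitemapindex` / `sitemap` / `loc` elements come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Sitemap index files use the same namespace as ordinary XML sitemaps.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index (as a string) pointing at each sitemap URL."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical URLs: an existing site sitemap plus the wiki's own.
    print(build_sitemap_index([
        "https://wiki.example.com/sitemap-main.xml",
        "https://wiki.example.com/subdir/index.php?action=sitemap",
    ]))
```

The index file itself would then be the URL referenced from robots.txt or submitted to the search engine.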

sbrl commented 3 years ago

For future reference, I used these pages to help me when implementing this: