triplea-game / triplea-game.github.io

TripleA Website
https://triplea-game.org/
GNU Affero General Public License v3.0

Web site lacks robots.txt #342

Closed tvleavitt closed 2 months ago

tvleavitt commented 4 years ago

The lack of a robots.txt file means that our sitemap.txt (or perhaps .xml) file is not being read by search engines that support this extension to the standard:

https://www.woorank.com/en/blog/how-to-locate-a-sitemap-in-a-robots-txt-file

I suggest we create a simple robots.txt file along these lines:

Sitemap: http://triplea-game.org/sitemap.[txt|xml, depending on what we do]
User-agent: *
Allow: /

This presumes that there's nothing we want to exclude.

DanVanAtta commented 2 months ago

The jekyll-sitemap plugin is now configured and generates a robots.txt file: https://triplea-game.org/robots.txt

Its contents are a single line:

Sitemap: https://triplea-game.org/sitemap.xml

This resource, https://medium.com/@vilcins/optimize-your-jekyll-powered-website-with-these-simple-steps-b2a24d66a629, suggests doing something like:

---
layout: none
---
User-agent: *
Sitemap: {{ site.url }}/sitemap.xml

I have no context to know which, if any, of the three variants is better, or even meaningfully different. Without knowing any better, I'm inclined to stick with the default generated by the jekyll-sitemap plugin.

@tvleavitt do you know if crawlers expect a specific format for robots.txt?
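One sanity check (not from the thread, just a sketch): Python's standard-library `urllib.robotparser` discovers the sitemap by scanning for `Sitemap:` lines, so all three variants should behave the same for sitemap discovery, and none of them blocks crawling:

```python
from urllib.robotparser import RobotFileParser

# The three variants discussed above: the plugin's single-line output,
# the original suggestion, and the Medium article's template (rendered).
variants = [
    "Sitemap: https://triplea-game.org/sitemap.xml",
    "User-agent: *\nAllow: /\nSitemap: https://triplea-game.org/sitemap.xml",
    "User-agent: *\nSitemap: https://triplea-game.org/sitemap.xml",
]

for text in variants:
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    # Each variant exposes the same sitemap URL via site_maps(),
    # and can_fetch() reports the site as crawlable in all cases.
    print(parser.site_maps(), parser.can_fetch("*", "https://triplea-game.org/"))
```

Note that `site_maps()` requires Python 3.8+; this only checks how one parser reads the file, not how any particular search engine does.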

I'll mark this issue as closed for now, since we now have a robots.txt file :smile: