jdillard / sphinx-sitemap

Sphinx extension to generate a multi-lingual, multi-version sitemap for HTML builds
https://sphinx-sitemap.readthedocs.io/en/latest/index.html
MIT License
55 stars 21 forks

NEW: Added sitemap_suffix_included to better work with Cloudflare Pages and search engines #89

Open lextm opened 6 months ago

lextm commented 6 months ago

When hosting a Sphinx project on Cloudflare Pages with default suffix .html, a very annoying fact is that the Cloudflare platform generates 301 responses to remove the suffix.

Search engines (especially Google) dislike such redirection and refuse to index such pages, and that makes the generated sitemap less useful for SEO.

Thus, this pull request proposes a new setting sitemap_suffix_included to control whether .html should be written to sitemap.xml. The default value is set to True to keep current behavior. When False is set, the generated sitemap.xml works well with Cloudflare and SEO.
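For illustration, a `conf.py` fragment using the proposed setting might look like the sketch below. It assumes a build of the extension that includes this pull request (for example, the fork mentioned later in the thread). The base URL is a placeholder, not a value from the PR.

```python
# conf.py (fragment): a minimal sketch, not a complete Sphinx configuration.

# Enable the sitemap extension.
extensions = ["sphinx_sitemap"]

# Base URL used to build absolute links in sitemap.xml.
# Placeholder value; replace with the real site address.
html_baseurl = "https://example.com/docs/"

# Setting proposed in this pull request. True (the default) keeps ".html"
# in sitemap.xml entries; False omits it, matching the suffix-less URLs
# that Cloudflare Pages redirects to.
sitemap_suffix_included = False
```

With `sitemap_suffix_included` left at `True`, the generated sitemap.xml is unchanged from current behavior.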

jdillard commented 5 months ago

Thanks for the PR! I think this approach works. The alternative would be to add the file suffix to the URL scheme, but that would be a breaking change for anyone not using the default scheme, and I don't think that is worth a major version bump at this point. (If only I'd had the foresight to pick a better default scheme from the beginning.)

I can't cut a release for a couple of weeks, but I will as soon as I have the time to respond to any surprise issues, should they arise (I don't expect any, though).

"a very annoying fact is that the Cloudflare platform generates 301 responses to remove the suffix."

Annoying indeed. I guess I'm old school, but I don't understand the disdain for the .html extension.

lextm commented 5 months ago

@jdillard Thanks for the comments. No rush to include this, I think; my team can stick with our own fork in the meantime.

Cloudflare not only dislikes the .html extension but also removes default from the end of URLs. That might make some sense from an SEO perspective, but it just creates difficulty for Sphinx site owners.

jdillard commented 5 months ago

@lextm Just curious, would using the dirhtml builder work in your case? It changes the build structure to remove the need for .html and this extension already supports the dirhtml builder. If that is the case I might just need to add documentation about using that builder in this kind of scenario.
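To make the dirhtml suggestion concrete: that builder writes each page to its own directory as `index.html`, so the public URL ends in a slash instead of `.html`. The helper below is a hypothetical sketch of that mapping for illustration only. Neither the function name `dirhtml_url` nor the example base URL comes from the extension or this thread.

```python
def dirhtml_url(base_url: str, pagename: str) -> str:
    """Hypothetical helper: the URL a page would have under the dirhtml builder.

    Illustrative only; this is not code from sphinx-sitemap.
    """
    if pagename == "index":
        # The root document is served at the base URL itself.
        path = ""
    elif pagename.endswith("/index"):
        # "guide/index" is written to guide/index.html and served at "guide/".
        path = pagename[: -len("index")]
    else:
        # "guide/install" is written to guide/install/index.html.
        path = pagename + "/"
    return base_url.rstrip("/") + "/" + path


if __name__ == "__main__":
    for name in ("index", "guide/index", "guide/install"):
        print(name, "->", dirhtml_url("https://example.com/docs/", name))
```

Because none of these URLs carries a `.html` suffix, there would be nothing for the host to strip with a redirect.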

lextm commented 5 months ago

@jdillard I will give that a try then.