Closed nwidger closed 1 year ago
@raviqqe Sorry for not catching this earlier! We tested the changes you made for #316 by running `muffet` against a preview version of our site served with `hugo server`, which is how we check for broken links in our CI pipeline. `hugo server` returns a `Content-Type: application/xml; charset=utf-8` header when serving a `sitemap.xml` (I believe this is due to this check here in `hugo`'s source code), so in CI the new v2.9.1 release of `muffet` works great. However, the production site is served using Caddy, which as mentioned above returns a `Content-Type: text/xml; charset=utf-8` header when serving a `sitemap.xml` file. That means that if we try to run `muffet` v2.9.1 against our production site, `muffet` exits with a `root page has invalid content type` error.
Merging #325 (82e52af) into main (4536090) will not change coverage. The diff coverage is 100.00%.
@@ Coverage Diff @@
## main #325 +/- ##
=======================================
Coverage 87.34% 87.34%
=======================================
Files 30 30
Lines 861 861
=======================================
Hits 752 752
Misses 88 88
Partials 21 21
Impacted Files | Coverage Δ | |
---|---|---|
sitemap_page_parser.go | 100.00% <100.00%> (ø) |
Thanks for your contribution!
@raviqqe Thank you for the quick merge & release! :+1:
Update `(*sitemapPageParser).Parse` to support both `application/xml` and `text/xml` as valid MIME types for sitemaps. Previously, only `application/xml` was considered valid. However, popular HTTP servers such as Caddy (https://caddyserver.com) return a `Content-Type: text/xml; charset=utf-8` response header when serving a `sitemap.xml` sitemap file. Currently, it is not possible to use `/sitemap.xml` as the root page with `muffet` when serving a site with an HTTP server such as Caddy, because `muffet` rejects the `Content-Type` header and thus refuses to parse the site's sitemap file.

The official website for the sitemap protocol (https://www.sitemaps.org) does not explicitly state any specific `Content-Type` that should be used when serving sitemap files. However, the IANA's Media Types registry (https://www.iana.org/assignments/media-types/media-types.xhtml) lists both `application/xml` and `text/xml` as valid media types for `xml`. Additionally, the abstract for RFC 7303 ("XML Media Types", https://www.rfc-editor.org/rfc/rfc7303.html), the reference listed in the IANA registry for both the `application/xml` and `text/xml` media types, explicitly states that `text/xml` is an alias for `application/xml`. RFC 7303 Section 9.2 ("text/xml Registration") also states that the registration information for `text/xml` is the same as that for `application/xml` in all respects except the type name.

Hopefully, this is enough to warrant allowing `text/xml` in addition to `application/xml`. A new `TestSitemapPageParserParsePageMimeTypeAlias` test has been added to ensure that `(*sitemapPageParser).Parse` doesn't return an error when its `typ` argument is `text/xml`.
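
For illustration only, here is a minimal sketch of the kind of check described above. It is not the actual code in `sitemap_page_parser.go`: the names `sitemapMediaTypes` and `isSitemapMediaType` are hypothetical, and the sketch assumes the check receives the raw `Content-Type` header value, including any `charset` parameter.

```go
package main

import (
	"fmt"
	"mime"
)

// sitemapMediaTypes is a hypothetical allow-list for this sketch.
// text/xml is listed next to application/xml because RFC 7303
// defines it as an alias.
var sitemapMediaTypes = map[string]struct{}{
	"application/xml": {},
	"text/xml":        {},
}

// isSitemapMediaType reports whether a Content-Type header value
// names one of the accepted XML media types. Parameters such as
// charset are ignored.
func isSitemapMediaType(contentType string) bool {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false
	}
	_, ok := sitemapMediaTypes[mediaType]
	return ok
}

func main() {
	// The first two values are the headers reported for hugo server
	// and Caddy; the third is a non-sitemap page for contrast.
	for _, header := range []string{
		"application/xml; charset=utf-8",
		"text/xml; charset=utf-8",
		"text/html; charset=utf-8",
	} {
		fmt.Printf("%-32s %v\n", header, isSitemapMediaType(header))
	}
}
```

Parsing the header with `mime.ParseMediaType` rather than comparing raw strings means the `charset=utf-8` parameter and letter case do not affect the result.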