Closed by jamesmacwhite 11 months ago
Hi @jamesmacwhite. I've read up briefly about this, probably by following a Google Search road you've long since walked so please bear with me.
It appears that a sitemap index file shouldn't list sitemap index files, only sitemaps. The URL you provided also suggests that multiple sitemap indexes should each be submitted to the Search Console.
Nested sitemap indexes would also fail Google's validation, apparently. Is yours passing there? As a comment pointed out there, I realise it does complicate things a bit that Google's own sitemap index illegally (?) contains at least one other sitemap index.
Overall, is this the scenario you were asking to be supported?
Hi.
It has been a while since I posted, but what I was referring to was that the root sitemap.xml didn't contain URLs itself, only links to other sitemap.xml files that contain the URLs. The live example of the site in question probably explains the setup most clearly:
https://www.nottinghamcollege.ac.uk/sitemaps-1-sitemap.xml
The main sitemap.xml (which redirects to the above) doesn't list URLs directly. It links to other sitemap.xml files, one per section, which in turn provide the URLs for those sections.
Hope that makes sense.
Hi @jamesmacwhite, ah I see now. Thanks for clarifying, and I realise it's been a while - sorry for this delay in following up.
I ran pa11y-ci just now against both sitemap.xml and the more direct sitemaps-1-sitemap.xml. It appeared to unroll the sitemap index to 1842 URLs in both cases, although I didn't complete the whole run:
$ pa11y-ci --sitemap https://www.nottinghamcollege.ac.uk/sitemap.xml
Running Pa11y on 1842 URLs:
> https://www.nottinghamcollege.ac.uk/apply - 2 errors
> https://www.nottinghamcollege.ac.uk/employers - 2 errors
> https://www.nottinghamcollege.ac.uk/employers/apprenticeships - 2 errors
...
We do also have a test for the scenario: https://github.com/pa11y/pa11y-ci/blob/e7b7c17b4ec5fa5d3b52b539f15b520af470c0b2/test/integration/cli-sitemap.test.js#L91
Could you have been using a version of pa11y-ci older than 2.4?
Thanks for this. I could have been, but glad it works!
A website's sitemap.xml might point to multiple sitemap index files rather than listing URLs itself, with the URLs for each section provided further down. Larger sites often do this.
The structure for my example is roughly this:
https://developers.google.com/search/docs/crawling-indexing/sitemaps/large-sitemaps
Each section contains the URL data. It would appear pa11y-ci cannot parse the sitemap.xml as it is expecting URLs immediately. If I provide one of the section.xml paths, it works.
It would be good if pa11y-ci could parse a sitemap.xml that provides index files and go through each one.
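To make the requested behaviour concrete, here is a minimal sketch of how a sitemap index could be unrolled recursively. It is not pa11y-ci's code (pa11y-ci is a Node.js tool), just an illustration in Python using the standard library. The function name `unroll_sitemap`, the injected `fetch` callable, and the example.com URLs are all hypothetical, and the sketch assumes every file uses the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Set

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def unroll_sitemap(url: str, fetch: Callable[[str], str],
                   seen: Optional[Set[str]] = None) -> List[str]:
    """Return every page URL reachable from a sitemap or sitemap index.

    `fetch` is a hypothetical callable that returns the XML text for a URL.
    """
    seen = set() if seen is None else seen
    if url in seen:  # guard against index files that reference each other
        return []
    seen.add(url)

    root = ET.fromstring(fetch(url))
    if root.tag.endswith("sitemapindex"):
        # An index lists other sitemap files, so recurse into each one.
        pages: List[str] = []
        for loc in root.findall("sm:sitemap/sm:loc", NS):
            if loc.text:
                pages.extend(unroll_sitemap(loc.text.strip(), fetch, seen))
        return pages
    # A plain sitemap (<urlset>) lists page URLs directly.
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]


# In-memory stand-ins for a root index and two per-section sitemaps.
XMLNS = 'xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"'
DOCS = {
    "https://example.com/sitemap.xml": (
        f"<sitemapindex {XMLNS}>"
        "<sitemap><loc>https://example.com/section-a.xml</loc></sitemap>"
        "<sitemap><loc>https://example.com/section-b.xml</loc></sitemap>"
        "</sitemapindex>"
    ),
    "https://example.com/section-a.xml": (
        f"<urlset {XMLNS}>"
        "<url><loc>https://example.com/a/1</loc></url>"
        "<url><loc>https://example.com/a/2</loc></url>"
        "</urlset>"
    ),
    "https://example.com/section-b.xml": (
        f"<urlset {XMLNS}><url><loc>https://example.com/b/1</loc></url></urlset>"
    ),
}

pages = unroll_sitemap("https://example.com/sitemap.xml", DOCS.__getitem__)
print(pages)
```

Passing the fetcher in as an argument keeps the sketch runnable offline. For a real site, `fetch` could be any function that downloads the URL and returns its body as text. The `seen` set also means the same approach would tolerate an index that lists another index, which is the nested case discussed earlier in the thread.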