plone / volto

React-based frontend for the Plone Content Management System
https://demo.plone.org/
MIT License
453 stars, 612 forks

sitemap.xml.gz needs to be split for Google #4638

Open reebalazs opened 1 year ago

reebalazs commented 1 year ago

A single sitemap.xml.gz file can contain at most 50,000 URLs and be at most 50MB uncompressed. For sites with a lot of content, the current sitemap is rejected by Google.

Sitemap size limits: All formats limit a single sitemap to 50MB (uncompressed) or 50,000 URLs. If you have a larger file or more URLs, you must break your sitemap into multiple sitemaps. You can optionally create a [sitemap index](https://developers.google.com/search/docs/crawling-indexing/sitemaps/large-sitemaps) file and submit that single index file to Google.

Reference: https://developers.google.com/search/docs/crawling-indexing/sitemaps/build-sitemap#:~:text=Sitemap%20size%20limits%3A%20All%20formats,your%20sitemap%20into%20multiple%20sitemaps.
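For illustration only, here is a minimal Python sketch of the splitting approach described in the quoted limits: chunk the URL list into groups of at most 50,000, render one sitemap per chunk, and render a sitemap index that points to each of them. This is not Volto's implementation. The function names and the `sitemap{n}.xml.gz` file naming are assumptions made for the example, and gzip compression and serving are left out.

```python
from xml.sax.saxutils import escape

# Limits taken from the Google documentation quoted above.
MAX_URLS_PER_SITEMAP = 50_000
MAX_UNCOMPRESSED_BYTES = 50 * 1024 * 1024

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'


def split_urls(urls, chunk_size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most chunk_size."""
    return [urls[i:i + chunk_size] for i in range(0, len(urls), chunk_size)]


def render_sitemap(urls):
    """Render one <urlset> document for a single chunk of URLs."""
    entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return f'{XML_HEADER}<urlset xmlns="{SITEMAP_NS}">\n{entries}</urlset>\n'


def render_sitemap_index(base_url, count):
    """Render a <sitemapindex> pointing at `count` numbered sitemap files.

    The sitemap{n}.xml.gz naming is hypothetical, chosen for this sketch.
    """
    entries = "".join(
        f"  <sitemap><loc>{escape(base_url)}/sitemap{n}.xml.gz</loc></sitemap>\n"
        for n in range(1, count + 1)
    )
    return f'{XML_HEADER}<sitemapindex xmlns="{SITEMAP_NS}">\n{entries}</sitemapindex>\n'


def fits_size_limit(xml_text):
    """Check the uncompressed size of a rendered sitemap against the 50MB limit."""
    return len(xml_text.encode("utf-8")) <= MAX_UNCOMPRESSED_BYTES


if __name__ == "__main__":
    # Example: 120,001 URLs end up in three sitemap files plus one index.
    urls = [f"https://example.com/page-{i}" for i in range(120_001)]
    chunks = split_urls(urls)
    sitemaps = [render_sitemap(chunk) for chunk in chunks]
    index = render_sitemap_index("https://example.com", len(chunks))
    print(len(sitemaps), all(fits_size_limit(s) for s in sitemaps))
```

Only the index file would then need to be submitted to Google. A chunk can still exceed the size limit if its URLs are very long, so a real implementation would probably need to check both limits.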

reebalazs commented 1 year ago

I'm working on this. The PR will stay in progress until we have tested it on another site.

ale-rt commented 1 year ago

This might be of interest to you: https://github.com/collective/collective.splitsitemap