konklone / oversight.garden

Bringing together the oversight community's work.
https://oversight.garden
Creative Commons Zero v1.0 Universal

Sitemaps may be busted #223

Closed konklone closed 5 years ago

konklone commented 5 years ago

The current sitemap_index file only links to sitemaps 2, 3, and 4. The shared/sitemaps directory contains many more sitemap files, but only 2, 3, and 4 have been updated recently. This seems odd and probably wrong, and it would help explain why Search Console shows relatively few pages indexed compared to the size of our corpus.

divergentdave commented 5 years ago

I took a look, and I think everything is okay. Currently the sitemap index links to three files, and all of the files involved were written last Sunday. The older files contain roughly the same chunks of reports as the current three. I think the library rotates the file names it uses, to avoid problems with overwriting files that clients might still be reading.
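
For anyone who wants to repeat this check, here is a minimal sketch in Python of comparing a sitemap index against the files sitting in a directory. It assumes the index follows the standard sitemaps.org `<sitemapindex>` format. The file names and URLs in the example (`sitemap-2.xml` and so on) are hypothetical and are not taken from the actual deployment, and the helper names `referenced_sitemaps` and `unreferenced` are made up for illustration.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def referenced_sitemaps(index_xml: str) -> set[str]:
    """Return the file name of every sitemap listed in a sitemap index."""
    root = ET.fromstring(index_xml)
    names = set()
    for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS):
        if loc.text:
            # Keep only the last path segment of the URL, e.g. "sitemap-2.xml".
            names.add(PurePosixPath(urlparse(loc.text.strip()).path).name)
    return names


def unreferenced(index_xml: str, files_on_disk: list[str]) -> list[str]:
    """Files present on disk that the index does not link to (possibly stale)."""
    linked = referenced_sitemaps(index_xml)
    return sorted(f for f in files_on_disk if f not in linked)


# Hypothetical index and directory listing, for illustration only.
EXAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.org/sitemap-2.xml</loc></sitemap>
  <sitemap><loc>https://example.org/sitemap-3.xml</loc></sitemap>
  <sitemap><loc>https://example.org/sitemap-4.xml</loc></sitemap>
</sitemapindex>"""

EXAMPLE_FILES = [
    "sitemap-0.xml",
    "sitemap-1.xml",
    "sitemap-2.xml",
    "sitemap-3.xml",
    "sitemap-4.xml",
]

if __name__ == "__main__":
    print("linked:", sorted(referenced_sitemaps(EXAMPLE_INDEX)))
    print("not linked:", unreferenced(EXAMPLE_INDEX, EXAMPLE_FILES))
```

Files that show up as "not linked" are not necessarily a problem. If the library rotates file names as described above, they may just be leftovers from earlier runs, so the modification times are worth checking before deleting anything.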