konklone / oversight.garden

Bringing together the oversight community's work.
https://oversight.garden
Creative Commons Zero v1.0 Universal

Sitemap not being read after all? #221

Open konklone opened 5 years ago

konklone commented 5 years ago

Checking in Google Search Console, it looks like nothing is being read from our sitemaps, and everything in the index comes from crawling. I think that's why the index is only ~10K pages, rather than the ~114K we should have.

konklone commented 5 years ago

Hmm, well, I submitted our sitemap again. It looks like there are ~54K URLs that are "Excluded: crawled but not indexed", which is a low-information status that just says Google crawled the page but chose not to index it. That still doesn't account for all ~114K reports that the site says we have.
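
One way to narrow this down might be to count how many URLs the sitemap actually advertises and compare that with the ~114K reports. Below is a minimal Python sketch, assuming the sitemap follows the standard sitemaps.org XML format (a `urlset`, or a `sitemapindex` pointing at child sitemaps). The names `parse_sitemap`, `count_urls`, and the `fetch` callback are hypothetical helpers for illustration, not part of this repository.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text):
    """Return (kind, locs), where kind is 'urlset' or 'sitemapindex'."""
    root = ET.fromstring(xml_text)
    kind = root.tag.replace(SITEMAP_NS, "")
    if kind not in ("urlset", "sitemapindex"):
        raise ValueError(f"unexpected root element: {root.tag}")
    # A urlset holds <url> entries; a sitemapindex holds <sitemap> entries.
    child = "url" if kind == "urlset" else "sitemap"
    locs = [
        loc.text.strip()
        for loc in root.iterfind(f"{SITEMAP_NS}{child}/{SITEMAP_NS}loc")
        if loc.text
    ]
    return kind, locs


def count_urls(xml_text, fetch):
    """Count page URLs, following sitemap index files recursively.

    `fetch` is a caller-supplied function mapping a sitemap URL to its
    XML text (hypothetical; e.g. a thin wrapper around an HTTP client).
    """
    kind, locs = parse_sitemap(xml_text)
    if kind == "urlset":
        return len(locs)
    return sum(count_urls(fetch(loc), fetch) for loc in locs)
```

If the total comes out well below ~114K, the gap is probably on the generation side rather than in Google's handling of the sitemap. As a rough check using the figures in this thread, and assuming the indexed and excluded sets don't overlap, 114K minus ~10K indexed minus ~54K excluded leaves about 50K URLs with no reported status at all.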