CharString closed this issue 7 years ago.
@CharString as mentioned in the commit, we're talking about 2 different issues here:
- we need an alternative to the /*search line in robots.txt, as this is causing problems (maybe using a combination of the X-Robots-Tag HTTP response header and the nofollow value on listing links)
- sitemap.xml.gz is pointing to the view instead of using the canonical URL

More information here: https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag
I can fix those things once the @plone/framework-team gives me some clues on how to proceed.
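For illustration only, the X-Robots-Tag idea could look roughly like the generic WSGI sketch below. This is not how Plone does it: the function names, the path check and the choice of the noindex value are all assumptions made for the example, and a real fix in Plone would presumably set the header from the search view itself.

```python
def is_search_path(path):
    """Hypothetical check: the last path segment is the search view."""
    last_segment = path.rstrip("/").rsplit("/", 1)[-1]
    return last_segment in ("search", "@@search")


def add_x_robots_tag(app):
    """Wrap a WSGI app so that search result pages carry X-Robots-Tag."""
    def middleware(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            if is_search_path(environ.get("PATH_INFO", "")):
                # "noindex" is an assumed value; adjust to the policy you want.
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)
        return app(environ, patched_start_response)
    return middleware


def demo_app(environ, start_response):
    """Stand-in application used only to exercise the middleware."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


def headers_for(path):
    """Call the wrapped demo app for one path and return its headers."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["headers"] = dict(headers)

    wrapped = add_x_robots_tag(demo_app)
    list(wrapped({"PATH_INFO": path}, start_response))
    return captured["headers"]


if __name__ == "__main__":
    for path in ("/Research", "/search", "/news/@@search"):
        print(path, headers_for(path).get("X-Robots-Tag"))
```

Because the check looks at the whole last path segment rather than a substring, a folder such as /Research is left alone while /search and /news/@@search get the header.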
@hvelarde I only mentioned the first one in this issue description, and I've created a separate issue in plone.app.layout for the views thing: https://github.com/plone/plone.app.layout/issues/117
The simplest possible solution would be to replace the /*search line with 2 lines:
Disallow: /search
Disallow: /*@@search
That would correct the (first) issue at hand with Google (and possibly other crawlers that allow the * syntax). The X-Robots-Tag header (or a <meta> tag in the <head> of the search templates) would be a more complete solution, since it also blocks search result pages for crawlers that implement only the standard robots.txt (without *).
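To make the difference between the old and the proposed rules concrete, here is a minimal sketch of Google-style wildcard matching. It is a simplified approximation written for this thread, not Googlebot's actual matcher: it only handles * and a trailing $, and the helper names are made up.

```python
import re


def rule_to_regex(rule):
    """Translate a robots.txt path rule into a regex (simplified).

    '*' matches any run of characters and a trailing '$' anchors the end;
    otherwise the rule is treated as a prefix match.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_disallowed(path, rules):
    """Return True if any Disallow rule matches the start of the path."""
    return any(rule_to_regex(rule).match(path) for rule in rules)


old_rules = ["/*search"]
new_rules = ["/search", "/*@@search"]

if __name__ == "__main__":
    for path in ("/Research", "/Research/paper-1", "/search", "/news/@@search"):
        print(path, is_disallowed(path, old_rules), is_disallowed(path, new_rules))
```

Under this approximation /*search matches /Research because * swallows the "Re", which is the reported problem. The two replacement lines still match /search and /news/@@search but no longer match /Research. Note that /search remains a prefix rule, so a top-level path that merely starts with /search would also be matched.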
BUG/PROBLEM REPORT (OR OTHER COMMON ISSUE)
What I did:
Installed Plone 5.0.6 and created a folder called Research.
What I expect to happen:
I expected Google would index the contents.
What actually happened:
Google Search Console threw an error: the URLs in that folder were in the sitemap.xml.gz, but were disallowed by robots.txt.
What version of Plone/Addons I am using:
Plone 5.0.6
See https://github.com/plone/Products.CMFPlone/commit/03a7670544add6c889ce72391f71f4775929418e