Closed hiasl closed 3 years ago
Can you give me an actual example to look at?
What SEOmatic does, assuming you have "Site Groups define logically separate sites" turned on in SEOmatic -> Plugin Settings -> Advanced (which it is by default), is described here:
https://nystudio107.com/docs/seomatic/Technologies.html#multi-site-language-locale-support
You'll get one sitemap for each site in the site group, but it will have
<xhtml:link rel="alternate" hreflang="xx-xx">
...links to the same pages in other languages, assuming your entries are localized.
1.) We have the feature "Site Groups define logically separate sites" turned on.
2.) Having different sitemaps is totally fine.
3.) Having different/multiple robots.txt files in the same domain is wrong in my eyes, and this is the issue here.
4.) The main domain is https://www.concertvienna.com/, the first robots.txt is https://www.concertvienna.com/robots.txt, but there are more at https://www.concertvienna.com/de/robots.txt, https://www.concertvienna.com/fr/robots.txt, ...
The second domain I wrote about is not live yet, I just mentioned it for completeness.
Can you articulate to me why you believe this is wrong? Here's the spec:
https://developers.google.com/search/docs/advanced/robots/robots_txt
The robots.txt for each domain notes which paths, relative to that domain, crawlers are disallowed from crawling. Additionally, the sitemaps that it links to are localized: they point to the direct URLs on that domain as well as to the translations of those pages.
For example, go here:
https://www.concertvienna.com/de/sitemaps-1-section-pagesCv-4-sitemap.xml
and choose "view source" to see more than the human-readable version of the sitemap, and you'll see all of the localizations listed:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="sitemap.xsl"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xmlns:video="http://www.google.com/schemas/sitemap-video/1.1" xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>https://www.concertvienna.com/de</loc>
<lastmod>2021-03-05T14:08:57+01:00</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
<xhtml:link rel="alternate" hreflang="x-default" href="https://www.concertvienna.com"/>
<xhtml:link rel="alternate" hreflang="en" href="https://www.concertvienna.com"/>
<xhtml:link rel="alternate" hreflang="fr" href="https://www.concertvienna.com/fr"/>
<xhtml:link rel="alternate" hreflang="de" href="https://www.concertvienna.com/de"/>
<xhtml:link rel="alternate" hreflang="ru" href="https://www.concertvienna.com/ru"/>
<image:image>
<image:loc>http://www.concertvienna.com/user/images/iStock-175563116.jpg</image:loc>
<image:title>Vienna Opera</image:title>
</image:image>
</url>
<url>
<loc>https://www.concertvienna.com/de/oper-wien</loc>
<lastmod>2021-03-12T13:02:37+01:00</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
<xhtml:link rel="alternate" hreflang="x-default" href="https://www.concertvienna.com/opera-vienna"/>
<xhtml:link rel="alternate" hreflang="en" href="https://www.concertvienna.com/opera-vienna"/>
<xhtml:link rel="alternate" hreflang="fr" href="https://www.concertvienna.com/fr/opera-vienne"/>
<xhtml:link rel="alternate" hreflang="de" href="https://www.concertvienna.com/de/oper-wien"/>
<xhtml:link rel="alternate" hreflang="ru" href="https://www.concertvienna.com/ru/opera-v-vene"/>
</url>
<url>
<loc>https://www.concertvienna.com/de/wiener-staatsoper</loc>
<lastmod>2021-03-12T11:38:14+01:00</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
<xhtml:link rel="alternate" hreflang="x-default" href="https://www.concertvienna.com/vienna-state-opera"/>
<xhtml:link rel="alternate" hreflang="en" href="https://www.concertvienna.com/vienna-state-opera"/>
<xhtml:link rel="alternate" hreflang="fr" href="https://www.concertvienna.com/fr/opera-de-vienne"/>
<xhtml:link rel="alternate" hreflang="de" href="https://www.concertvienna.com/de/wiener-staatsoper"/>
<xhtml:link rel="alternate" hreflang="ru" href="https://www.concertvienna.com/ru/venskaya-opera"/>
<image:image>
<image:loc>http://www.concertvienna.com/user/images/iStock-501071488.jpg</image:loc>
<image:title>I Stock 501071488</image:title>
</image:image>
</url>
<url>
<loc>https://www.concertvienna.com/de/volksoper-wien</loc>
<lastmod>2021-03-12T11:38:47+01:00</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
<xhtml:link rel="alternate" hreflang="x-default" href="https://www.concertvienna.com/vienna-volksoper"/>
<xhtml:link rel="alternate" hreflang="en" href="https://www.concertvienna.com/vienna-volksoper"/>
<xhtml:link rel="alternate" hreflang="fr" href="https://www.concertvienna.com/fr/opera-populaire-vienne"/>
<xhtml:link rel="alternate" hreflang="de" href="https://www.concertvienna.com/de/volksoper-wien"/>
<xhtml:link rel="alternate" hreflang="ru" href="https://www.concertvienna.com/ru/venskaya-narodnaya-opera"/>
<image:image>
<image:loc>http://www.concertvienna.com/user/images/Presse_VolksoperDSC_6988.jpg</image:loc>
<image:title>Presse Volksoper DSC 6988</image:title>
</image:image>
</url>
<url>
<loc>https://www.concertvienna.com/de/ballett-wien</loc>
<lastmod>2021-02-10T13:55:51+01:00</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
<xhtml:link rel="alternate" hreflang="x-default" href="https://www.concertvienna.com/ballet-vienna"/>
<xhtml:link rel="alternate" hreflang="en" href="https://www.concertvienna.com/ballet-vienna"/>
<xhtml:link rel="alternate" hreflang="fr" href="https://www.concertvienna.com/fr/ballet-vienne"/>
<xhtml:link rel="alternate" hreflang="de" href="https://www.concertvienna.com/de/ballett-wien"/>
<xhtml:link rel="alternate" hreflang="ru" href="https://www.concertvienna.com/ru/balet-v-vene"/>
</url>
<url>
<loc>https://www.concertvienna.com/de/spanische-hofreitschule</loc>
<lastmod>2021-03-12T11:40:33+01:00</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
<xhtml:link rel="alternate" hreflang="x-default" href="https://www.concertvienna.com/spanish-riding-school"/>
<xhtml:link rel="alternate" hreflang="en" href="https://www.concertvienna.com/spanish-riding-school"/>
<xhtml:link rel="alternate" hreflang="fr" href="https://www.concertvienna.com/fr/ecole-espagnole-equitation"/>
<xhtml:link rel="alternate" hreflang="de" href="https://www.concertvienna.com/de/spanische-hofreitschule"/>
<xhtml:link rel="alternate" hreflang="ru" href="https://www.concertvienna.com/ru/ispanskaya-shkola-verkhovoy-yezdy"/>
<image:image>
<image:loc>http://www.concertvienna.com/user/images/csm_morning_exercise_c_Spanish_Riding_School_Julie_Brass_-_Kopie_eb1d6a3c99.jpg</image:loc>
<image:title>Csm morning exercise c Spanish Riding School Julie Brass Kopie eb1d6a3c99</image:title>
</image:image>
</url>
<url>
<loc>https://www.concertvienna.com/de/konzerte-wien</loc>
<lastmod>2021-03-12T13:34:33+01:00</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
<xhtml:link rel="alternate" hreflang="x-default" href="https://www.concertvienna.com/concerts-vienna"/>
<xhtml:link rel="alternate" hreflang="en" href="https://www.concertvienna.com/concerts-vienna"/>
<xhtml:link rel="alternate" hreflang="fr" href="https://www.concertvienna.com/fr/concerts-vienne"/>
<xhtml:link rel="alternate" hreflang="de" href="https://www.concertvienna.com/de/konzerte-wien"/>
<xhtml:link rel="alternate" hreflang="ru" href="https://www.concertvienna.com/ru/koncerty-v-vene"/>
<image:image>
<image:loc>http://www.concertvienna.com/user/images/1432217879Orchestra_CU2_1600x1100_200404_223601.jpg</image:loc>
<image:title>1432217879 Orchestra CU2 1600x1100</image:title>
</image:image>
</url>
<url>
<loc>https://www.concertvienna.com/de/datenschutzerklaerung</loc>
<lastmod>2020-12-09T10:40:46+01:00</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
<xhtml:link rel="alternate" hreflang="x-default" href="https://www.concertvienna.com/privacy"/>
<xhtml:link rel="alternate" hreflang="en" href="https://www.concertvienna.com/privacy"/>
<xhtml:link rel="alternate" hreflang="fr" href="https://www.concertvienna.com/fr/privacy"/>
<xhtml:link rel="alternate" hreflang="de" href="https://www.concertvienna.com/de/datenschutzerklaerung"/>
<xhtml:link rel="alternate" hreflang="ru" href="https://www.concertvienna.com/ru/privacy"/>
</url>
<url>
<loc>https://www.concertvienna.com/de/imprint</loc>
<lastmod>2020-12-09T10:40:31+01:00</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
<xhtml:link rel="alternate" hreflang="x-default" href="https://www.concertvienna.com/legal"/>
<xhtml:link rel="alternate" hreflang="en" href="https://www.concertvienna.com/legal"/>
<xhtml:link rel="alternate" hreflang="fr" href="https://www.concertvienna.com/fr/legal"/>
<xhtml:link rel="alternate" hreflang="de" href="https://www.concertvienna.com/de/imprint"/>
<xhtml:link rel="alternate" hreflang="ru" href="https://www.concertvienna.com/ru/legal"/>
</url>
<url>
<loc>https://www.concertvienna.com/de/kontakt</loc>
<lastmod>2020-12-09T10:40:58+01:00</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
<xhtml:link rel="alternate" hreflang="x-default" href="https://www.concertvienna.com/contact"/>
<xhtml:link rel="alternate" hreflang="en" href="https://www.concertvienna.com/contact"/>
<xhtml:link rel="alternate" hreflang="fr" href="https://www.concertvienna.com/fr/contact"/>
<xhtml:link rel="alternate" hreflang="de" href="https://www.concertvienna.com/de/kontakt"/>
<xhtml:link rel="alternate" hreflang="ru" href="https://www.concertvienna.com/ru/contacts"/>
</url>
<url>
<loc>https://www.concertvienna.com/de/agbs</loc>
<lastmod>2021-02-22T13:18:07+01:00</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
<xhtml:link rel="alternate" hreflang="x-default" href="https://www.concertvienna.com/terms"/>
<xhtml:link rel="alternate" hreflang="en" href="https://www.concertvienna.com/terms"/>
<xhtml:link rel="alternate" hreflang="fr" href="https://www.concertvienna.com/fr/terms"/>
<xhtml:link rel="alternate" hreflang="de" href="https://www.concertvienna.com/de/agbs"/>
<xhtml:link rel="alternate" hreflang="ru" href="https://www.concertvienna.com/ru/terms"/>
</url>
<url>
<loc>https://www.concertvienna.com/de/dinner-konzert-wien</loc>
<lastmod>2021-03-12T11:42:44+01:00</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
<xhtml:link rel="alternate" hreflang="x-default" href="https://www.concertvienna.com/dinner-concert-vienna"/>
<xhtml:link rel="alternate" hreflang="en" href="https://www.concertvienna.com/dinner-concert-vienna"/>
<xhtml:link rel="alternate" hreflang="fr" href="https://www.concertvienna.com/fr/diner-concert-vienne"/>
<xhtml:link rel="alternate" hreflang="de" href="https://www.concertvienna.com/de/dinner-konzert-wien"/>
<xhtml:link rel="alternate" hreflang="ru" href="https://www.concertvienna.com/ru/uzhin-koncert"/>
<image:image>
<image:loc>http://www.concertvienna.com/user/images/concert-dinner-kursalon-vienna.jpg</image:loc>
<image:title>Concert dinner kursalon vienna</image:title>
</image:image>
</url>
</urlset>
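To check the alternates in a sitemap like the one above without reading the XML by eye, here is a minimal Python sketch. The function name `hreflang_alternates` and the shortened `SAMPLE` string are made up for illustration and are not part of SEOmatic; only the two namespace URIs are taken from the sitemap itself.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the <urlset> element of the sitemap.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "xhtml": "http://www.w3.org/1999/xhtml",
}


def hreflang_alternates(sitemap_xml):
    """Map each <loc> URL to a {hreflang: href} dict of its alternate links."""
    root = ET.fromstring(sitemap_xml)
    result = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        result[loc] = {
            link.get("hreflang"): link.get("href")
            for link in url.findall("xhtml:link", NS)
            if link.get("rel") == "alternate"
        }
    return result


# Shortened, hypothetical sample built from one <url> entry of the sitemap.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>https://www.concertvienna.com/de/oper-wien</loc>
<xhtml:link rel="alternate" hreflang="en" href="https://www.concertvienna.com/opera-vienna"/>
<xhtml:link rel="alternate" hreflang="de" href="https://www.concertvienna.com/de/oper-wien"/>
</url>
</urlset>"""

if __name__ == "__main__":
    for loc, alternates in hreflang_alternates(SAMPLE).items():
        print(loc, alternates)
```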
The problem has nothing to do with sitemaps. All sitemaps and alternate links are totally ok.
The problem is that there are multiple robots.txt files.
One domain should not have more than one robots.txt. The Google spec you cited confirms that:
This is from https://developers.google.com/search/docs/advanced/robots/robots_txt:
http://example.com/folder/robots.txt | Not a valid robots.txt file. Crawlers don't check for robots.txt files in subdirectories.
Since the different languages within the site group are all served by the same domain "www.concertvienna.com", there should only be one "www.concertvienna.com/robots.txt" and not one for each language www.concertvienna.com/de/robots.txt, www.concertvienna.com/fr/robots.txt, ...
And to make it more specific again: this ONE robots.txt should then link all primary language sitemaps, not only the one in the language you are currently in.
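To make the "one robots.txt per domain" point concrete, here is a small Python sketch of where a crawler that follows the quoted rule would look for robots.txt, given any page URL. The helper name `robots_txt_url` is hypothetical, and this only illustrates the rule; it is not how any particular crawler is implemented.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the root-level robots.txt URL for the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, drop the path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    for page in (
        "https://www.concertvienna.com/opera-vienna",
        "https://www.concertvienna.com/de/oper-wien",
        "https://www.concertvienna.com/fr/opera-vienne",
    ):
        print(page, "->", robots_txt_url(page))
```

All three language versions map to the same https://www.concertvienna.com/robots.txt, so the files under /de/ and /fr/ would never be requested.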
I'm not seeing any actual negative impact from this, though. What it's saying is just that bots will not look for /robots.txt anywhere but in the root domain. So it'll use https://www.concertvienna.com/robots.txt in your example, and just never find the others.
And since the sitemap it links to properly handles multiple languages, we should be good there too.
However, it's true that this is vestigial for sites that are localized via sub-directories and not domains. Given that you can set the site to any domain you want, it sounds like what we're looking for here is:
1 - If a site's Site URL has a path in it, don't generate a robots.txt for it.
2 - In the root robots.txt, list the sitemaps for each site that has a Site URL that has a path in it.
Would that do it for you?
Yes, I think the second point is the important one, the first one makes the solution cleaner though. The important thing is to additionally link to the primary sitemap of each language/multisite containing a path, in the same domain.
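The behaviour being agreed on here (one root robots.txt per domain, listing the sitemap index of every site served from that domain) can be sketched in Python as follows. This is only an illustration of the idea, not SEOmatic's implementation: the function name `sitemap_lines_by_host` is hypothetical, and the default index filename is simply the one that shows up in the local dev output later in this thread.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def sitemap_lines_by_host(site_urls, index_name="sitemaps-1-sitemap.xml"):
    """Group site base URLs by scheme and host.

    Returns {root robots.txt URL: ["Sitemap: ..." line for each site on that host]}.
    """
    by_host = defaultdict(list)
    for base in site_urls:
        parts = urlsplit(base)
        line = f"Sitemap: {base.rstrip('/')}/{index_name}"
        by_host[(parts.scheme, parts.netloc)].append(line)
    return {
        f"{scheme}://{host}/robots.txt": lines
        for (scheme, host), lines in by_host.items()
    }


if __name__ == "__main__":
    # Hypothetical site base URLs: two language sites sharing one host.
    sites = ["http://localhost:8000/", "http://localhost:8000/es/"]
    for robots_url, lines in sitemap_lines_by_host(sites).items():
        print(f"# {robots_url}")
        print("\n".join(lines))
```

Sites on a different domain or subdomain end up under their own key, which matches the point that each host gets its own robots.txt.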
Okay. Also keep in mind that SEOmatic automatically sends the sitemaps to Google and Bing, as you should see in your web console, and each sitemap has hreflang links to translations in other languages to index.
So I think we're actually talking about minimal to no impact here, in terms of bots not discovering these sitemaps appropriately.
I agree, the impact might be minimal, but the current implementation is not perfect. I was confronted with the issue by an SEO agency; I had not even noticed it. They put it on the list of things to improve. So this is where I am now: the customer just has the info that there is something wrong here. Of course I would be happy if this could be improved, so that I can mark this issue as solved with the customer.
Please just let me know if you're going to look into it or not. Thanks!
Definitely going to address it!
This has been addressed in the above commits.
You can try it now by setting your semver in your composer.json to look like this:
"nystudio107/craft-seomatic": "dev-develop as 3.3.36",
Then do a composer update
I installed the dev version as 3.3.36, but only half of it works for me:
I did clear all SEOmatic caches. I also recreated all sitemaps, just in case this is also needed for robots.txt
This is due to a cached template, which should have been propagated, but for some reason was not.
Can you look at the seomatic_metabundles database table, and tell me what the version numbers of the rows with __GLOBAL_BUNDLE__ are?
I only have __GLOBAL_BUNDLE__ rows (missing META), and if you mean the bundleVersion column, this is 1.0.47 for those.
None of them were updated beyond 1.0.47? It should lazily update the meta bundles, so clearing caches and then visiting the pages/sites should cause the meta bundles to update.
which caches should I clear? only seomatic or all?
Clear the SEOmatic caches, then on the frontend of the website, visit a page of each Site and it should lazily update the bundles.
I will verify on my end as well.
I did, nothing changed. I cleared the SEOmatic caches and visited 4 different languages; bundleVersion is still 1.0.47. I assume by "lazily" you mean immediately after I visit the frontend pages, not some time later, right?
Yes. I will verify on my end as well. This is the fix commit: https://github.com/nystudio107/craft-seomatic/commit/332b67a66c2bac4f6398876514960fadc2ec7de2
@hiasl confirmed there was a regression that could cause the metabundles to not update; fixed in https://github.com/nystudio107/craft-seomatic/commit/5a77c461d511fb9c473624de0b4e3a4aa52d2c76
You can try it now by setting your semver in your composer.json to look like this:
"nystudio107/craft-seomatic": "dev-develop as 3.3.36",
Then do a composer clear-cache && composer update
Here's what it looks like for me in local dev:
# robots.txt for http://localhost:8000/
sitemap: http://localhost:8000/sitemaps-1-sitemap.xml
sitemap: http://localhost:8000/es/sitemaps-1-sitemap.xml
# local - disallow all
User-agent: *
Disallow: /
robots.txt still stays the same (only one sitemap), but bundleVersion increases now.
What I did:
- Upgrading nystudio107/craft-seomatic (dev-develop 2392c4f => dev-develop e02992e)
These are my site settings:
Okay so I went down the rabbit hole to figure out what was going wrong here, and it turns out this is expected behavior... but I'm open to discussion about whether this is good behavior or not.
So the frontend templates like robots, humans, ads, etc. allow you to edit them in the CP. In our case, under SEOmatic -> Global SEO -> Robots
When updating the meta containers, it preserves any data that is user-editable. If we didn't do this, then someone who had customized their Robots.txt would have it blown away by the update.
So in order for the fix to fully propagate here on existing sites, you'd need to manually paste the contents of craft-seomatic/src/templates/_frontend/pages/robots.twig into your Global settings for each site. Here it is:
# robots.txt for {{ siteUrl }}
{{ seomatic.helper.siteGroupSitemaps() }}
{% switch seomatic.config.environment %}
{% case "live" %}
# live - don't allow web crawlers to index cpresources/ or vendor/
User-agent: *
Disallow: /cpresources/
Disallow: /vendor/
Disallow: /.env
Disallow: /cache/
{% case "staging" %}
# staging - disallow all
User-agent: *
Disallow: /
{% case "local" %}
# local - disallow all
User-agent: *
Disallow: /
{% default %}
# default - don't allow web crawlers to index cpresources/ or vendor/
User-agent: *
Disallow: /cpresources/
Disallow: /vendor/
Disallow: /.env
Disallow: /cache/
{% endswitch %}
I'm open to input on how to handle this. On the one hand, not having it just automatically update to fix the issue is confusing. On the other hand, blowing away data that the user may have customized is likely worse.
I can confirm this is now working with {{ seomatic.helper.siteGroupSitemaps() }}
Thanks a lot!
One little thing: {{ seomatic.helper.siteGroupSitemaps() }} outputs the leading word "sitemap:" in lowercase. I do not know if search engines are picky about that, but it's normally written with a capital "S" at the beginning of "Sitemap".
Regarding the propagation of this fix, I have 2 ideas:
1.) My favourite: Why don't you deprecate {{ seomatic.helper.sitemapIndexForSiteId() }} and {{ seomatic.helper.siteGroupSitemaps() }} and just call it {{ seomatic.helper.sitemapIndex() }}.
This new method outputs ALL sitemaps within the same domain, ignoring any additional paths. This works even across site groups which is totally ok since there can always only be one robots.txt per domain. This would cover all cases:
For this idea you could even content migrate the robots.txt templates and replace
Sitemap: {{ seomatic.helper.sitemapIndexForSiteId() }}
with {{ seomatic.helper.sitemapIndex() }}.
And I guess it will not have any negative effects on existing installations.
2.) A less invasive idea would be to content migrate the robots.txt template field in SEOmatic's global settings and replace
{{ seomatic.helper.sitemapIndexForSiteId() }}
with
{{ seomatic.helper.sitemapIndexForSiteId() }}
{# use this for multisites in single domains {{ seomatic.helper.siteGroupSitemaps() }} #}
if {{ seomatic.helper.siteGroupSitemaps() }} is not part of the field yet... But I'm not sure if this is really a good idea.
Yeah I checked the spec, and lowercase is actually what they list, but either is fine.
I like your ideas for the migration, the thing that's bothering me is there's no real place for it currently. I'd need to special-case for this particular template, which feels a little gross.
I'll see if I can't come up with something more general, and in the meantime, a manual update isn't the end of the world.
Found a decent vector:
seomatic.helper.siteGroupSitemaps() -> seomatic.helper.sitemapIndex() -> https://github.com/nystudio107/craft-seomatic/commit/2e80b7f35435a02cc5b4298f4e5a20e4c37e61b5
Swap in the new robots.txt sitemaps -> https://github.com/nystudio107/craft-seomatic/commit/3059c8c6da2c1169573e1a396dfe9fb3ab27b905
FYI I was reading the spec you sent over, https://developers.google.com/search/docs/advanced/robots/robots_txt, and noticed it says robots.txt files in sub-folders are not valid.
That would suggest that there should only be the very first .com/robots.txt?
@OwenMelbz It's not doing this anymore, please see the changes above, which are live.
It's also a very minor issue, given that the sitemaps were all available from the root, with proper hreflang links, and the individual sitemaps were submitted to Google/Bing automatically regardless.
@khalwat I'd updated our robots.txt config with the new stub from GitHub, then cleared all the caches, but we're still getting a 404 for /robots.txt since updating, e.g. https://www.fsifm.com/robots.txt. Any thoughts?
Likely to be a server-side configuration issue. Track it down in the logs.
The 404 is unlikely to be related to this issue.
There are no errors in the logs at all. Doesn't SEOmatic register the route, like it does for the sitemaps, which work?
This also worked before the update, and no server configuration has changed. Just a composer install and project config/apply.
Probably a new issue should be filed for this, but assuming nothing else has changed other than the update to SEOmatic (and it worked before), then it's likely failing here:
https://github.com/nystudio107/craft-seomatic/blob/v3/src/services/FrontendTemplates.php#L90
So ensure that your site:
1 - has a Base URL set
2 - the Base URL does not have a sub-directory as part of it
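As a quick way to sanity-check those two conditions against a site's Base URL, here is a Python sketch. It is not the plugin's actual PHP logic in FrontendTemplates.php; the function name `can_serve_root_robots_txt` is hypothetical, and it assumes "no sub-directory" means the URL path is empty or just "/".

```python
from urllib.parse import urlsplit


def can_serve_root_robots_txt(base_url):
    """True if base_url is set and has no sub-directory in its path.

    Assumption: a bare host, with or without a trailing slash, counts as root.
    """
    if not base_url:
        return False  # condition 1: a Base URL must be set
    parts = urlsplit(base_url)
    # condition 2: the path must not contain a sub-directory
    return bool(parts.netloc) and parts.path.strip("/") == ""


if __name__ == "__main__":
    # Hypothetical Base URLs for illustration.
    for url in ("https://www.example.com/", "https://www.example.com/de/", ""):
        print(repr(url), can_serve_root_robots_txt(url))
```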
Question
I'm using Craft 3.4.30 with Seomatic 3.3.35
Our setup is multisite, 2 site groups (2 different websites with 2 domains), 6 languages in each site group. NOT headless. The primary language (en) is served from the root of each domain https://domain/ and https://domain2/, the other 5 languages from a URL segment/directory with the ISO code of each language, e.g.
The problem: SEOmatic creates separate robots.txt for each multisite/language, although they share the same domain. So I get a
From my point of view SEOmatic's behavior is wrong: there should be only 1 robots.txt per domain, linking to ALL sitemaps in all languages. I do not think search engines will try to look for a robots.txt in each language directory (/de/robots.txt, /fr/robots.txt, ...). If there were other languages served from e.g. subdomains, each subdomain should have its own robots.txt.
Please let me know what you think and if you agree, please try to find a solution. My suggestion is to make sure that there is only one robots.txt per domain, which should contain all relevant links for all multisites within that domain.