nystudio107 / craft-seomatic

SEOmatic facilitates modern SEO best practices & implementation for Craft CMS 3. It is a turnkey SEO system that is comprehensive, powerful, and flexible.
https://nystudio107.com/plugins/seomatic

hreflang lowercased in sitemap alt url hreflang tag #1040

Closed adamseabrook closed 2 years ago

adamseabrook commented 2 years ago

Describe the bug

With the Sitemap Alt URLs option turned on, SEOmatic adds all the alternate language versions of the page to the sitemap. A Craft Site Language setting of pt-BR shows as pt-br in the sitemap (you have to view source to see it). I know that hreflang isn't case sensitive, but the lowercasing throws off the Ahrefs site audit crawl, which reports two alternate pages when there is really just one. It also causes issues with some other audit tools that are case sensitive.

To reproduce

  1. Create a Craft site that has an alternate of pt-BR or any language and region combination.
  2. Enable Sitemap Alt URLs
  3. Generate sitemap
  4. View sitemap in browser and view source.
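Instead of viewing source by hand, step 4 can be scripted. The following is a minimal sketch using Python's standard library; the function name `extract_hreflangs` and the `SAMPLE` document (with `example.com` URLs) are made up for illustration and are not part of SEOmatic or Craft.

```python
import xml.etree.ElementTree as ET

# Namespace used by <xhtml:link> alternates in sitemaps.
XHTML_NS = "http://www.w3.org/1999/xhtml"


def extract_hreflangs(sitemap_xml: str) -> list[str]:
    """Return every hreflang value found in a sitemap, in document order."""
    root = ET.fromstring(sitemap_xml)
    return [
        link.get("hreflang")
        for link in root.iter(f"{{{XHTML_NS}}}link")
        if link.get("hreflang") is not None
    ]


# Hypothetical sitemap fragment, shaped like the one reported in this issue.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://example.com/guides</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://example.com/guides" />
    <xhtml:link rel="alternate" hreflang="pt-br" href="https://example.com/br/guias" />
  </url>
</urlset>"""

if __name__ == "__main__":
    print(extract_hreflangs(SAMPLE))
```

Any value that comes back as `pt-br` rather than `pt-BR` shows the lowercasing described above.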

Expected behaviour

Hreflang case set at the Craft Site level should be respected.

Screenshots

For example https://www.zarla.com/sitemaps-1-section-categoryDirectoryListing-3-sitemap.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="sitemap.xsl"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://www.zarla.com/guides/logo-ideas</loc>
    <lastmod>2021-12-09T09:50:50-06:00</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.5</priority>
    <xhtml:link rel="alternate" hreflang="en" href="https://www.zarla.com/guides/logo-ideas" />
    <xhtml:link rel="alternate" hreflang="es" href="https://www.zarla.com/es/guías/ideas-de-logo" />
    <xhtml:link rel="alternate" hreflang="it" href="https://www.zarla.com/it/guide/idee-per-loghi" />
    <xhtml:link rel="alternate" hreflang="fr" href="https://www.zarla.com/fr/guides/idées-de-logo" />
    <xhtml:link rel="alternate" hreflang="de" href="https://www.zarla.com/de/leitfäden/logo-ideen" />
    <xhtml:link rel="alternate" hreflang="pt-br" href="https://www.zarla.com/br/guias/ideias-de-logotipos" />
    <xhtml:link rel="alternate" hreflang="nl-nl" href="https://www.zarla.com/nl/handleidingen/logo-ideeën" />
    <xhtml:link rel="alternate" hreflang="tr-tr" href="https://www.zarla.com/tr/rehberler/logo-fikirleri" />
    <xhtml:link rel="alternate" hreflang="pl-pl" href="https://www.zarla.com/pl/wskazówki/projekty-logo" />
    <xhtml:link rel="alternate" hreflang="id-id" href="https://www.zarla.com/id/panduan/ide-logo" />
  </url>
</urlset>

Should be pt-BR, nl-NL, tr-TR, pl-PL, and id-ID.
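The casing asked for here (lowercase language, uppercase region) matches the conventional formatting recommended for BCP 47 language tags. As a rough sketch of that convention, and not SEOmatic's or Craft's actual code, a hypothetical helper `format_language_tag` could look like this. It uses a simplification: any two-letter subtag after the first is treated as a region and any four-letter alphabetic subtag as a script, which ignores extension and private-use subtags.

```python
def format_language_tag(tag: str) -> str:
    """Apply conventional BCP 47 casing: language lower, Script title, REGION upper.

    Simplified: does not special-case extension or private-use subtags.
    """
    parts = tag.replace("_", "-").split("-")
    formatted = [parts[0].lower()]  # primary language subtag
    for subtag in parts[1:]:
        if len(subtag) == 2 and subtag.isalpha():
            formatted.append(subtag.upper())   # region, e.g. BR
        elif len(subtag) == 4 and subtag.isalpha():
            formatted.append(subtag.title())   # script, e.g. Hans
        else:
            formatted.append(subtag.lower())   # anything else
    return "-".join(formatted)


if __name__ == "__main__":
    for raw in ["pt-br", "nl-nl", "tr-tr", "pl-pl", "id-id"]:
        print(raw, "->", format_language_tag(raw))
```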

Versions

khalwat commented 2 years ago

So as you mention, hreflang links are not case sensitive. Can you help me understand where the pt-BR is ever output?

SEOmatic should be consistent, in terms of outputting both <link rel="alternate" hreflang="pt-br"> as well as the hreflang link in the sitemap being "pt-br".

I'm a little confused as to how Ahrefs shows multiple localizations of the same page, that being the case?

Is the mixed case variant being output somewhere/somehow?

adamseabrook commented 2 years ago

SEOmatic does not seem to output the hreflang in different cases. All it is doing is lowercasing the "proper" hreflang that Craft shows in the Language drop down in the settings > sites section. They show it as xx-XX and that does seem to be the way most sites show it vs xx-xx.

So not really a bug, but for consistency I feel it should respect the casing set by Craft and not lowercase it. I actually rarely see the hreflang lowercased on large multilingual sites, and I'm struggling to find an example of lowercase.

Ahrefs should be ignoring the case, but they do not. Instead, for https://www.zarla.com/privacy-policy they report the xx-XX found in the page source and then, in addition, the xx-xx from the SEOmatic-generated sitemaps. I plan on logging this bug with them also.

khalwat commented 2 years ago

Google uses all lowercase in a number of their examples:

https://developers.google.com/search/docs/advanced/crawling/localized-versions

Link: <https://example.com/file.pdf>; rel="alternate"; hreflang="en",
      <https://de-ch.example.com/file.pdf>; rel="alternate"; hreflang="de-ch",
      <https://de.example.com/file.pdf>; rel="alternate"; hreflang="de"

The official spec for the tag also mentions that it is case-insensitive so... I'm not really sure I'd qualify this as a bug in SEOmatic.
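To illustrate what case-insensitive handling means for an audit tool, here is a minimal sketch that collapses alternates differing only in hreflang casing. The function name `dedupe_alternates` and the `example.com` URLs are hypothetical; this is not how Ahrefs or any particular crawler is known to work.

```python
def dedupe_alternates(alternates: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Collapse (hreflang, href) pairs that differ only in hreflang casing.

    The first spelling seen for each pair is kept, in original order.
    """
    seen: dict[tuple[str, str], tuple[str, str]] = {}
    for hreflang, href in alternates:
        key = (hreflang.lower(), href)  # compare language tags case-insensitively
        seen.setdefault(key, (hreflang, href))
    return list(seen.values())


if __name__ == "__main__":
    found = [
        ("pt-BR", "https://example.com/br/"),  # e.g. from the page source
        ("pt-br", "https://example.com/br/"),  # e.g. from the sitemap
        ("en", "https://example.com/"),
    ]
    print(dedupe_alternates(found))
```

With this comparison, the `pt-BR` and `pt-br` entries count as a single alternate rather than two.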

In contrast, SEOmatic does explicitly uppercase the regional code for OpenGraph:

https://github.com/nystudio107/craft-seomatic/blob/develop/src/helpers/Localization.php#L27
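For context, the Open Graph protocol writes locales as `language_TERRITORY` (for example `en_US`), which is why the region casing matters there. The sketch below shows that conversion in Python. It is an illustration only, not the code at the link above, and `to_og_locale` is a hypothetical name.

```python
def to_og_locale(language_tag: str) -> str:
    """Convert a tag such as 'pt-br' to the Open Graph style 'pt_BR'.

    Tags without a region are returned lowercased and otherwise unchanged.
    """
    parts = language_tag.replace("_", "-").split("-")
    language = parts[0].lower()
    if len(parts) < 2:
        return language
    territory = parts[1].upper()  # Open Graph expects an uppercase territory
    return f"{language}_{territory}"


if __name__ == "__main__":
    print(to_og_locale("pt-br"))
```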

I feel like this is an Ahrefs issue; it should be treating the tag attributes as case insensitive. I'd be open to revisiting this if they come back with something to the contrary, though.