onlyOnRoot redirecting prefix routes

Perezmarc commented 3 years ago

Version

"nuxt": "^2.14.7", "nuxt-i18n": "^6.15.4"

Nuxt configuration

mode:

[X] universal
[] spa

Nuxt-i18n configuration

"nuxt-i18n",
{
        strategy: "prefix_and_default",
        locales: [
          {
            name: "English",
            code: "en",
            iso: "en-US"
          },
          {
            name: "Español",
            code: "es",
            iso: "es-ES"
          }
        ],
        vueI18n: {
          fallbackLocale: "en",
          messages: {
            en: require("./locale/en.json"),
            es: require("./locale/es.json")
          },
          silentTranslationWarn: false
        },
        defaultLocale: "es",
        seo: false,
        baseUrl: "https://getsilt.com",
        detectBrowserLanguage: {
          useCookie: true,
          cookieKey: "i18n_redirected",
          onlyOnRoot: true
        }
      }

Reproduction Link

https://getsilt.com/es

Steps to reproduce

Go to https://getsilt.com/es and it will always be redirected to the root.

What is Expected?

https://getsilt.com/es Should keep its "es". The idea is to get the result of prefix_and_default, so have both getsilt.com and getsilt.com/es, and obviously have translated pages crawled by Google.

What is actually happening?

https://getsilt.com/es redirects to https://getsilt.com/

Google is not crawling the translated page "/es", and I think it is because it always redirects to "/". Now the prefixed page is only available under "/en".

As a result, Google does not index the "/es" route, so no translated pages are indexed. The root is indexed in English, which I think is expected. These are the head tags served at https://getsilt.com/:

<html data-n-head-ssr="" lang="en-US" data-n-head="%7B%22lang%22:%7B%22ssr%22:%22en-US%22%7D%7D"><!--<![endif]--><head>
<meta data-n-head="ssr" data-hid="og:locale" property="og:locale" content="en_US" />
<meta data-n-head="ssr" data-hid="og:locale:alternate-es-ES" property="og:locale:alternate" content="es_ES" />
<link data-n-head="ssr" rel="icon" type="image/x-icon" href="/favicon.ico">
<link data-n-head="ssr" data-hid="alternate-hreflang-en" rel="alternate" href="https://getsilt.com/en" hreflang="en" />
<link data-n-head="ssr" data-hid="alternate-hreflang-en-US" rel="alternate" href="https://getsilt.com/en" hreflang="en-US" />
<link data-n-head="ssr" data-hid="alternate-hreflang-es" rel="alternate" href="https://getsilt.com/" hreflang="es" />
<link data-n-head="ssr" data-hid="alternate-hreflang-es-ES" rel="alternate" href="https://getsilt.com/" hreflang="es-ES" />

rchl commented 3 years ago

With prefix_and_default strategy the default locale has no prefix. Maybe you want to use prefix strategy if you want all locales to have a prefix?
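
For reference, a minimal sketch of what that suggestion would look like against the configuration posted above. Only the `strategy` value changes; the variable name `i18nWithPrefix` is hypothetical and only there so the fragment stands on its own. The array would go in the `modules` list of nuxt.config.js.

```javascript
// Sketch only: the module entry from the report with the strategy switched to "prefix".
const i18nWithPrefix = [
  "nuxt-i18n",
  {
    strategy: "prefix", // every locale, including the default one, gets a prefix (/es, /en)
    defaultLocale: "es",
    locales: [
      { name: "English", code: "en", iso: "en-US" },
      { name: "Español", code: "es", iso: "es-ES" }
    ],
    detectBrowserLanguage: {
      useCookie: true,
      cookieKey: "i18n_redirected",
      onlyOnRoot: true
    }
  }
];
```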

Perezmarc commented 3 years ago

The idea is to get the result of prefix_and_default, so have both getsilt.com and getsilt.com/es, and obviously have translated pages crawled by Google.

But Google is not crawling the translated page "/es", and I think it is because it always redirects to "/". Now the prefixed page is only available under "/en".

Does it make sense now? :)

divine commented 3 years ago

The idea is to get the result of prefix_and_default, so have both getsilt.com and getsilt.com/es, and obviously have translated pages crawled by Google.

Nope, this won't happen. Google will decide on its own which URL (/ or /es) to treat as duplicate content and exclude it (if not both of them).

Basically, search engines don't like duplicate content at all. They can't decide which copy is the original, and leaving that choice to them will only hurt. Changes are not instant, and a correction might take months (speaking from personal experience).

But Google is not crawling the translated page "/es", and I think it is because it always redirects to "/".

This is actually correct: with the prefix_and_default strategy, /es redirects to the root. You have / and /es with the same content, and / should be the main one.

Does it make sense now? :)

Absolutely. I don't see an issue here, only a misunderstanding of SEO and of what has been done here to improve it.

Thanks!

Perezmarc commented 3 years ago

Basically, search engines don't like duplicate content at all. They can't decide which copy is the original, and leaving that choice to them will only hurt. Changes are not instant, and a correction might take months (speaking from personal experience).

That's true, but since Google is crawling "/" as the English version, with onlyOnRoot: true, "/es" should not be equal to "/", right?

I'm not sure I got what you meant... The issue is that currently no pages are crawled in Spanish, neither "/" nor "/es". How could I make Google crawl pages in both Spanish and English with strategy: 'prefix_and_default'? I thought that using onlyOnRoot: true would be enough.

rchl commented 3 years ago

Are you able to find out what headers the Google crawler sends when visiting the / page?

If it explicitly sends "Accept-language: en" (or similar) then / would redirect to /en indeed. That's one of the drawbacks of automatic browser language detection, even with onlyOnRoot.
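
To make the mechanism concrete, here is a rough sketch of header-based detection on the root path. It is an illustration only, not nuxt-i18n's actual detection code; the function names `pickLocale` and `rootTarget` are made up for this example, and it assumes the default locale is served at / without a prefix.

```javascript
// Illustrative only: NOT nuxt-i18n's implementation.
// Parse an Accept-Language header and return the best supported locale code
// (highest q-value first), or null when the header is missing or nothing matches.
function pickLocale(acceptLanguage, supportedCodes) {
  if (!acceptLanguage) return null;
  const ranked = acceptLanguage
    .split(",")
    .map((part) => {
      const [tag, ...params] = part.trim().split(";");
      const qParam = params.find((p) => p.trim().startsWith("q="));
      const q = qParam ? parseFloat(qParam.trim().slice(2)) : 1;
      // Reduce "en-US" to its primary subtag "en" for matching.
      return { code: tag.trim().toLowerCase().split("-")[0], q: Number.isNaN(q) ? 0 : q };
    })
    .sort((a, b) => b.q - a.q);
  const match = ranked.find((entry) => supportedCodes.includes(entry.code));
  return match ? match.code : null;
}

// Where a request for "/" ends up when the default locale has no prefix.
function rootTarget(acceptLanguage, supportedCodes, defaultLocale) {
  const detected = pickLocale(acceptLanguage, supportedCodes);
  if (detected === null || detected === defaultLocale) return "/";
  return "/" + detected;
}
```

Under these assumptions, with default locale "es" a request carrying `Accept-Language: en-US,en;q=0.9` would be sent to /en, while a request with no such header would stay on /.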

Perezmarc commented 3 years ago

As I understand from this Link, Googlebot does not provide an Accept-language header.

But still, I thought that the purpose of onlyOnRoot was to redirect only on the root page, as the documentation defines.

onlyOnRoot (default: false) - Set to true (recommended for improved SEO) to only attempt to detect the browser locale on the root path (/) of the site. Only effective when using strategy other than 'no_prefix'.

I think that there's somehow a redirect from "/es" to "/".
prefix_and_default should make both pages available, right? I think that the only issue is that https://getsilt.com/es/ redirects to https://getsilt.com, which makes crawling the Spanish pages impossible. But I don't know why that's happening 🤷‍♂️

Perezmarc commented 3 years ago

I may be mistaken... Somehow I found that the Spanish pages are crawled: Google search

The actual issue is that Google is not grouping the results by language as expected. With the browser language preference set to Spanish, the English results come first, and the crawled pages are not structured as expected.

I discovered that the "/es" version crawled on the 4th of November does not have a canonical link, but the live version (released on the 25th of November) does have a proper canonical link (thanks to the latest version of nuxt-i18n <3 ).

I'll close this issue, since I think it is a matter of waiting to get crawled again... (The reindex option in Search Console being "temporarily" deactivated does not help either :( )

Thank you all, and apologies for any disturbance created.

divine commented 3 years ago

@Perezmarc I've just checked it up.

https://getsilt.com/es returns empty content, this is probably related and fixed in https://github.com/nuxt-community/i18n-module/pull/989

Can you create a simple reproduction? I can't reproduce it locally.

Thanks!

theDevelopper commented 3 years ago

This ticket should be re-opened. I don't think this is solved for statically generated sites. The pages that are redirected away from are no longer empty (which was an important fix), but static sites still redirect.

Let's break it down. The relevant configuration properties are these:

strategy: "prefix_and_default",
detectBrowserLanguage: {
          useCookie: true,
          onlyOnRoot: true
}

According to the documentation, the prefix_and_default strategy is defined as follows:

This strategy combines both previous strategies behaviours, meaning that you will get URLs with prefixes for every language, but URLs for the default language will also have a non-prefixed version.

This is fulfilled in the sense that both prefixed and non-prefixed pages are generated as HTML files during build and have the correct content.

The detectBrowserLanguage onlyOnRoot property is defined as:

With it set, the language detection is only attempted when the user visits the root path (/) of the site. [...] It also allows linking to pages in specific locales.

Ignoring crawlers and SEO for simplicity (they are just one use case, not the cause), this means that a statically hosted website should behave as follows with regard to language detection alone:

www.domain.tld/ -> this is root, so attempt to detect language (if no cookie is set)
www.domain.tld/en -> this is NOT root as we have a path, do not attempt to detect language, but set a cookie with language EN; overwrite if exists

For language detection alone, the strategy is not relevant, but the misbehaviour becomes clear when the two are combined. This is what should happen when using prefix_and_default with onlyOnRoot in detectBrowserLanguage:

www.domain.tld/ -> this is root, so attempt to detect language (if no cookie is set) -> REDIRECT if detected language does not match default language
www.domain.tld/en -> this is NOT root as we have a path, do not attempt to detect language, but set a cookie with EN as language -> NO REDIRECT
www.domain.tld/en/page -> this is NOT root as we have a path, do not attempt to detect language, but set a cookie with EN as language -> NO REDIRECT

This is what actually happens:

www.domain.tld/ -> this is root, so attempt to detect language (if no cookie is set) -> REDIRECT if detected language does not match default language
www.domain.tld/en -> this is NOT root as we have a path, do not attempt to detect language, but set a cookie with EN as language -> REDIRECTS TO www.domain.tld/ if EN is default language
www.domain.tld/en/page -> this is NOT root as we have a path, do not attempt to detect language, but set a cookie with EN as language -> REDIRECTS TO www.domain.tld/page if EN is default language

This behaviour is closer to strategy prefix_except_default, as defined:

Using this strategy, all of your routes will have a locale prefix added except for the default language

Why? Because there is no way to reach www.domain.tld/en/page without being redirected if EN is the default locale.
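
The expected and observed behaviour described above can be modelled with two small functions. This is a sketch of the behaviour as described in this comment, not the module's code; all names (`localeFromPath`, `expectedRedirect`, `observedRedirect`, `hasCookie`, `detectedLocale`) are invented for the illustration, and cookie writing is left out. What should happen on / when a cookie already exists is not specified above, so the sketch simply does nothing in that case.

```javascript
// Illustrative model only. Each function returns the path to redirect to,
// or null for "no redirect".

// "/" -> null, "/en" -> "en", "/en/page" -> "en", "/page" -> null
function localeFromPath(path, locales) {
  const first = path.split("/")[1];
  return locales.includes(first) ? first : null;
}

// Expected: detection runs on "/" only; any other path is never redirected.
function expectedRedirect(path, { defaultLocale, detectedLocale, hasCookie }) {
  if (path !== "/" || hasCookie) return null;
  return detectedLocale && detectedLocale !== defaultLocale ? "/" + detectedLocale : null;
}

// Observed: same on "/", but a path prefixed with the default locale
// is redirected to its unprefixed counterpart.
function observedRedirect(path, opts) {
  if (path === "/") return expectedRedirect(path, opts);
  const prefix = localeFromPath(path, opts.locales);
  if (prefix !== null && prefix === opts.defaultLocale) {
    const rest = path.slice(prefix.length + 1); // "/en/page" -> "/page", "/en" -> ""
    return rest === "" ? "/" : rest;
  }
  return null;
}
```

With EN as the default locale, `expectedRedirect("/en/page", ...)` gives no redirect while `observedRedirect("/en/page", ...)` gives "/page", which is the difference this comment is about.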

divine commented 3 years ago

This ticket should be re-opened. I don't think this is solved for static generated sites.

Hello,

Could you please try the onlyOnNoPrefix option instead of onlyOnRoot? It was introduced in https://github.com/nuxt-community/i18n-module/pull/896 and released in 6.16.

https://i18n.nuxtjs.org/options-reference#detectbrowserlanguage
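
A minimal sketch of that change, assuming the rest of the configuration from the original report stays as it is. The variable name `i18nOnlyOnNoPrefix` is hypothetical; the array would go in the `modules` list of nuxt.config.js.

```javascript
// Sketch only: swap onlyOnRoot for onlyOnNoPrefix (needs nuxt-i18n 6.16 or later).
const i18nOnlyOnNoPrefix = [
  "nuxt-i18n",
  {
    strategy: "prefix_and_default",
    defaultLocale: "es",
    detectBrowserLanguage: {
      useCookie: true,
      cookieKey: "i18n_redirected",
      onlyOnNoPrefix: true // used in place of onlyOnRoot: true
    }
  }
];
```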

Thanks!

Perezmarc commented 3 years ago

Thank you @divine, yes, I think this kind of works. But still, I think that prefix_and_default may not be a good strategy for SEO, at least in my case.

I have defaultLocale: "es", but the crawled version of / is actually in English.

I'd like to have the following:

route / --> always in defaultLocale
- if language is en, redirect to /en
- if language is es no redirect
route /es or /en --> never redirect

Desired Crawled versions:

/ language in defaultLocale: es
/es language in es
/en language in en

Actual Crawled versions:

/ language in Google bot browserLanguage: en
/es language in es
/en language in en -> Not indexed because duplicated content

Is that possible? Does it make sense? What do you think would be the best strategy for SEO? Right now I'm seeing my Google search results in multiple languages: https://ibb.co/n1mKd2N

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

divine commented 3 years ago

Hello @Perezmarc,

Not sure how I missed your message.

But still, I think that prefix_and_default may not be a good strategy for SEO, at least in my case.

The best strategies for SEO are prefix and prefix_except_default.

right now I'm seeing my google search results in multiple languages: https://ibb.co/n1mKd2N

It has been almost 6 months. Which one did you choose, and what results do you have?

Also, can this issue be closed?

Thanks!

Perezmarc commented 3 years ago

Hi! Thanks for reaching out again! I have been using prefix for a few months now. I still see / indexed in English and /es in Spanish, with the same results as in the previous image: https://ibb.co/n1mKd2N. In Search Console, though, it says that the root / is not indexed, the canonical for the root is /en, and the canonical for /es is /es.

It is still not positioning my site properly with Google Maps and the proper language. It should not show the English page first when searching in Spanish.

divine commented 3 years ago

It is still not positioning my site properly with Google Maps and the proper language. It should not show the English page first when searching in Spanish.

Hello @Perezmarc! Glad to see you here again!

This happens only on a statically generated version because redirection to /en happens via JavaScript after a full page load.

I'm having the same issue, but the only solution is to use the target: 'server' version.

Thanks!

Perezmarc commented 3 years ago

That's a shame, I decided to use nuxt for the static site generation :S Thanks anyway for your response

sintj commented 2 years ago

Hi,

Is this problem with nuxt-i18n still open? Is there no clean solution yet? Even onlyOnRoot is only a partial solution, because search engines will index the homepage only in English. I need to disable the automatic redirect in order to get every page indexed in every language I have.

rchl commented 2 years ago

I think there is no solution other than disabling auto-detection of the user's language.
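
For anyone landing here, a minimal sketch of what disabling detection looks like in the module options. The `strategy` and `defaultLocale` values are assumptions carried over from earlier in this thread, and the variable name `i18nNoDetection` is hypothetical; the array would go in the `modules` list of nuxt.config.js.

```javascript
// Sketch only: turn off browser language detection entirely.
const i18nNoDetection = [
  "nuxt-i18n",
  {
    strategy: "prefix", // assumption: the strategy the reporter said they moved to
    defaultLocale: "es",
    detectBrowserLanguage: false // no language detection, so no detection-based redirect
  }
];
```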

Perezmarc commented 2 years ago

I think SEO and multi-language support are a must for a landing page, and I use Nuxt as the static site generator. This problem could make me choose to migrate to another solution.

rchl commented 2 years ago

Disable auto-detection of the language then.

If you think it's possible to support this case with auto-detection enabled, or you find that other solutions support that, then feel free to let me know.

I'm not asking for a solution implemented in the code but just a generic idea of how it should work.

Perezmarc commented 2 years ago

Hi @rchl, I was thinking... have you tried checking the user agent and only redirecting if it is not in a list of known bot user agents?

LucianMihalache commented 2 years ago

@Perezmarc Sorry for bringing up such an old issue. I face the same problem. I am using auto-redirect to the browser language, but this is not desired for the crawlers because they always get redirected when landing on homepage.

Is there a way to disable the redirect based on the user agent of the bots? I've been looking through the code but I couldn't find anything about that.

rchl commented 2 years ago

It's not generally recommended to have crawler-specific logic. It can even lead to a drop in ranking in some cases.

Also, I believe that crawlers purposely make a small number of requests without identifying themselves, to catch cases where they are treated differently.

So I'm not sure it's a good idea.

LucianMihalache commented 2 years ago

@rchl so the only solution is to disable detectBrowserLanguage?
