Open longlho opened 3 years ago
this might be something formatjs can provide actually
@Timer I believe u can use the ResolveLocale
abstract operation from ECMA402 (spec). We do have an implementation in @formatjs/ecma402-abstract
: https://codesandbox.io/s/affectionate-chatelet-nvgl7?file=/src/App.tsx
ResolveLocale(
["fr", "en"],
["fr-XX", "en"],
{ localeMatcher: "best fit" },
[],
{},
() => "en"
)
should yield
{
"locale":"fr",
"dataLocale":"fr"
}
I have the same issue here... very very annoying.
I've just encountered this in a real world scenario, I have Italian set up as an it
subpath. Desktop browsers set to Italian seem to match it
. Safari on iOS only sends it-it
in their headers, this results on the page falling back to English.
Here's a link: https://i18n.wscandy.vercel.app/
@ijjk You have assigned yourself to this issue, but to save you time, I tried to understand how NextJS took into account the Accept-Language header, and realized that this part was managed by the hapi/accept module: https://github.com/vercel/next.js/blob/master/packages/next/next-server/server/next-server.ts#L328
So, I've written a small script that could replace the accept.language
method:
const regex = /((([a-zA-Z]+(-[a-zA-Z0-9]+){0,2})|\*)(;q=[0-1](\.[0-9]+)?)?)*/g;
function parse(al){
const strings = (al || "").match(regex);
return strings.map(m => {
if(!m){
return;
}
const bits = m.split(';');
const ietf = bits[0].split('-');
const hasScript = ietf.length === 3;
return {
code: ietf[0],
script: hasScript ? ietf[1] : null,
region: hasScript ? ietf[2] : ietf[1],
quality: bits[1] ? parseFloat(bits[1].split('=')[1]) : 1.0
};
}).filter(r => r).sort((a, b) => b.quality - a.quality);
}
function pick(supportedLanguages, acceptLanguage) {
const parsedAcceptedLanguage = parse(acceptLanguage);
const supported = supportedLanguages.map(support => {
const bits = support.split('-');
const hasScript = bits.length === 3;
return {
code: bits[0],
script: hasScript ? bits[1] : null,
region: hasScript ? bits[2] : bits[1]
};
});
for (let i = 0; i < parsedAcceptedLanguage.length; i++) {
const lang = parsedAcceptedLanguage[i];
const langCode = lang.code.toLowerCase();
const langRegion = lang.region ? lang.region.toLowerCase() : lang.region;
const langScript = lang.script ? lang.script.toLowerCase() : lang.script;
let possible = []
for (let j = 0; j < supported.length; j++) {
const supportedCode = supported[j].code.toLowerCase();
const supportedScript = supported[j].script ? supported[j].script.toLowerCase() : supported[j].script;
const supportedRegion = supported[j].region ? supported[j].region.toLowerCase() : supported[j].region;
if (langCode === supportedCode) {
possible.push({ lang: supportedLanguages[j], supportedScript, supportedRegion });
}
}
if (possible.length > 0) {
const regionsFilter = possible.filter(({ supportedRegion }) => supportedRegion == langRegion)
if (regionsFilter.length > 0) {
const scriptFilter = regionsFilter.filter(({ supportedScript }) => supportedScript == langScript)
if (scriptFilter.length > 0) {
return scriptFilter[0].lang
}
return regionsFilter[0].lang
} else {
return possible[0].lang
}
}
}
return null;
}
Which, once implemented/imported in the server (line 328) would resolve into:
pick(i18n.locales, req.headers['accept-language'])
I've tested my script on multiple cases, and I found it always took the right option (but I encourage to try it yourself). The only thing is that it works better by adding general language code before region/script specific languages: for example, to make sure en
is selected it has to be placed first:
const languages = ["en", "en-US", "en-GB"]
pick(languages, "en-AU,en-US;q=0.9,en;q=0.8")
@arguiot unfortunately your locale negotiation algorithm is not correct. There are currently 2 algorithms:
RFC4647 is pretty rudimentary while UTS35 is more sophisticated and handle legacy aliases (e.g zh-TW
-> zh-Hant-TW
). libicu
uses UTS35 and that's also what formatjs
simulate as well.
@longlho Which part is wrong exactly? The links you gave aren't algorithm at all, they are specifications (which is quite different). Currently, the problem is not about parsing Accept-Language
header or selecting the most relevant locale, but a bit of both.
I'm not saying my algorithm is perfect, because it's not the case and not the goal. The problem with specifications is that not everyone follows them. The language picker has to be very flexible to make sure that it works for everyone.
There's room for improvement (so if you want to improve my work or someone else's work, please do). But I think it's better to quickly fix the issue and improve the language detection overtime. I believe it's always better to have something pretty reliable right now that will get better than having something absolutely perfect but in 3 months.
I'm not sure you understand locale negotiation. You need language matching data from CLDR to resolve them correctly. Just parsing locales and doing matching based on lang/region/script is not enough. The 2 algorithms are described in the 2 specs if you read them. "Not everyone following them" because they don't really understand its complexity and choose to just hack together solutions that work in the 20 tests they have.
I don't recommend just hacking a solution. Either use a library that follows the algorithm or implement the algorithm itself.
Formatjs already follows the spec so either use that or libicu
I probably don't understand locale negotiation as well as you. I would like to understand why parsing locales and doing matching based on lang/region/script is not enough in our case.
The problem with what you're proposing is that it's not related to Accept-Language
headers at all. Which basically means that in the ECMA402 implementation q
weightings aren't respected.
Moreover, we would have to parse the Accept-Language
ourself. I already solved the problem in my implementation.
I agree on the fact that it's better to use FormatJS, but we first need to solve the q
weightings problem first.
Finally, the ResolveLocale
functions makes bad decisions:
ResolveLocale(
["en-CA", "fr-CA"],
["fr-FR"],
{ localeMatcher: "best fit" },
[],
{},
() => "en"
)
In that case, it should return fr-CA
because it's the closest locale, but it returns en
.
Q weightings determines the order of the requested locale array.
And in your example, yes, it should return "en". You're assuming if you understand a language in a certain region then you understand that in all regions, which is not true for traditional vs simplified Chinese, or latin american spanish vs spain spanish.
Using lang/region/script is not enough. Because zh-TW matches with zh-Hant, not zh-Hans or zh and that data lives in CLDR.
@longlho I know what Q weightings are, but does the order matter in the ResolveLocale function?
This is a very tricky point, and I think we should let the developer decide. Maybe by letting the developer use its own resolver in next.config.js
? And make the ECMA402 resolver the default one.
The problem is that not every browser will send the header with the region and the code: Chrome, will do it but Safari will only send en-US
for example.
For global websites, having a perfect language resolver matters a lot. But for country specific website, like in Canada for example, only the language code matters.
I would like to have your opinion on this proposal. For me I think it would solve the problem (and close our little debate :smile:).
The order absolutely matters in ResolveLocale
. And tbh you should NOT let developers control the resolver bc it requires fairly extensive knowledge on how locale negotiation & CLDR data works, in conjunction with things like upgrading libicu/CLDR/unicode versions. Unless you have a dedicated team for it, it's not something that an average dev wants to deal with.
The problem is that not every browser will send the header with the region and the code: Chrome, will do it but Safari will only send en-US for example.
That's not entirely correct. Locale preferences are determined by both browser user settings and OSes. It contains more than just lang/script/region. If u read the UTS35 LDML spec it contains preferences like calendar, numbering systems and such (e.g ar-nu-latn
is Arabic using Latin numbering system instead of Arabic ones). Calendar/numbering systems can be pulled from the OS by the browser. Browsers can choose to send those or not, due to potential fingerprinting issue.
For global websites, having a perfect language resolver matters a lot. But for country specific website, like in Canada for example, only the language code matters.
Canada has 2 official languages: English & French. CA also uses Celsius instead of Fahrenheit. There are a lot of differences in en
vs en-CA
(number grouping, grouping symbol, currencies...). I guess I'm not sure what you meant here.
Having worked with large engineering org I have not seen the need to have "custom" i18n functionalities that are not encompassed by Unicode or ICU due to high learning curve and complexity, so providing something standard out of the box is the norm.
I must admit that you convinced me. I created a Pull Request (#19460) based on your code. It would be great if you could review the code and make sure everything is working properly.
Experiencing the same issue on a Next 10 sites using built-in localization. The client using iOS Safari (I can reproduce it on my iPad) is not redirected to /fr but keeps seeing the content of the site with the default language (en). Works on any other browser though.
~No "en" is correct. en-US is just an alias for en~
Sorry I misread your function, turns out it was using an older version of ResolveLocale
, the latest version should resolve to zh-CN
. I've updated the codesandbox as well.
same issue here.
i18n: {
locales: ['ko', 'en', 'fr', 'nl'],
defaultLocale: 'en',
}
i'm prefer using in ko-kr. each browser sent difference format in Accept-Language Header like this.
os x, chrome ✅ Accept-Language: ko-KR,ko;q=0.9,en-US;q=0.8,en;q=0.7
os x, safari ❌ Accept-Language: ko-kr
ios, chrome ❌ Accept-Language: ko-kr
windows 10, edge ✅ Accept-Language: ko-KR,ko;q=0.9,en-US;q=0.8,en;q=0.7
windows 10, chrome ✅ Accept-Language: ko,en;q=0.9,en-US;q=0.8
windows 10, internet explorer 11 ❌ Accept-Language: ko-kr
Except OS X Chrome, all browsers are sending not include region code.
Temporally, I put a 'ko-KR' also.
i18n: {
locales: ['ko','ko-KR','en', 'fr', 'nl'],
defaultLocale: 'en',
}
but really annoying..
Just pick the prefix language letter and match it. please.
'en-US' => 'en' 'en' => 'en' 'en-WHATEVER' => 'en' 'en-MARS🚀' => 'en'
'zh-TW'=> 'zh' 'zh-HK'=> 'zh' 'zh-WHATEVER' => 'zh'
'ko-KR' => 'ko' 'ko-KP' => 'ko' 'ko-WHATEVER' => 'ko'
Like I mentioned, zh-TW
-> zh
would not be correct. zh
is alias for Traditional Chinese where zh-TW
or zh-HK
is Simplified Chinese.
any updates?
any updates?
Is this being fixed? I think I'm running into the same issue with Safari on iOS when its set to spanish its not picking up the "es" locale. Works just fine in Chrome though.
Is this being fixed? I think I'm running into the same issue with Safari on iOS when its set to spanish its not picking up the "es" locale. Works just fine in Chrome though.
Hi! Do you know, there is a solution for this issue? script? Nexts.js new version? A lot of thanks!
Given that it seems that this issue isn't gonna resolved anytime soon: did somebody at least find a workaround? Some way to hijack the locale negotiation routine maybe?
I wish, I just ended up adding a language button to the bottom of my site which lets you flip from english to spanish.
Its amazing there is no work being done to fix this, if this was more than a personal site i’d consider is a major issue.
@hohl technically it is possible if you don't use built in next.js locale routing and just read directly from headers to negotiate
I already solved the problem in my implementation.
@arguiot How did you solve it?
Issue also presents itself in Firefox. I surely thought this was me not using Next.js properly. It's super odd that this bug has been allowed to persist since i18n was implemented in Next.js.
facing same issue here,
@longlho do you know what happened with your PR? https://github.com/vercel/next.js/pull/19552 that looks good to me, I'll try to patch-package it and that should be good workaround for now
the PR is pretty old but that's the general idea. If you're using vercel the issue is upstream routing at the CDN level so that patch won't help.
I can't tell if this has been reported yet, but it seems this is happening specifically with dynamic routes. If you create a new next.js project with a single pages/index.js
file it works correctly. But as soon as I add a pages/[id].js
it starts reporting the locale
as whatever defaultLocale
I have in my next.config.js
Since this seems to be related to the routing, it would be nice to have an option to use localeDetection: true
and something like localeRedirection: false
Any updates?
Safari does NOT work for us pushing the locale into router.push it always goes back to english. Chrome works just fine. using next v 11.1.2
Here is a simple expression of the problem:
$ curl 'https://mywebsite' -H 'Accept-Language: fr-FR'
Redirecting to /en (308)
$ curl 'https://mywebsite' -H 'Accept-Language: fr'
Redirecting to /fr (307)
One possible workaround is to include all locales, like so:
const es = [
"es",
"es-AR",
"es-BO",
"es-CL",
"es-CO",
"es-CR",
"es-DO",
"es-EC",
"es-ES",
"es-GT",
"es-HN",
"es-MX",
"es-NI",
"es-PA",
"es-PE",
"es-PR",
"es-PY",
"es-SV",
"es-UY",
"es-VE"
];
const fr = ["fr", "fr-BE", "fr-CA", "fr-CH", "fr-FR", "fr-LU", "fr-MC"];
const de = ["de", "de-AT", "de-CH", "de-DE", "de-LI", "de-LU"];
const nl = ["nl", "nl-BE", "nl-NL"];
It creates a massive overhead for static page generation though.
I have tested multiple production next websites and unfortunately all of them have this flaw.
It is quite sad that this is a year old issue, I guess it is under reported because it is hard to spot and real users most likely would not report such UX issue.
@timuric Our bloody workaround was to use _app, add a small hook so there is client-side redirection from / to /de when accpt-language resolves to de* and route is /. But of course it has major downsides
@pozylon
@timuric Our bloody workaround was to use _app, add a small hook so there is client-side redirection from / to /de when accpt-language resolves to de* and route is /. But of course it has major downsides
Haha, we also tried that, but unfortunately it is not bullet proof :(
So far adding all possible locales does the job, but as I mentioned there is a massive overhead that can be a deal breaker for some static paths with many permutations.
@timuric We use SSG for a webshop and I'd wait 2h for a build if I did that haha ^^
Well it's all crap we need a next-native solution, I'd prefer beeing able to define a custom locale resolver as a function:
{
i18n: {
resolveLocale(ctx) { return ctx.req.headers?["X-Startreck"] ? "klingon" : null }
}
}
you guys can use https://formatjs.io/docs/polyfills/intl-localematcher to manually match locales
any update on this?
I had this same problem. My workaround is to do my _middleware file as this:
import { NextRequest, NextResponse } from 'next/server'
import { ACCEPTED_LOCALES, DEFAULT_LANGUAGE } from '../core/constants/languages'
const PUBLIC_FILE = /\.(.*)$/
export function middleware(req: NextRequest) {
const shouldHandleLocale =
!PUBLIC_FILE.test(req.nextUrl.pathname) &&
!req.nextUrl.pathname.includes('/api/') &&
req.nextUrl.locale === 'default'
if (shouldHandleLocale) {
const language = req.cookies['NEXT_LOCALE'] ?? getBrowserLanguage(req) ?? DEFAULT_LANGUAGE
const url = req.nextUrl.clone()
url.pathname = `/${language}${req.nextUrl.pathname}`
return NextResponse.redirect(url)
}
return undefined
}
// Chrome automaticaly adds fallback to generic languages
// so this is not called when it is the used browser
const getBrowserLanguage = (req: NextRequest) => {
return req.headers
.get('accept-language')
?.split(',')
.map((i) => i.split(';'))
?.reduce(
(ac: { code: string; priority: string }[], lang) => [
...ac,
{ code: lang[0], priority: lang[1] }
],
[]
)
?.sort((a, b) => (a.priority > b.priority ? -1 : 1))
?.find((i) => ACCEPTED_LOCALES.includes(i.code.substring(0, 2)))
?.code?.substring(0, 2)
}
My ACCEPTED_LOCALES variable = ['en', 'fr', 'pt']
I had this same problem. My workaround is to do my _middleware file as this:
import { NextRequest, NextResponse } from 'next/server' import { ACCEPTED_LOCALES, DEFAULT_LANGUAGE } from '../core/constants/languages' const PUBLIC_FILE = /\.(.*)$/ export function middleware(req: NextRequest) { const shouldHandleLocale = !PUBLIC_FILE.test(req.nextUrl.pathname) && !req.nextUrl.pathname.includes('/api/') && req.nextUrl.locale === 'default' if (shouldHandleLocale) { const language = req.cookies['NEXT_LOCALE'] ?? getBrowserLanguage(req) ?? DEFAULT_LANGUAGE const url = req.nextUrl.clone() url.pathname = `/${language}${req.nextUrl.pathname}` return NextResponse.redirect(url) } return undefined } // Chrome automaticaly adds fallback to generic languages // so this is not called when it is the used browser const getBrowserLanguage = (req: NextRequest) => { return req.headers .get('accept-language') ?.split(',') .map((i) => i.split(';')) ?.reduce( (ac: { code: string; priority: string }[], lang) => [ ...ac, { code: lang[0], priority: lang[1] } ], [] ) ?.sort((a, b) => (a.priority > b.priority ? -1 : 1)) ?.find((i) => ACCEPTED_LOCALES.includes(i.code.substring(0, 2))) ?.code?.substring(0, 2) }
My ACCEPTED_LOCALES variable = ['en', 'fr', 'pt']
My test results with this workaround (en default language, browsers languages in this order: fr-CH, pt-BR, en-US)
http://localhost:3000/ => http://localhost:3000/fr/ - language found in getBrowserLanguage function http://localhost:3000/test-localize/ => http://localhost:3000/fr/test-localize/ - language found in getBrowserLanguage function
http://localhost:3000/ => http://localhost:3000/fr/ - language fr sent by chrome, no call to getBrowserLanguage http://localhost:3000/test-localize/ => http://localhost:3000/fr/test-localize/ - language found in getBrowserLanguage function
$ curl 'https://mywebsite' -H 'Accept-Language: fr-FR'
Redirecting to /en (308) 🫣
$ curl 'https://mywebsite' -H 'Accept-Language: fr'
Redirecting to /fr (307) 🙂
$ curl 'https://mywebsite/path' -H 'Accept-Language: fr'
No redirect (uses default locale) 🫣
Expected: redirecting to /fr/path
Hi, any updates on this?
Edit by maintainer bot: Comment was automatically minimized because it was considered unhelpful. (If you think this was by mistake, let us know). Please only comment if it adds context to the issue. If you want to express that you have the same problem, use the upvote 👍 on the issue description or subscribe to the issue for updates. Thanks!
Bug report
Describe the bug
If I send Accept-Language: "fr-XX,en" and I have available locales fr, en, it should give me fr instead of en (current behavior)
To Reproduce
Steps to reproduce the behavior, please provide code snippets or a repository:
Expected behavior
Should be routed to fr
Screenshots
If applicable, add screenshots to help explain your problem.
System information
Additional context
Add any other context about the problem here.