sveltejs / kit

web development, streamlined
https://svelte.dev/docs/kit
MIT License

i18n brainstorming #553

Open Rich-Harris opened 5 years ago

Rich-Harris commented 5 years ago

We've somewhat glossed over the problem of internationalisation up till now. Frankly this is something SvelteKit isn't currently very good at. I'm starting to think about how to internationalise/localise https://svelte.dev, to see which parts can be solved in userland and which can't.

(For anyone unfamiliar: 'Internationalisation' or i18n refers to the process of making an app language agnostic; 'localisation' or l10n refers to the process of creating individual translations.)

This isn't an area I have a lot of experience in, so if anyone wants to chime in — particularly non-native English speakers and people who have dealt with these problems! — please do.

Where we're currently at: the best we can really do is put everything inside src/routes/[lang] and use the lang param in preload to load localisations (an exercise left to the reader, albeit a fairly straightforward one). This works, but leaves a few problems unsolved.

I think we can do a lot better. I'm prepared to suggest that SvelteKit should be a little opinionated here rather than abdicating responsibility to things like i18next, since we can make guarantees that a general-purpose framework can't, and can potentially do interesting compile-time things that are out of reach for other projects. But I'm under no illusions about how complex i18n can be (I recently discovered that a file modified two days ago will be labeled 'avant-hier' on MacOS if your language is set to French; most languages don't even have a comparable phrase. How on earth do you do that sort of thing programmatically?!) which is why I'm anxious for community input.


Language detection/URL structure

Some websites make the current language explicit in the pathname, e.g. https://example.com/es/foo or https://example.com/zh/foo. Sometimes the default is explicit (https://example.com/en/foo), sometimes it's implicit (https://example.com/foo). Others (e.g. Wikipedia) use a subdomain, like https://cy.example.com. Still others (Amazon) don't make the language visible, but store it in a cookie.

Having the language expressed in the URL seems like the best way to make the user's preference unambiguous. I prefer /en/foo to /foo since it's explicit, easier to implement, and doesn't make other languages second-class citizens. If you're using subdomains then you're probably running separate instances of an app, which means it's not SvelteKit's problem.

There still needs to be a way to detect language if someone lands on /. I believe the most reliable way to detect a user's language preference on the server is the Accept-Language header (please correct me if necessary). Maybe this could automatically redirect to a supported localisation (see next section).
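As a rough illustration of the detection step, here is a minimal sketch of choosing a supported localisation from an Accept-Language header. `pickLocale` and its parameters are hypothetical names, not an existing SvelteKit or Sapper API, and the matching is deliberately simplified (exact tag first, then the primary language).

```javascript
// Hypothetical helper: pick the best supported locale for an Accept-Language header.
function pickLocale(header, supported, fallback) {
  const ranked = (header || '')
    .split(',')
    .map((part) => {
      const [tag, ...params] = part.trim().split(';');
      const q = params.map((p) => p.trim()).find((p) => p.startsWith('q='));
      const weight = q ? parseFloat(q.slice(2)) : 1; // a missing q means q=1
      return { tag: tag.toLowerCase(), weight: Number.isNaN(weight) ? 0 : weight };
    })
    .filter((entry) => entry.tag && entry.weight > 0)
    .sort((a, b) => b.weight - a.weight); // stable sort keeps header order for ties

  for (const { tag } of ranked) {
    // Try the full tag first ("fr-ca"), then the primary language ("fr").
    const match =
      supported.find((s) => s.toLowerCase() === tag) ||
      supported.find((s) => s.toLowerCase() === tag.split('-')[0]);
    if (match) return match;
  }
  return fallback;
}
```

A server hook could then redirect a request for / to `/${pickLocale(header, locales, 'en')}/`, where the fallback locale is a choice the app would have to make.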

Supported localisations

It's useful for SvelteKit to know at build time which localisations are supported. This could perhaps be achieved by having a locales folder (configurable, obviously) in the project root:

locales
|- de.json
|- en.json
|- fr.json
|- ru.json
src
|- routes
|- ...

Single-language apps could simply omit this folder, and behave as they currently do.

lang attribute

The <html> element should ideally have a lang attribute. If SvelteKit has i18n built in, we could achieve this the same way we inject other variables into src/template.html:

<html lang="%svelte.lang%">

Localised URLs

If we have localisations available at build time, we can localise URLs themselves. For example, you could have /en/meet-the-team and /de/triff-das-team without having to use a [parameter] in the route filename. One way we could do this is by encasing localisation keys in curlies:

src
|- routes
   |- index.svelte
   |- {meet_the_team}.svelte

In theory, we could generate a different route manifest for each supported language, so that English-speaking users would get a manifest with this...

{
  // index.svelte
  pattern: /^\/en\/?$/,
  parts: [...]
},

{
  // {meet_the_team}.svelte
  pattern: /^\/en\/meet-the-team\/?$/,
  parts: [...]
}

...while German-speaking users download this instead:

{
  // index.svelte
  pattern: /^\/de\/?$/,
  parts: [...]
},

{
  // {meet_the_team}.svelte
  pattern: /^\/de\/triff-das-team\/?$/,
  parts: [...]
}
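To make the idea of per-language manifests concrete, here is a small sketch of how a route filename containing a `{key}` could be expanded into a pattern for each locale. The `slugs` table, `escapeRegex` and `routePattern` are hypothetical names used only for illustration; a real implementation would read the slugs from the locale files and handle nested routes.

```javascript
// Hypothetical slug table, as it might be read from locales/en.json and locales/de.json.
const slugs = {
  en: { meet_the_team: 'meet-the-team' },
  de: { meet_the_team: 'triff-das-team' }
};

// Escape characters that have a special meaning inside a regular expression.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build the pattern for one route file and one locale.
function routePattern(file, locale) {
  const name = file.replace(/\.svelte$/, '');
  if (name === 'index') return new RegExp(`^\\/${locale}\\/?$`);
  // Replace each {key} with the translated slug for this locale.
  const segment = name.replace(/\{(\w+)\}/g, (_, key) => {
    const slug = slugs[locale][key];
    if (slug === undefined) throw new Error(`No "${key}" slug for locale "${locale}"`);
    return slug;
  });
  return new RegExp(`^\\/${locale}\\/${escapeRegex(segment)}\\/?$`);
}
```

With this sketch, `routePattern('{meet_the_team}.svelte', 'de')` matches /de/triff-das-team but not /de/meet-the-team.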

Localisation in components

I think the best way to make the translations themselves available inside components is to use a store:

<script>
  import { t } from '$app/stores';
</script>

<h1>{$t.hello_world}</h1>

Then, if you've got files like these...

// locales/en.json
{ "hello_world": "Hello world" }
// locales/fr.json
{ "hello_world": "Bonjour le monde" }

...SvelteKit can load them as necessary and coordinate everything. There's probably a commonly-used format for things like this as well — something like "Willkommen zurück, $1":

<p>{$t.welcome_back(name)}</p>

(In development, we could potentially do all sorts of fun stuff like making $t be a proxy that warns us if a particular translation is missing, or tracks which translations are unused.)
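The development-time proxy could look something like the sketch below. It is plain JavaScript rather than a real Svelte store, `createTranslations` is a hypothetical name, and the `$1` placeholder handling is just one possible reading of the "Willkommen zurück, $1" format mentioned above.

```javascript
// Hypothetical dev-mode wrapper around one locale's dictionary.
function createTranslations(dictionary, onMissing = console.warn) {
  const used = new Set();
  const has = (key) => Object.prototype.hasOwnProperty.call(dictionary, key);

  const t = new Proxy({}, {
    get(_target, key) {
      if (typeof key !== 'string') return undefined;
      if (!has(key)) {
        onMissing(`Missing translation: ${key}`);
        return key; // fall back to the key so the gap is visible in the UI
      }
      used.add(key);
      const message = dictionary[key];
      // Messages with $1, $2, ... placeholders become functions of their arguments.
      if (/\$\d/.test(message)) {
        return (...args) => message.replace(/\$(\d+)/g, (_m, n) => String(args[n - 1]));
      }
      return message;
    }
  });

  // Keys that were never read, which may point at stale translations.
  const unused = () => Object.keys(dictionary).filter((key) => !used.has(key));
  return { t, unused };
}
```

Reading `t.hello_world` returns the string, `t.welcome_back('Ada')` fills in the placeholder, and `unused()` lists keys that were never accessed.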

Route-scoped localisations

We probably wouldn't want to put all the localisations in locales/xx.json — just the stuff that's needed globally. Perhaps we could have something like this:

locales
|- de.json
|- en.json
|- fr.json
|- ru.json
src
|- routes
   |- settings
      |- _locales
         |- de.json
         |- en.json
         |- fr.json
         |- ru.json
      |- index.svelte

Again, we're in the fortunate position that SvelteKit can easily coordinate all the loading for us, including any necessary build-time preparation. Here, any keys in src/routes/settings/_locales/en.json would take precedence over the global keys in locales/en.json.
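The precedence rule described here amounts to a shallow merge in which later (more specific) dictionaries win. A minimal sketch, with `resolveMessages` as a hypothetical name:

```javascript
// Merge the global dictionary with any route-scoped ones; later arguments win.
function resolveMessages(globalMessages, ...scopedMessages) {
  return Object.assign({}, globalMessages, ...scopedMessages);
}

// Example data standing in for locales/en.json and src/routes/settings/_locales/en.json.
const globalEn = { title: 'My app', save: 'Save' };
const settingsEn = { title: 'Settings' };
const settingsMessages = resolveMessages(globalEn, settingsEn);
// settingsMessages.title is 'Settings', settingsMessages.save is still 'Save'
```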

Translating content

It's probably best if SvelteKit doesn't have too many opinions about how content (like blog posts) should be translated, since this is an area where you're far more likely to need to e.g. talk to a database, or otherwise do something that doesn't fit neatly into the structure we've outlined. Here again, there's an advantage to having the current language preference expressed in the URL, since userland middleware can easily extract that from req.path and use that to fetch appropriate content. (I guess we could also set a req.lang property or something if we wanted?)

Base URLs

Sapper (ab)used the <base> element to make it easy to mount apps on a path other than /. <base> could also include the language prefix so that we don't need to worry about it when creating links:

<!-- with <base href="de">, this would link to `/de/triff-das-team` -->
<a href={$t.meet_the_team}>{$t.text.meet_the_team}</a>

Base URLs haven't been entirely pain-free though, so this might warrant further thought.


Having gone through this thought process I'm more convinced than ever that SvelteKit should have i18n built in. We can make it so much easier to do i18n than is currently possible with libraries, with zero boilerplate. But this could just be arrogance and naivety from someone who hasn't really done this stuff before, so please do help fill in the missing pieces.

foxbunny commented 5 years ago

It seems that everyone in this thread is trying to make Sapper i18n opinionated with x or y translation library.

It's a result of i18n being one of those things that absolutely requires collective effort. Using something that's been accumulating such effort over longer periods of time seems like a good idea to me. Even if you say "I've had 95% success rate with the tools I've been using before!" that could only be because you did not encounter any languages where the success rate would drop to 30%.

To make everyone happy the best option would be to let developers have the choice, by finding a way to plug-in whatever translation function and its associated locale files.

That sounds great as long as there is also an official plug-in all juiced up and ready to go. I'm pretty sure the ease of integration will soon defeat the need for one little feature or another.

ignatevdev commented 5 years ago

I suggest having a look at how js-lingui works with React. I have tried many different approaches in my old projects and lingui is absolutely the best.

https://github.com/lingui/js-lingui

tricoder42 commented 5 years ago

@NSLS Thank you!

I just want to mention that LinguiJS is in a transition state: I'm working (once again) on a major release. There are a lot of obstacles in the current approach.

Recently I've learned about Svelte and I really like the philosophy. Once I finish LinguiJS v3, I would like to take a look how to integrate it into Svelte.

dmitrykrylov commented 5 years ago

I created a minimal example of implementing i18n in Sapper app:

I used i18next because it has good docs and can be integrated easily. My first attempt was to use LinguiJS but I got into issues with rollup throwing errors when importing LinguiJS macros.

There are still issues with my implementation:

I would be glad to receive any pull requests and proposals to improve the example.

laurentpayot commented 5 years ago

For simple plural languages such as English or French you can have an i18n function with internal references (i.e. it can reuse translations) in 20 LOC only.

skaiser commented 5 years ago

Extremely new here. Just heard about Svelte today, but i18n would be important for adoption of Svelte in many large scale web apps/companies, so I thought I'd chime in since this doesn't seem totally solved/agreed upon yet. There are many things I don't like about Angular, but the syntactic way Angular handles i18n is nice, both in terms of readability of the HTML and also maintainability. You define your string in HTML as you normally would: <h1>This is my message</h1>

But, then you provide the i18n directive, which will choose the correct language at page load (The user only gets the translated bundle for their locale at page load): <h1 i18n>This is my message</h1>

You can, of course, use a variable for the string, when necessary. Having the text in the same file is convenient and may make finding random typos more likely.

It looks like @dogada's suggestion is the closest to this so far, and I would be ok with that approach. Although, having to call getMessage() seems unnecessary since Svelte is a compiler?

dogada commented 5 years ago

@skaiser getMessage is required when your translation depends on an argument, for example "You have 1 apple only" vs "There are 6 apples"
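A count-dependent message like this can be sketched with the built-in `Intl.PluralRules`. The `messages` table, the `{count}` placeholder and this `getMessage` signature are assumptions for illustration, not the API being discussed in the thread.

```javascript
// Hypothetical message table keyed by CLDR plural category.
const messages = {
  apples: {
    one: 'You have {count} apple only',
    other: 'There are {count} apples'
  }
};

// Pick the plural form for `count` in the given locale, then fill in the number.
function getMessage(locale, key, count) {
  const category = new Intl.PluralRules(locale).select(count); // "one", "other", ...
  const forms = messages[key];
  const template = forms[category] ?? forms.other;
  return template.replace('{count}', String(count));
}
```

For English, `getMessage('en', 'apples', 1)` gives "You have 1 apple only" and `getMessage('en', 'apples', 6)` gives "There are 6 apples".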

dogada commented 5 years ago

@dmitrykrylov your approach works but I personally don't like to use [lang] in routes because it's not always possible to change url structure and many people prefer to have /about/ instead of /en/about for the default English language.

alex7kom commented 5 years ago

Just to add to many other options mentioned here, there is FormatJS and it is based on ICU Message syntax and Unicode CLDR. Since Svelte is a compiler™ it probably can compile output of intl-messageformat-parser straight to functions. FormatJS is also used in their own react-intl module.

pngwn commented 5 years ago

Closing some other i18n discussions from the past but linking them here for reference: sveltejs/sapper#78, sveltejs/sapper#230.

khrome83 commented 5 years ago

@Rich-Harris - we do a lot of localization today. We handle this in our headless CMS (contentful). We use / for english and /es/ for spanish.

We do use slugs a lot, for example /blog/[slug] or /medication/[id], so I think assuming we can always localize the slug would be a bad situation. I think Sapper being opinionated about using the FS would not be great.

We have been using the /[lang]/ approach in our POC of Sapper. We want to move from Next.js/React to Sapper/Svelte, since our user base is 95% mobile, and in parts of the world with 14-20s latency (all US).

We are planning to use sapper export. We are currently attempting to see if we can use a --rewrite-path rule to move files at export while persisting functionality. But it seems we bake some paths into JS for requests that cause this idea to fail.

I agree it would be beneficial for Sapper to be opinionated about an official local file system version, but that is not maintainable at an enterprise without custom scripts. For example, I would have to write a locale file during the build with all the mapping. When my prefetch method already knows the locale and the path to the content, that seems to be the wrong direction.

evgenyfadeev commented 5 years ago

Please note that using the /lang/ code in the URL requires i18n, but not vice versa. The /lang/ code is needed for multilingual sites, while i18n is necessary in any single-language site that was originally created with a different language in mind.

khrome83 commented 5 years ago

Our current site uses english at / and Spanish at /es, which is why this solution does not work in all cases either.

rchrdnsh commented 5 years ago

+1,000,000 for YAML for language content XD

stancl commented 5 years ago

I think Laravel solves the pluralization problem very nicely: https://laravel.com/docs/6.x/localization#pluralization

You can use a pipe for singular/plural:

'apples' => 'There is one apple|There are many apples',

And you can also specify an unlimited amount of variants based on the number:

'apples' => '{0} There are none|[1,19] There are some|[20,*] There are many',
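For readers who want to see how such strings might be evaluated, here is a small JavaScript sketch that approximates the syntax shown above. It is not a port of Laravel's implementation: `choose` is a hypothetical name, and the plain "singular|plural" case uses a simple English-style rule.

```javascript
// Pick the variant of a pipe-separated message that applies to `count`.
function choose(message, count) {
  const segments = message.split('|');
  for (const segment of segments) {
    // "{0} text" applies to exactly that number.
    const exact = segment.match(/^\s*\{(\d+)\}\s*(.*)$/);
    if (exact) {
      if (Number(exact[1]) === count) return exact[2];
      continue;
    }
    // "[1,19] text" or "[20,*] text" applies to an inclusive range.
    const range = segment.match(/^\s*\[(\d+|\*),\s*(\d+|\*)\]\s*(.*)$/);
    if (range) {
      const min = range[1] === '*' ? -Infinity : Number(range[1]);
      const max = range[2] === '*' ? Infinity : Number(range[2]);
      if (count >= min && count <= max) return range[3];
    }
  }
  // No condition matched: treat unconditioned segments as "singular|plural".
  const plain = segments.filter((s) => !/^\s*[{[]/.test(s));
  if (plain.length === 0) return undefined;
  return (count === 1 ? plain[0] : plain[plain.length - 1]).trim();
}
```

With the two strings above, `choose` returns "There is one apple" for 1, "There are none" for 0, "There are some" for 7 and "There are many" for 20.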

khrome83 commented 5 years ago

I can’t think of any project outside of side projects or really small companies where it was ok to specify language in a file. It does not scale and does not give authors that need it access.

It would be better if i18n was approached by the user themselves or through separate plugins allowing different methods. Pulling in a YAML file or a REST call to Contentful is unique to the experience for that project.

What we have to solve is not how to store strings, but how to handle routing that gives you the locale to pull.

Right now we have to do /en/some-content and /es/some-content. This is built in using folder params.

But by default and for SEO, English being the primary language of my site, en/ should be the top-level / and it's not ideal for it to live under en/.

I am only making this claim for sites that have a dedicated primary language. Like a US only website that has secondary languages for that region.

I would expect to see /some-content and /es/some-content in this situation.

While I can technically do this today as well, it causes an incredible amount of duplicate pages.

Ideally the concept of having an optional primary language, and secondary languages, makes sense. And optionally the primary language would be the default, and the top level.

For sites that have no primary language, the current pattern works, where locale is folder.

One idea would be to have an optional parameter.

/[?locale=en]/some-content

This would then generate content at the locale path, but also allow /some-content to be treated as a top level path. Additionally you could either provide a default value in the attribute or default in JavaScript when it comes back undefined.
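One way to picture the optional parameter is as a small resolver that strips a leading locale segment when present and otherwise falls back to the primary language. This is only a sketch; `splitLocale` and its arguments are hypothetical and not part of Sapper's router.

```javascript
// Resolve the locale and the locale-free path for an incoming pathname.
function splitLocale(pathname, supported, defaultLocale) {
  const segments = pathname.split('/').filter(Boolean);
  if (supported.includes(segments[0])) {
    const [locale, ...rest] = segments;
    return { locale, path: '/' + rest.join('/') };
  }
  // No locale prefix: the path belongs to the primary language.
  return { locale: defaultLocale, path: '/' + segments.join('/') };
}
```

Here `splitLocale('/es/some-content', ['en', 'es'], 'en')` yields the locale es, while `splitLocale('/some-content', ['en', 'es'], 'en')` yields en, and both resolve to the path /some-content.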


joakim commented 5 years ago

@khrome83 Your use case is a multilingual website with content stored in a database, which is a good solution for that. For apps however it's perfectly normal to have translations of UI strings stored in files. Or a combination of both.

It's worth remembering that i18n covers a lot. It's not just about replacing UI strings or serving multilingual content. There's routing based on the visitor's preferred locale, graceful switching between locales, funky grammar to support (see Fluent), special number and date formatting, LTR/RTL.. Most websites may want the locale to be defined in the URL, while apps may want to keep it hidden in its internal state and instead allow hot reloading of locales. And I thought cache invalidation and naming things were hard :)

I don't think Sapper should try to tackle all of i18n. Most of this has already been solved by other great projects, and some will have to be custom coded per project. But some of this has to be provided by Sapper, the question is how much.

I agree with those who think Sapper shouldn't limit localization to one particular library, requirements will always vary from project to project. Instead, it could provide the infrastructure/plumbing for integrating with existing i18n projects, and let the community provide integrations to suit different needs. LinguiJS, fluent-compiler and other localization libraries that do compilation seem like a perfect match for Sapper though, so maybe they could be favoured.

dishuostec commented 5 years ago

I am more interested in how to write the code.

First, we should easily get all available locales from Sapper:

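<script>
// `locales` is the export proposed here; it is not an existing '@sapper/app' export.
import { locales } from '@sapper/app';

function change_locale(locale) {
    ...
}
</script>

<ul>
    {#each locales as locale}
        <!-- wrap the call so it runs on click, not while rendering -->
        <li on:click={() => change_locale(locale)}>{locale}</li>
    {/each}
</ul>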

Second, consider that Sapper is a compiler, which gives us more room for imagination.

In traditional projects, we wrote code like this:

t('good.day');
t('welcome.user', username, param);
t('welcome.user2', {username: realname, param});

In Sapper, maybe we could write this more naturally:

<p>{#i18n good.day }</p>
<p>{#i18n welcome.user | username, param }</p>
<p>{#i18n welcome.user2 | {username: realname, param} }</p>

Because Sapper is a compiler, It knows {#i18n ... } in html is a translation expression.

We could even write the text in the default locale's language:

<p>{#i18n 'Good day!' }</p>
<p>{#i18n 'Welcome $1!' | username, param }</p>
<p>{#i18n 'Welcome ${user}!' | {username: realname, param} }</p>

Sapper would compile these to a locale file with metadata (where it is used, the original sentence), for example:

{
    "good_day": {
        "msg": "Good day!",
        "org": "Good day!",
        "ref": [
            {"file": "path/to/file/used/this", "line": 1}
        ]
    },
    "welcome_$1": {
        "msg": "Welcome $1!",
        "org": "Welcome $1!",
        "ref": [
            {"file": "path/to/file/used/this", "line": 2}
        ]
    },
    "welcome_$user": {
        "msg": "Welcome ${user}!",
        "org": "Welcome ${user}!",
        "ref": [
            {"file": "path/to/file/used/this", "line": 3}
        ]
    }
}

We may also not care about the keys:

{
    "hashedA": {
        "msg": "Good day!",
        "org": "Good day!",
        "ref": [
            {"file": "path/to/file/used/this", "line": 1}
        ]
    },
    "hashedB": {
        "msg": "Welcome $1!",
        "org": "Welcome $1!",
        "ref": [
            {"file": "path/to/file/used/this", "line": 2}
        ]
    },
    "hashedC": {
        "msg": "Welcome ${user}!",
        "org": "Welcome ${user}!",
        "ref": [
            {"file": "path/to/file/used/this", "line": 3}
        ]
    }
}

dogada commented 5 years ago

@dishuostec It's unclear why you need hashed keys if original strings can be used as keys. Also I strongly suggest avoiding inventing new formats for message translations. There are standard PO files that a lot of editors support. I'm sure there should be standard JSON formats as well, and editors/services that support them.

dishuostec commented 5 years ago

@dogada

Also I strongly suggest avoiding inventing new formats for message translations. There are standard PO files that a lot of editors support. I'm sure there should be standard JSON formats as well, and editors/services that support them.

I used JSON for the example because we are familiar with it. I agree with you on this point: we should use a widely used format like PO.

It's unclear why you need hashed keys if original strings can be used as keys.

As mentioned above, we keep the original string in the locale file, so we don't care what the key is. It can be abstracted to the result of a key_hash_function, and maybe the function is (string)=>string.

jonatansberg commented 5 years ago

It's unclear why you need hashed keys if original strings can be used as keys.

I think the strongest argument for adding a hash (usually of component/path + string) is to prevent key collisions. Otherwise there is no way to differentiate generic strings that might need to be translated differently depending on the context.
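A sketch of what such a key could look like: the component path and the source string are combined and hashed, so the same text in two components gets two different keys. `fnv1a` below is an FNV-1a-style hash over UTF-16 code units, chosen only because it needs no imports; `messageKey` is a hypothetical name, and any stable hash would do.

```javascript
// FNV-1a-style 32-bit hash over the string's UTF-16 code units, as 8 hex digits.
function fnv1a(text) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return (hash >>> 0).toString(16).padStart(8, '0');
}

// Key for a message, scoped by the component it appears in.
function messageKey(componentPath, text) {
  // The NUL separator keeps ("a", "bc") and ("ab", "c") from producing the same input.
  return fnv1a(`${componentPath}\u0000${text}`);
}
```

For example, `messageKey('src/routes/files.svelte', 'Open')` and `messageKey('src/routes/hours.svelte', 'Open')` hash different inputs, so "Open" as a verb and "Open" as an adjective can be translated separately. The file names here are made up.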

joakim commented 5 years ago

ocombe, a member of the Angular team who works on i18n, gave this advice earlier:

Never use the sentences as keys because you'll run into problems with your json and some special characters, you'll get very long keys which will increase the size of the json files and make them hard to read, and you'll get duplicates (the same text with different meanings depending on the context)

I can also add that if you have a typo in your original string and later fix that, all translations will have to be "rekeyed". So I'm very much in favor of using keys like hello-user.

That said, I think this is one example of where Sapper could be agnostic and leave implementation details up to the integration, effectively supporting both.

rodoch commented 5 years ago

Really interesting thread. It's full of great suggestions and I imagine the problem is going to be narrowing down all the possible solutions to something general that will work for most people and is extensible.

In that vein, it might be worthwhile to look at what other compiled frameworks/languages, outside of JS, do to address these issues? I can speak a bit about the language which I'm most familiar with, which is C# and the .NET Core framework.

.NET Core's out-of-the-box solution for string localization could be considered basic compared to some of the solutions suggested above:

As I said, this is basic. I'm not proposing it as the approach to take in Sapper. For example, I would much rather stick with a recognised translation file format such as those suggested above (and in JSON at that). But I think there is something to be learned here too: if you intend to please most of the people most of the time you can't be too opinionated and that makes it hard to implement too many advanced features. Remember, .NET is one of the most-widely used backend frameworks on the web and, as much as I complain about it sometimes, it clearly solves a lot of people's problems.

More info: https://docs.microsoft.com/en-us/aspnet/core/fundamentals/localization?view=aspnetcore-3.0

The other points to take away, though, are extensibility and taking advantage of compile-time optimizations.

The nice thing about the way localization is implemented in .NET Core is that it's very easy to extend or replace. There are libraries that allow you to use JSON or a database to store translations instead of .resx files. A few members of the Stack Overflow team have written up great blog posts where they detail a little about how they re-implemented string localization on SO:

https://m0sa.net/posts/2018-11-runtime-moonspeak/ https://nickcraver.com/blog/2016/05/03/stack-overflow-how-we-do-deployment-2016-edition/#step-3-finding-moonspeak-translation

Which leads on to the last point, being that one of the great advantages of Svelte/Sapper's approach is the kind of additional compile-time tasks and optimizations that could be achieved. For example: extracting a full list of keys without corresponding translations in the app. There might be potential here in the future for Sapper to do some cool things along these lines.

Anyway, I'm not pushing any particular agenda or approach here, just emphasising that there might be good lessons to be learned from solutions in other languages that have a compile step.

jonatansberg commented 5 years ago

After taking a stab at building a first version of something along these lines using LinguiJS (available here) I came across this library: https://github.com/kaisermann/svelte-i18n

It seems to me that svelte-i18n does most of the things we discuss here, using an established format (ICU Message Format) on top of Svelte primitives.

There are still other issues to resolve around routing, tooling, etc, but svelte-i18n looks like a really promising start!

laurentpayot commented 5 years ago

@jonatansberg there was an issue about svelte-i18n not working with Sapper. Is it fixed now?

jonatansberg commented 5 years ago

@laurentpayot I'm not sure. I think either way there might be additional changes needed in order for svelte-i18n to be usable in Sapper out of the box. The biggest concern for me is the use of module scope/a singleton architecture, as that will cause problems when server side rendering as soon as you introduce anything that's async.

But maybe that isn't an issue with Sapper in the same way as it is with say React, provided that the actual rendering is synchronous? I'm relatively new to Svelte and Sapper, so I don't know yet :)

kaisermann commented 5 years ago

A little late to the party, but since you guys mentioned svelte-i18n, I think I should give some updates about it. I first created that lib as a POC for my previous job and kinda abandoned the project for a while after that. I'm currently working on a v2.0.0 which adds some new features and behaviours:

This is currently a WIP and I'm definitely taking into consideration a lot of what's said here. In no way do I think I can handle every use case with just svelte-i18n. I've also thought about a preprocessor to remove verbosity of some cases, but I'm reluctant about that for now.

About creating a format specific for sapper/svelte: I'm not completely against it, but I think not using an established format is kind of reinventing the wheel. We already have great formats like ICU or Fluent, which already contemplate a bunch of quirks that a language can have.

Edit:

Ended up deciding to have a queue of loader methods for each locale:

  • register(locale, loader): adds a loader method to the locale queue;
  • waitLocale(): executes all loaders and merges the result with the current locale dictionary.

While not extremely ideal, the "verbosity" of this approach can be also reduced in the user-land by a preprocessor that adds those register and waitLocale calls, maybe even the format/_ method import.
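The loader-queue idea can be sketched roughly as follows. This is not svelte-i18n's actual implementation; it only mirrors the two method names described above, and `waitLocale` taking the locale as an argument is an assumption made to keep the sketch self-contained.

```javascript
const loaders = new Map();      // locale -> queued loader functions
const dictionaries = new Map(); // locale -> merged messages loaded so far

// Queue a loader (a function returning a promise of messages) for a locale.
function register(locale, loader) {
  if (!loaders.has(locale)) loaders.set(locale, []);
  loaders.get(locale).push(loader);
}

// Run every queued loader for the locale and merge the results into its dictionary.
async function waitLocale(locale) {
  const queue = loaders.get(locale) || [];
  loaders.set(locale, []); // each loader runs only once
  const results = await Promise.all(queue.map((load) => load()));
  const merged = Object.assign({}, dictionaries.get(locale), ...results);
  dictionaries.set(locale, merged);
  return merged;
}
```

A route could call `register('en', () => import('./_locales/en.json'))` at module level and `await waitLocale('en')` in its preload function, so only the messages for visited routes are fetched. The import path is illustrative.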

Edit 2:

Just released v2.0.0 🎉 Here's a very crude sapper example: https://svelte-i18n.netlify.com/. You can check the network tab of your devtools to see how and when a locale's messages are loaded. Hope it helps 😁

sudomaxime commented 4 years ago

(A little late to the party too, but it's been a concern for a project of mine lately, so I came across this)

I come from a region that speaks both French and English all the time, and all the projects that I do require some form of localisation. Through 10 years of moving from framework to framework, from CMS to whatever... there have always been the same things that annoyed me; there was never a perfect solution:

I think the Svelte philosophy is to not bring stuff you don't need, type less and do more.

There is something about the $_("localeName") syntax that always annoyed me. Why not just use template literals?

Wouldn't that be cool ?

Wouldn't it be nice to simply do $`This localised content would be {numTimes} better.` JavaScript already gives us the possibility to parse literals and do what we want with them. Why create a big function wrapper for something that is already in the language and that we don't need to import at the top of each template file?

$ could be a subscribable default Sapper store that contains methods with preloaded locales from the hypothetical locales folder.

Or even better, with some svelte magic we could use just that $`this is the {jsFrameworkName} way` and make it global, so we don't have to go through the hassle of importing a store or a lib each time.
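In plain JavaScript, the closest existing feature is a tagged template. The sketch below shows how a tag function could turn the literal into a lookup key and re-insert the values. `createTag` and the `{0}` key format are assumptions, and note that JavaScript templates interpolate with `${...}` rather than the `{...}` used in the examples here.

```javascript
// Build a tag function bound to one locale's dictionary.
function createTag(dictionary) {
  return function translate(strings, ...values) {
    // Rebuild the source message with numbered placeholders, e.g. "Hello {0}".
    const key = strings.reduce((out, chunk, i) => out + `{${i - 1}}` + chunk);
    const known = Object.prototype.hasOwnProperty.call(dictionary, key);
    const template = known ? dictionary[key] : key; // fall back to the source text
    return template.replace(/\{(\d+)\}/g, (_m, n) => String(values[Number(n)]));
  };
}

// Example dictionary keyed by the source sentence.
const fr = createTag({
  'This localised content would be {0} better.': 'Ce contenu localisé serait {0} fois meilleur.'
});
```

With this, fr`This localised content would be ${3} better.` evaluates to "Ce contenu localisé serait 3 fois meilleur.", and an untranslated literal falls back to its source text.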

The current locale could be available through the prefetch function as well as a param:

async function prefetch (page) {
    const { locale, translations } = page;
}

Since we are in the comfort of the Svelte compiler, we could easily extract the template literals and automatically add each entry to a JSON file inside the locales folder, for the locale currently in use in the HTML document head.

I also like the pipe operator in use in svelte, like on:click|preventDefault. Or the godsent class:active idea ? Familiar / frequent patterns in svelte always have a solution and that's awesome

Can we recycle that for translations ?

In templates

$`There are {numBoats} boat|plural:numBoats="s" in the sea`

The generated locale.json would have a format that is easily exported to CSV and sent to translators; it would generate a singular and a plural column:

"There are {numBoats} boat in the sea": {
    plural: "There are {numBoats} boats in the sea",
    singular: "There are {numBoats} boat in the sea"
}

You can also solve very complex french plural oddities with this:

$`{numBoats} bateau|plural:numBoats="x" vous attaqu|singular:numBoats="e"|plural:numBoats="ent"`

Template literals can return this so this is also a possibility

$`There are {numBoats} boat in the sea`.plural(numBoats)

That way you don't have to do weird syntax to handle every possible use case, you don't have to subscribe to a large-esque library that tries to convert every possible pronoun...

What do you guys think ? Worth exploring ?

ocombe commented 4 years ago

As it turns out, the Angular team decided to use template string literals for its new i18n package (@angular/localize) that was just released with the v9 of Angular 😊

cedeber commented 4 years ago

Hello,

To be honest, I didn't read all the messages, but I found that Mozilla is currently working on a very interesting project for localization: https://projectfluent.org/ From what I've read so far it looks very good. (And I also really think that JSON for localization is shit)

Looks like the syntax files (or the <ftl lang="fr"> tag) could be compiled ahead of time and dynamically loaded as needed. Kept in sync with <html lang="fr"> and replaced/parsed thanks to fluent-dom attributes.

cedeber commented 4 years ago

Oh, I missed this one: https://github.com/projectfluent/fluent-web

andykillen commented 4 years ago

@Rich-Harris I'm really interested in helping to get this working, be that on the simpler Latin-based text or the far more complex things like Arabic.

If there is anything I can do, please let me know: testing against the problems I know from localizing into 7 languages for a global audience (including RTL, double-byte characters and other such annoyances, SEO URLs), attempting to bash out some code to help with this, or documenting how to do it, including the gotchas that often hit the inexperienced.

tidiview commented 4 years ago

Hey there ☆ I am new, excited, and have a proposal for what's ahead ↑ (especially for you, @andykillen). I'd really like something like this, and think it would be good:

A- ROUTING -> wikipedia level

① real arbitrary routing system like prefix, like suffix

(PHILOSOPHICAL reason: UNIVERSALITY = EQUALITY between locales huh!?) (equality in dignity) (actually locale not proper term, locale towards what kind of truth!?):

en.my-app/blog
ja.my-app/ブログ
fr.my-app/blog

Files and directories with a leading underscore do not create routes. This allows you to colocate helper modules and components with the routes that depend on them — for example you could have a file called src/routes/_helpers/datetime.js and it would not create a /_helpers/datetime route

(OK, no need to tell me, I already know it's just suffixes, but I really want prefixes, OK?) (Same thing for the structure of its JSON: its keys are too complex, see the same statement above ↑.) (OK, no need to tell me, I already know it makes slugs, but I really want original expressions, not imitations, OK!? Nothing less than the beautiful URLs in WIKIPEDIA: https://ja.wikipedia.org/wiki/トマト) For all that, OUR example SHOULD be WIKIPEDIA (https://www.wikidata.org/wiki/Q177837#sitelinks-wikipedia) (see the correspondence between pages). Technically, for non-Latin languages like ja or ar:

import re
import urllib.parse
import urllib.request

url = "https://ja.wikipedia.org/wiki/トマト"

# Percent-encode every non-ASCII character so the URL can be requested.
regex = r'[^\x00-\x7F]'
for m in set(re.findall(regex, url)):
    url = url.replace(m, urllib.parse.quote_plus(m, encoding="utf-8"))

# Fetch the page and decode the response as UTF-8.
html = urllib.request.urlopen(url).read()
html = html.decode('utf-8')

(hey @andykillen double byte characters!) (hey @Rich-Harris wouldn't that be cool huh!?)
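For a Sapper or Svelte project, the same percent-encoding is available in JavaScript without any library. A minimal sketch, assuming the goal is only to turn a human-readable URL into its ASCII form and back:

```javascript
// Percent-encode the non-ASCII characters of a URL, leaving ASCII
// characters such as ":" and "/" untouched.
const readable = "https://ja.wikipedia.org/wiki/トマト";
const encoded = encodeURI(readable);

console.log(encoded);
// https://ja.wikipedia.org/wiki/%E3%83%88%E3%83%9E%E3%83%88

// decodeURI reverses the transformation, so the readable form can be
// recovered for display.
console.log(decodeURI(encoded) === readable); // true
```

Browsers generally display the readable form in the address bar while sending the encoded form over the wire, which is why the Wikipedia URLs above look "original".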

④ passing context to page to create SEO metadata hreflang then so on...

according to plugin-intl:

  • locale: The language tag identifying the language of the page.
  • canonical: The canonical link to the page if the current one does not itself have the canonical path, null otherwise. This is useful to tell search engines which link should be registered for the index pages.
  • slug: This is the relative path of your page without any indication of the language. It should be written in the default language so that you can translate it (feature not implemented yet).
  • pathRegex: A regular expression containing the slug, so that you can filter easily in GraphQL.

⑤ having a simple fallback method

1- an isPublished boolean per locale resource
2- a redirecting fallback in case a locale does not exist (to a default language) (that would be the only role of the default language here, OK?!)
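The two fallback rules can be sketched as follows. The resource shape and the `resolveLocale` name are assumptions made for illustration, not part of Sapper or of any plugin:

```javascript
// Hypothetical resource shape: one entry per locale, each with its own
// isPublished flag, as suggested above.
const page = {
  en: { isPublished: true, title: "I love tomatoes" },
  ja: { isPublished: true, title: "私はトマトが好きです" },
  fr: { isPublished: false, title: "j'aime les tomates" }
};

// Returns the locale to serve and whether the caller should redirect.
function resolveLocale(resource, requested, defaultLocale) {
  const entry = resource[requested];
  if (entry && entry.isPublished) {
    return { locale: requested, redirect: false };
  }
  // Missing or unpublished: fall back to the default language. No
  // redirect is issued if the request was already for the default.
  return { locale: defaultLocale, redirect: requested !== defaultLocale };
}

console.log(resolveLocale(page, "ja", "en")); // { locale: 'ja', redirect: false }
console.log(resolveLocale(page, "fr", "en")); // { locale: 'en', redirect: true }
console.log(resolveLocale(page, "de", "en")); // { locale: 'en', redirect: true }
```

A server route could call this before rendering and answer with a redirect whenever `redirect` is true.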

B- TEMPLATING -> keep things simple, localization should not be a blackBox Headache

① keeping a common folder/page/whatever structure,

keeping things simple, with other files

② using serializer function,

serving strings, blocks, documents, or svgs according to json:

{
  "en": "I love tomatoes",
  "ja": "私はトマトが好きです",
  "fr": "j'aime les tomates"
}

or

{
"en": [{
  "_type": "block",
  "_key": "da5f884c9804",
  "style": "normal",
  "children": [{
      "_type": "span",
      "_key": "da5f884c98040",
      "text": "Say hi to ",
      "marks": []
    },
    {
      "_type": "span",
      "_key": "da5f884c98041",
      "text": "Portable Text",
      "marks": [
        "strong",
        "<markDefId>"
      ]
    },
    {
      "_type": "span",
      "_key": "da5f884c98042",
      "text": ".",
      "marks": []
    }
  ],
  "markDefs": [{
    "_type": "link",
    "_key": "<markDefId>",
    "href": "https://www.portabletext.org"
  }]
}],
"ja": ...

This function transforms a localeObject (e.g. of type localeString, localeBlock, localeWhatever ...) into the value for the first requested language that has one:

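function localize(value, languages) {
  if (Array.isArray(value)) {
    // Localize every item of an array.
    return value.map(v => localize(v, languages))
  } else if (value !== null && typeof value === 'object') {
    if (/^locale[A-Z]/.test(value._type)) {
      // Pick the first requested language that has a translation.
      const language = languages.find(lang => value[lang])
      return value[language]
    }

    // Plain object: localize each property recursively.
    return Object.keys(value).reduce((result, key) => {
      result[key] = localize(value[key], languages)
      return result
    }, {})
  }
  // Primitives (and null) are returned unchanged.
  return value
}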

(I'm leaving the questions of plurals and so on open, but it's possible to deal with that on the fly) (same thing for RTL).

This proposal is in the way of sapper: modern, yet simple, sophisticated, terribly efficient! It is not that complicated either to leave other possibilities to other desires, but all dimensions are there.

I would like someone to help me do that, as my technical level is probably limited (OK, let's hard-code this here: this is intended as a smart call to you @andykillen, as you said you were interested; what a pity you've been left alone so far) (I intend to answer you; please don't be angry at me).

I think I am giving lots of hints ok? All the different pieces of the puzzle are there aren't they!? Where, who, how, why will be the hands to do what has to be done?! There are plenty of details that I am not aware of, plenty of little technical differences that count.

I would be very interested to see reactions ... HEY don't leave me alone as for @andykillen huh!?! ... that is ugly ...

Thank you for giving some of your precious time to read this too long message up until the very end: my expression is a little too much, we're in the wild here, it's time to engage, to get things done ok!?

andykillen commented 4 years ago

@tidiview interesting, some initial thoughts.

OK, if you only have 1 route, then it might work as routes/[lang]/[group]/[category]/[slug] to give /en/products/cars/ford-fiesta and /nl/producten/autos/ford-fiesta

but then things like /en/about /nl/contact and so on would become way more complex, if not impossible.

there would also need to be some intelligence at the routes/index.svelte to do dynamic routing to the language directories.

So, I'm thinking that there need to be global translation files that cover most screen text, and then additional translation strings available, all inside the src directory, i.e. src/i18n/
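One way to picture global files plus additional strings is a lookup that merges several sources and falls back to a default language. This is only a sketch: the file names, keys and the `createTranslator` helper are hypothetical.

```javascript
// Stand-ins for files such as src/i18n/en.json and src/i18n/nl.json
// (hypothetical file names and keys).
const globalStrings = {
  en: { "nav.about": "About", "nav.contact": "Contact" },
  nl: { "nav.about": "Over ons" }
};

// Additional strings that only one page needs, merged over the globals.
const productStrings = {
  en: { "product.buy": "Buy now" },
  nl: { "product.buy": "Nu kopen" }
};

function createTranslator(locale, fallbackLocale, ...sources) {
  // Later sources win; fallback-locale strings fill any gaps.
  const table = Object.assign(
    Object.create(null),
    ...sources.map(s => s[fallbackLocale] || {}),
    ...sources.map(s => s[locale] || {})
  );
  // Unknown keys are returned as-is so missing translations stay visible.
  return key => (key in table ? table[key] : key);
}

const t = createTranslator("nl", "en", globalStrings, productStrings);
console.log(t("nav.about"));   // Over ons
console.log(t("nav.contact")); // Contact (falls back to English)
console.log(t("product.buy")); // Nu kopen
```

Returning the key itself for unknown strings is a design choice that makes untranslated text easy to spot during testing.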

tidiview commented 4 years ago

@andykillen it's funny that you respond now as I just started to implement what I wrote 10 min ago!!! ☆ NICE ♫ Thanks for showing interest in this proposal. Please find some remarks below:

routing is completely ARBITRARY,

but, like a map for a city, each address has a common reference to its original template thru metadata hreflang OBJECT:

(metadata include a language flag) (url prefix also is a language flag).

For the localized names of folders, that would be a problem; you're likely to need to introduce a special logic, based on _helpers: that's what I'd like to try. I would repeat that logic in subdirectories. Like you, I think it is complicated (and being new to SAPPER, I'm walking on glass); that is my challenge from now on. You also have this:

Regexes in routes

You can use a subset of regular expressions to qualify route parameters, by placing them in parentheses after the parameter name.

For example, src/routes/items/[id([0-9]+)].svelte would only match numeric IDs — /items/123 would match and make the value 123 available in page.params.id, but /items/xyz would not match.

Because of technical limitations, the following characters cannot be used: /, \, ?, :, ( and ).

The reasons to try are:

content and rendering are kept well SEPARATED

What can you ask for more?

(it is true that this is made a little more complicated with SAPPER than with just SVELTE, as SVELTE is not as opinionated on routing as SAPPER) (I think though that if you take the time to understand the routing logic of SAPPER, there is no reason that one should not be able to deal with it)

rodoch commented 4 years ago

I find this thread endlessly interesting but it also feels like it's going in circles a little bit. I'd like to suggest that maybe Sapper shouldn't/needn't be opinionated in these matters. Perhaps it's something best left to the ecosystem for the most part.

I've been really putting @kaisermann's svelte-i18n through its paces on a number of production projects and honestly it's wonderful. It's at v3, it's stable, lightweight yet comprehensive, very extensible and there still seems to be room for plenty of compile-time enhancements. Sapper doesn't necessarily have to bless this or any other project, but a similar situation could exist as is the case with Svelte and the number of routing solutions that are available. I'd hate to see this issue holding Sapper back on its journey to v1 and likewise it would be good to see some of the energy in this issue directed towards one of the existing solutions.

That said, resolving sveltejs/sapper#1036 (at least for lang and dir attributes on the html element) is crucial to facilitating proper i18n in a framework that supports SSR. I also think optional parameters à la sveltejs/sapper#765 would be a major boon in terms of internationalisation and this is something that would be best handled within the core Sapper project. But neither of these things are necessarily specific to i18n.

Cochonours commented 4 years ago

I wanted to try my hand at a simple sapper project, and here I am not being able to work out how to set the lang attribute of html in the template.html. Was there no progress on i18n since last year?

victorbjorklund commented 4 years ago

I don't have any solutions but I just wanna add that I think i18n needs to be handled by Sapper (as far as I understand it) because of localized URLs (I know that there seem to be some workarounds to "hack" the URLs, but that's not very developer friendly and might not be very scalable). I would love to be able to recommend that clients (at least those willing to live a bit more riskily using an early framework) go with Sapper (because let's face it, it is awesome), but most of our clients need localized websites and some want the URLs localized. So for now it is hard to recommend Sapper for a project. I might be wrong, and it could be possible for a package like svelte-i18n to also manipulate the routing.

Cochonours commented 4 years ago

Yes, localized URLs are a must. I have started exploring defining my paths as maps with a path for each language, so I can easily switch from any language to any other. There are many subtleties involved, so having something working by default would really put Svelte above most others!

jonatansberg commented 4 years ago

Strong agree on the need for localized urls. We have been trying several different ways of getting this to work, including mounting Sapper on different routes in server.js, scoping the routes with a language "folder", etc. None of them work completely.

@Jayphen do you have some more insight in to this after experimenting with different routing hacks for a few different projects?

Jayphen commented 4 years ago

I’ve only had to deal with the routing aspect of this problem, as in our case all translations are managed by an external CMS.

Mounting the app at different basenames (to which there is a very brief reference in the docs) works if you have a small number of languages to support, and the routes aren’t localised. I had some trouble getting it to work on Vercel, but the author of the vercel-sapper package showed how it can be done.

The app I’m working on does not use the above method though; instead it uses a base [lang] directory in routes, as well as setting the lang in the sapper session via middleware from acceptsLanguage. This also seems to work okay, and I’m unsure which is a better approach.

Both of these approaches only work because the routes themselves are not localised, and are prefixed. In another project with localised routes we are fetching them all from a CMS ahead of time and creating an in-memory manifest. A custom Link component can be used to generate internal links from the manifest. This basically sidesteps the file system routing, and is not trivial. That project is still in its infancy, so I don’t have much more to say on it yet other than the solutions Rich already suggested.
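The manifest approach can be sketched roughly as below. The manifest contents, route ids and helper names are hypothetical; in the project described above they would come from the CMS rather than being written by hand.

```javascript
// Hypothetical manifest, as it might look after fetching localised
// slugs from a CMS at startup: route id -> locale -> path.
const manifest = new Map([
  ["about",   { en: "/en/about",   nl: "/nl/over-ons" }],
  ["contact", { en: "/en/contact", nl: "/nl/contact" }]
]);

// Reverse index so an incoming path can be mapped back to its route.
const byPath = new Map();
for (const [id, paths] of manifest) {
  for (const [locale, path] of Object.entries(paths)) {
    byPath.set(path, { id, locale });
  }
}

// What a custom Link component would call to build its href.
function hrefFor(id, locale) {
  const paths = manifest.get(id);
  if (!paths || !paths[locale]) {
    throw new Error(`No path for route "${id}" in locale "${locale}"`);
  }
  return paths[locale];
}

// What the server would call to resolve a request path.
function matchPath(path) {
  return byPath.get(path) || null;
}

console.log(hrefFor("about", "nl"));    // /nl/over-ons
console.log(matchPath("/nl/over-ons")); // { id: 'about', locale: 'nl' }
```

Because both lookups go through the same manifest, switching language on a page amounts to calling `hrefFor` with the current route id and the other locale.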

At the very least for now it would be great to have configurable template replacements as options in the sapper middleware, so we can inject a lang attribute on the HTML element. I think there’s an open PR for this.

Cochonours commented 4 years ago

I'm completely new to sapper/svelte, but have been disappointed with i18n in many frameworks so if you are willing to fix that better in sapper that would be great!

Do you have a list of features to have already, or is it still being discussed?

First, there should be a configurable way to set the locale from different sources. The most logical default, IMHO, would be:
1/ when the URL is localised, use the URL to define the locale (this should take precedence, as not all translations are created equal; if the user doesn't know that locale, or has set a different preference in a cookie, then it's better to display a message telling him that the page is available in his favorite locale rather than displaying the other page directly).
2/ or, if a cookie is set with the locale information, use that.
3/ or, use the accept-language header.
4/ if nothing matches, use the default locale for the route (or the global default locale).
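That precedence order could look like the sketch below. The list of supported locales and all function names are assumptions for illustration, and the header parsing is deliberately simplified (it compares primary language subtags only):

```javascript
const supported = ["en", "fr", "ko"];   // hypothetical site configuration
const defaultLocale = "en";

// Parse an Accept-Language header into language tags, best first.
function parseAcceptLanguage(header) {
  return (header || "")
    .split(",")
    .map(part => {
      const [tag, ...params] = part.trim().split(";");
      const q = params.map(p => p.trim()).find(p => p.startsWith("q="));
      return { tag: tag.toLowerCase(), q: q ? parseFloat(q.slice(2)) : 1 };
    })
    .filter(item => item.tag && item.q > 0)
    .sort((a, b) => b.q - a.q)
    .map(item => item.tag);
}

// Match "fr-CA" against "fr" by comparing the primary subtag only.
function pickSupported(tag) {
  if (!tag) return null;
  const primary = tag.toLowerCase().split("-")[0];
  return supported.includes(primary) ? primary : null;
}

function resolveLocale({ urlLocale, cookieLocale, acceptLanguage }) {
  const fromHeader = parseAcceptLanguage(acceptLanguage)
    .map(pickSupported)
    .find(Boolean);
  return (
    pickSupported(urlLocale) ||     // 1. localised URL
    pickSupported(cookieLocale) ||  // 2. cookie
    fromHeader ||                   // 3. Accept-Language header
    defaultLocale                   // 4. default
  );
}

console.log(resolveLocale({ urlLocale: "fr", cookieLocale: "ko" }));            // fr
console.log(resolveLocale({ acceptLanguage: "de;q=0.9, fr-CA;q=0.8, en;q=0.5" })); // fr
console.log(resolveLocale({}));                                                 // en
```

The message suggested in point 1 (telling the user the page exists in their preferred locale) would be driven by comparing the URL locale with the cookie or header result, which this sketch does not cover.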

I like how svelte handles slugs, and I think something similar should be done for i18n. So, for a given route, we still have the idea of folders and ONE svelte file for all translations, with a special syntax inside the svelte file to specify all the possible values of the URLs with the corresponding locales.

So given this directory structure :

-- |private| ---> |account| ------> settings.svelte

we know that /private/account/settings should hit that svelte file, with the default locale (set site-wide, or suffixed in the file name of the svelte file). Here 'en', but it could be any locale.

Inside the svelte file, we define each URI as an array of (number of folders inside "|" + 1 for the svelte file) segments. E.g.: l10n = [ { locale: 'fr', path: ['privé', 'compte', 'paramètres'] }, { locale: 'ko', path: ['사유', '계정', '설정'] } ]

If the URI is /사유/계정/설정 we know the language should be Korean, and we could get the localised names from ${l10n.private}, ${l10n.account} and ${l10n.settings} for example.

Then sapper can generate the <link rel="alternate" hreflang=.../> tags in the head, as well as update the sitemap with this info.

And set the html lang according to the locale, and maybe provide a default way to display the different translations with links as almost every page should display that one way or the other.
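A rough sketch of how such a per-route definition could drive both locale detection and the alternate links. Nothing here is an existing Sapper feature; the object shape is an assumption, and it uses `ko`, the ISO 639-1 code for Korean:

```javascript
// Route definition for src/routes/private/account/settings.svelte.
// The object shape is hypothetical: locale -> localised path segments.
const route = {
  en: ["private", "account", "settings"],
  fr: ["privé", "compte", "paramètres"],
  ko: ["사유", "계정", "설정"]
};

// Find the locale whose segments equal the (decoded) request path.
function localeForPath(pathname) {
  const segments = decodeURI(pathname).split("/").filter(Boolean);
  for (const [locale, names] of Object.entries(route)) {
    if (names.length === segments.length &&
        names.every((name, i) => name === segments[i])) {
      return locale;
    }
  }
  return null;
}

// Build the <link rel="alternate"> tags for every translation.
function alternateLinks(origin) {
  return Object.entries(route).map(([locale, names]) => {
    const href = origin + "/" + names.map(encodeURIComponent).join("/");
    return `<link rel="alternate" hreflang="${locale}" href="${href}" />`;
  });
}

console.log(localeForPath("/사유/계정/설정"));           // ko
console.log(localeForPath("/private/account/settings")); // en
console.log(alternateLinks("https://example.com").join("\n"));
```

Matching whole segment arrays rather than single words avoids ambiguity when two languages share a word for one folder, such as "contact" in English and French.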

sudomaxime commented 4 years ago

If someone is stumbling on this looking for a working and tested solution, here's my code.

Basically, what it does is look in your URL for a pattern like /en/whatever or just /en. It extracts the locale slug from that URL and matches it against your provided list of locales. If it doesn't match, it will simply hit the 404 middleware of sapper.

With this code you do not have to use a weird folder structure to make your app work.

To get your locale in your templates, simply get it out of the session param in your preload function.

Obviously you will also have to change your links on the front-end to prefix your links with the current locale, this is up to you.

Simply add this code in your server.js file.

const defaultLocale = "fr";
const locales = ["fr", "en"]

/**
 * Safely extracts a locale out of the url.
 * 
 * @param {string} route - An url path
 */
function localRouteRegexp (route) {
    let localeString = locales.join("|");
    // Anchor at the start of the path so that only a leading /en or /fr
    // segment counts, and not one deeper in the URL such as /blog/en.
    let regexp = new RegExp(`^/(${localeString})(/|$)`);
    let currLocale = route.match(regexp);
    // Without a recognised prefix, fall back to the default locale.
    return currLocale ? currLocale[1] : defaultLocale;
}

/**
 * Creates the express valid path regex
 * to allow matching the app on different
 * routes.
 * 
 * @param {Array<String>} locales - A list of supported locales
 */
function expressLocaleRouteRegex (locales) {
    // e.g. ["fr", "en"] -> "(/fr|/en)?"
    return `(${locales.map(locale => `/${locale}`).join("|")})?`;
}

/**
 * A middleware to add the current
 * locale to the svelte session store.
 */
const bindSessionToRequest = (req, res, next) => sapper.middleware({
    session: () => ({locale: req.locale})
})(req, res, next)

/**
 * Finds the current locale in
 * the url path and sets it to the
 * request object.
 */
service.use((req, _, next) => {
    let locale = localRouteRegexp(req.url);
    req.locale = locale;
    next();
})

service.use(
    expressLocaleRouteRegex(locales),
    compression({ threshold: 0 }),
    sirv('static', { dev }),
    bindSessionToRequest
)

floratmin commented 4 years ago

I think the only really important thing would be to have an elegant way to implement localized routes. For SPA's with logged in users a solution with cookies is sufficient. Best would be if you could mix both approaches. The rest can be left to plugins. For localized urls my favorite would be:

domain.tld/page-in-english.html
domain.tld/fr-FR/page-en-francais.html
domain.tld/de/seite-auf-deutsch.html
…

Catsvilles commented 3 years ago

@sudomaxime Hey, thanks for your solution, but is it supposed to work only with Express.js? Because I'm trying the same code with polka and it always returns 404 for every page. And did you test it and notice anything about the performance when using this code?

5argon commented 3 years ago

It is not mentioned by anyone here yet, Next.JS has i18n routing built-in now https://nextjs.org/docs/advanced-features/i18n-routing

Also, has anyone mentioned the <link> tag in <head> yet? It would be great if each page had a rel="alternate" to all other versions automatically (all pages and all languages have bi-directional links to each other). I had to do something like this on my own site:

<svelte:head>
    <link rel="alternate" hreflang="x-default" href="REAL_HOST/{neutralPath}" />
    {#each supportedLanguages as language}
        <link rel="alternate" hreflang="{language}" href="REAL_HOST/{language}/{neutralPath}" />
    {/each}
</svelte:head>

(The "alternate" includes the language you are on too, but it seems that is OK. More info.)

andykillen commented 3 years ago

@5argon I've heard from a couple of international SEO specialists that putting the hreflang information into a sitemap.xml is better than putting it in the HEAD metadata. Thought you might be interested.
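For reference, hreflang annotations in a sitemap use `xhtml:link` elements inside each `<url>` entry. The sketch below generates such a sitemap; the host, language list and paths are placeholders, and it assumes the values need no XML escaping:

```javascript
const origin = "https://example.com";          // placeholder host
const supportedLanguages = ["en", "fr", "ja"]; // hypothetical list
const neutralPaths = ["", "blog"];             // paths without a language prefix

function sitemapEntry(neutralPath) {
  const urlFor = language =>
    [origin, language, neutralPath].filter(Boolean).join("/");
  // Every language version lists all versions, including itself.
  const alternates = supportedLanguages
    .map(l => `    <xhtml:link rel="alternate" hreflang="${l}" href="${urlFor(l)}"/>`)
    .join("\n");
  return supportedLanguages
    .map(l => `  <url>\n    <loc>${urlFor(l)}</loc>\n${alternates}\n  </url>`)
    .join("\n");
}

const sitemap = [
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"',
  '        xmlns:xhtml="http://www.w3.org/1999/xhtml">',
  ...neutralPaths.map(sitemapEntry),
  "</urlset>"
].join("\n");

console.log(sitemap);
```

As with the `<link>` tags in the head, each language version is expected to reference every other version and itself, so the entries stay bi-directional.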

Jayphen commented 3 years ago

It is not mentioned by anyone here yet, Next.JS has i18n routing built-in now https://nextjs.org/docs/advanced-features/i18n-routing

I haven't used nextjs's i18n routing, but everything described in the docs can be achieved with Sapper today in userland, except the automatic lang attribute on the html element (for which there is an unmerged PR here https://github.com/sveltejs/sapper/pull/1695)

xpuu commented 3 years ago

I've spent the last 23 years in web development. Last week a childhood friend asked me to make him a simple static website. Piece of cake, I thought. Mankind is getting ready to colonise Mars. We have tools to make simple websites.

I chose Vercel as my target platform. I wasn't sure about Sapper deployment so I tried Nuxt first. Using Nuxt-i18n was a bliss. It all went great until I peeked in exported source code. The amount of bloat overwhelmed me.

I switched to Sapper, but now I realise it's impossible to have sanely localised URLs. I've already spent a few days trying to solve it. I tried:

Lessons learned from this

Catsvilles commented 3 years ago

^^^ Backing up the comment above, really hope that SvelteKit will allow having i18n URLs structured like we need and want, now I stopped developing my app just waiting for this.