Open wmertens opened 1 year ago
Probably instead of putting the JSON files in /public they can go in the build as well. Vite can probably handle imports with interpolations.
For the pluralization of the variable strings, please look at how other frameworks implement this. Laravel comes to mind: https://laravel.com/docs/10.x/localization#pluralization
It allows rules for choosing the right string variant, depending on the number: for example
'Apples' => '{0} There is none|{1} There is one|[2,*] There is :count',
or
'apples' => '{0} There are none|[1,19] There are some|[20,*] There are many',
Not all languages have the simple rules like zero and many.
@cwerner1 good point, although I prefer not parsing the text too much, so I'd rather do something like {0: 'tr for 0', 1: 'tr for 1', 2: 1, 3: 1, _: 'tr for all'}, where 2 then uses the translation for 1.
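A minimal sketch of how such an object could be resolved, assuming that a numeric value is a reference to another key and that `_` is the catch-all. The names `PluralMap` and `selectVariant` are hypothetical and not part of any proposed API:

```typescript
// Hypothetical shape: a value is either a translation string or a
// number that refers to another key whose translation is reused.
type PluralMap = Record<string, string | number>;

function selectVariant(map: PluralMap, count: number): string {
  let entry: string | number | undefined = map[String(count)] ?? map["_"];
  // Follow numeric references (e.g. 2 -> 1). The hop limit keeps a
  // cyclic map from looping forever.
  for (let hops = 0; typeof entry === "number" && hops < 10; hops++) {
    entry = map[String(entry)];
  }
  if (typeof entry !== "string") {
    throw new Error(`No translation variant for count ${count}`);
  }
  return entry;
}

const apples: PluralMap = { 0: "tr for 0", 1: "tr for 1", 2: 1, 3: 1, _: "tr for all" };
selectVariant(apples, 2); // "tr for 1"
selectVariant(apples, 7); // "tr for all"
```

No text parsing is needed at runtime: selecting a variant is a property lookup plus, at most, a few reference hops.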
Here is a prototype of what the runtime looks like: Qwik Playground
(I needed to pin 1.1.5 because 1.2 has a broken playground, and for some reason the language change causes an import error in dev mode so it's running in prod).
Note the '...' and loading messages when changing languages, but only one time.
Also note that the initial html only contains the translation Aardvark once, in the HTML, and not in the store.
I added parametrized translations to the prototype: Qwik Playground
However, this somehow puts Aardvark into the serialized store even though it shouldn't. I'm guessing this is a fixable problem though, since in the previous prototype it doesn't include Aardvark.
what are your thoughts on attributes? will these be supported by the vite transformation?
<button aria-label={_`aardvark`} >foo</button>
@imoldfella hmmm didn't think of that one, that's indeed harder because _ normally returns a component. So either the user has to use a different call or the transform has to change depending on context.
In this case the $localize approach just works...
what do you think about vite transforming all the _`` into signals? They seem pretty light weight, although not completely free.
Hmm that could be done if _ came from a hook instead of an import 🤔
@wmertens Your suggested rules are only usable for a limited list of languages. The page https://lingohub.com/blog/2019/02/pluralization has, under "CLDR Overview", a table with multiple languages and their rules for certain numbers and pluralized strings. For example, Slovenian has different rules depending on the mod of the number, and French uses the same translation for 0 and 1.
About 9 years ago I integrated a pluralization engine on top of a Zend Framework application, and I solved the problem with this approach:
I would recommend a simple translation function _, which doesn't handle any pluralization, and another one, something like _p(string, number, ...args), which can handle this kind of stuff. That would also be the place to integrate the rules engine for the different languages.
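One way such a `_p` could delegate to an existing rules engine is the standard `Intl.PluralRules` API, which implements the CLDR categories. This is only a sketch: the argument order, the `Variants` shape, and the use of `$0` as the count placeholder are assumptions, not the signature suggested above.

```typescript
// Hypothetical: variants keyed by CLDR plural category, with "other"
// as the required catch-all.
interface Variants {
  other: string;
  zero?: string;
  one?: string;
  two?: string;
  few?: string;
  many?: string;
}

function _p(locale: string, variants: Variants, count: number): string {
  // Intl.PluralRules maps a number to the CLDR category for the locale.
  const category = new Intl.PluralRules(locale).select(count);
  const template = variants[category] ?? variants.other;
  return template.replace(/\$0/g, String(count));
}

const items: Variants = { one: "$0 item", other: "$0 items" };
_p("en", items, 1); // "1 item"
_p("en", items, 5); // "5 items"
```

The translator only supplies the categories a language needs, and the per-language rules (such as the Slovenian or French cases mentioned above) stay inside the engine.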
@cwerner1 I don't understand, can you give an example that you can't express using something like {0: "0", 1: 0, "2": "2", "_": "*"}?
Did you try the playground I mention here?
I'm also more and more convinced that putting the application strings in the build is a really good general approach, so then pluralization would have to be embedded as well.
PS note that the pluralization here is orthogonal to i18n itself.
@wmertens @mhevery is there something in the works or is it just a list for now?
We implemented qwik-speak and there are several things we don't like:
The coolest thing ever would be to implement auto-extraction of all jsx texts.
It might be interesting to check Apple's latest changes to localization, especially how they work with variations and how the dictionaries are built.
https://developer.apple.com/documentation/xcode/localizing-and-varying-text-with-a-string-catalog
Looking forward to hearing from you guys.
"We implemented qwik-speak and there are several things we don't like:"
But this repo is not for qwik-speak...
Are there things that you don't like with the $localize approach?
We are implementing it again with localize and will let you know.
Qwik-native i18n
Continuing the Discord conversation here.
We want to have an automated mapping from keys to strings, depending on the user locale.
Requirements
en_us -> en -> C
Things to minimize
Things to maximize
Bonus points
Things to maybe allow
These are probably nice and it would be good if they are potentially possible without changing the API later:
${count} items vs ${count} item
Contexts
Approach
API
We'll use template strings to allow embedding parameters into the translation strings. We'll let the dev choose the prefix; for example:
For outside-of-tree use, we'll need the locale to be passed in explicitly. We can let the template function also be a function accepting the locale, then returning a template function that returns a promise:
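A rough sketch of the outside-of-tree form, under stated assumptions: `createTag`, `toKey`, `fill`, and the `Lookup` callback are hypothetical names, the dictionary is a stand-in for the real loader, and the in-tree form that returns a component is not modeled here. Parameters are assumed to appear in the key as `$0`, `$1`, and so on.

```typescript
// Build the lookup key from the literal parts: ["Hello, ", "!"] -> "Hello, $0!"
function toKey(strings: readonly string[]): string {
  return strings.reduce((key, part, i) => key + "$" + (i - 1) + part);
}

// Put the actual parameter values back into a translated string.
function fill(translation: string, params: readonly unknown[]): string {
  return translation.replace(/\$(\d+)/g, (_match, n: string) => String(params[Number(n)]));
}

// Hypothetical loader: resolves a key for a locale, or undefined if missing.
type Lookup = (locale: string, key: string) => Promise<string | undefined>;

function createTag(lookup: Lookup) {
  // Passing a locale returns a template function that returns a promise.
  return (locale: string) =>
    async (strings: TemplateStringsArray, ...params: unknown[]): Promise<string> => {
      const key = toKey(strings);
      const translation = (await lookup(locale, key)) ?? key;
      return fill(translation, params);
    };
}

// Stand-in dictionary that ignores the locale, for illustration only.
const dictionary: Record<string, string> = { "Hello, $0!": "Hallo, $0!" };
const _ = createTag(async (_locale, key) => dictionary[key]);
_("nl")`Hello, ${"Ann"}!`.then((text) => console.log(text)); // logs "Hallo, Ann!"
```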
Template strings are converted to keys for mappings. Parameters are replaced with $#, where # is the position of the parameter, for example $0 for the first one.
Conceptual implementation
The _ function will manage a singleton store of all used translations. For SSR, it will eagerly load all locales. On the client, it will only load translations when they are used.
We consider the template string to be written in the C locale. If a translation is missing, we'll use the C locale as the fallback.
We'll use a qwik-i18n build step to extract all template strings from the source code and generate a JSON file with all translations, per language. This JSON file will be loaded by the _ function as needed. We'll also optimize the function calls, see below.
The _ function therefore maps from C to the desired locale, loading the translations as needed. Inside the tree it returns a component that uses a store to get the translations. Outside the tree it returns a promise for the translation.
All files that need to be maintained are stored under /i18n, and the resulting data files are stored under /public/_i18n.
If a translation is missing, _ will try to load the locale, the fallback locale, and finally the C locale. If the translation is still missing, it will return the key.
Optimizations
Since Qwik can recover text nodes for serializing stores, we must ensure that translations are added verbatim to the DOM. Furthermore, we want to ship as little data as possible to the client. We'll start each SSR with an empty store in the i18n context, and it will be populated by _ calls. This means that at the end of SSR, the store contains only the used translations, and Qwik will reuse the text nodes. Only when parameters are used, the text nodes will differ from the store data.
On the client, we'll use the store to populate the translations singleton, and load additional translations as needed.
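The lookups that populate this store follow the order given under the conceptual implementation: the requested locale, the fallback locale, the C locale, and finally the key itself. A minimal sketch of that order, assuming the fallback locale is derived by dropping the region (en_us -> en) and that the dictionaries are already loaded; `localeChain` and `translate` are hypothetical names:

```typescript
type Dict = Record<string, string>;

// Assumed derivation of the chain: "en_us" -> ["en_us", "en", "C"].
function localeChain(locale: string): string[] {
  const chain = [locale];
  const base = locale.split(/[_-]/)[0] ?? locale;
  if (base !== locale) chain.push(base);
  if (chain.indexOf("C") === -1) chain.push("C");
  return chain;
}

function translate(dicts: Record<string, Dict>, locale: string, key: string): string {
  for (const candidate of localeChain(locale)) {
    const hit = dicts[candidate]?.[key];
    if (hit !== undefined) return hit;
  }
  return key; // last resort: the key itself
}
```

In the real design the dictionaries would be loaded lazily, so each step of the chain could involve a fetch. The sketch only shows the order.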
Having a single JSON per language means that to look up one translation, all translations are loaded. We can improve this by splitting the translations into multiple files.
First, we'll map from the C locale to an index. This index is then used as the key in the locale's JSON file, which now becomes an array. We'll split the JSON array into multiple files, each containing e.g. 15 translations. We'll use the index to determine which file to load.
A bonus is that the JSON arrays don't need keys any more, saving a few bytes per translation.
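The index-to-file arithmetic can be sketched as follows. The chunk size of 15 comes from the text above, while `locate`, `chunkUrl`, and the URL layout are hypothetical:

```typescript
const CHUNK_SIZE = 15; // "e.g. 15 translations" per file

// Which chunk file holds a translation, and where inside that file.
function locate(index: number, chunkSize: number = CHUNK_SIZE): { chunk: number; offset: number } {
  return { chunk: Math.floor(index / chunkSize), offset: index % chunkSize };
}

// Assumed file layout, for illustration only.
function chunkUrl(locale: string, index: number): string {
  return `/_i18n/${locale}/${locate(index).chunk}.json`;
}

locate(4);          // { chunk: 0, offset: 4 }
locate(31);         // { chunk: 2, offset: 1 }
chunkUrl("nl", 31); // "/_i18n/nl/2.json"
```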
During the build step, we'll maintain the C index mapping. Any existing mapping is retained, and new translations are added to the end of the array. This means that translations will always retain their index, even when new translations are added.
Importantly, this means that indexes close in number are also close in application context, which may mean that loading an array subset to satisfy a single translation also loads other translations that are likely to be used.
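A minimal sketch of the append-only maintenance described above, assuming the mapping is an array of keys whose position is the index. `updateIndex` is a hypothetical name:

```typescript
// Keep every existing key at its position and append keys that are new.
function updateIndex(existing: readonly string[], found: readonly string[]): string[] {
  const next = [...existing];
  const seen = new Set<string>(existing);
  for (const key of found) {
    if (!seen.has(key)) {
      seen.add(key);
      next.push(key);
    }
  }
  return next;
}

updateIndex(["a", "b"], ["b", "c", "a", "d"]); // ["a", "b", "c", "d"]
```

Keys that no longer occur in the source are kept in this sketch, so indexes that already shipped never shift. Whether stale keys should ever be pruned is left open here.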
Since we know the C index mapping during the build, we can replace the _ template calls with the index. For example, _`Hello, ${name}!` maps "Hello, $0!" to index 4 and the call becomes _(4, name), saving yet more bytes. However, to allow for signal propagation, we in fact replace the call with the resulting element, namely <I18n id={4} params={[name]} />.
If the _ function is still called as a template function, we load the mapping file and look up the index. If not, the mapping file will never be loaded. This provides a fallback during development.
Mapping C to an index and chunking the locale arrays makes it hard to maintain the translations manually though, so we'll let the translations be managed as YAML files (which are a superset of JSON). The build step will convert the YAML files to the chunked JSON arrays. The build step will also add missing keys to the YAML files.
To allow for dynamic translations, the _ function can load extra mappings in any convenient way.
Grouping translations can be done in many ways, so we'll leave that for now, confident that we can add it later without issues.
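The conversion from a maintained locale file to chunked arrays could look roughly like this. The sketch works on an already parsed object, so YAML parsing is left out, and it assumes that a missing translation is stored as `null`. The names `LocaleFile`, `addMissingKeys`, and `toChunks` are hypothetical:

```typescript
type LocaleFile = Record<string, string | null>;

// Add every C key the locale file lacks, using null as a
// "not translated yet" placeholder (an assumption of this sketch).
function addMissingKeys(cKeys: readonly string[], file: LocaleFile): LocaleFile {
  const result: LocaleFile = { ...file };
  for (const key of cKeys) {
    if (!(key in result)) result[key] = null;
  }
  return result;
}

// Order translations by C index, then cut the array into fixed-size chunks.
function toChunks(cKeys: readonly string[], file: LocaleFile, size: number = 15): (string | null)[][] {
  const ordered = cKeys.map((key) => file[key] ?? null);
  const chunks: (string | null)[][] = [];
  for (let i = 0; i < ordered.length; i += size) {
    chunks.push(ordered.slice(i, i + size));
  }
  return chunks;
}
```

Each inner array would be written out as one JSON file. A `null` entry tells the runtime to fall back to another locale for that index.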
To allow for varying translations based on the parameter, we can allow the translation string to be an object with the key values of $0 being used to select the translation. For example: _`${count} items` can map to the translation object { "1": "1 item", "_": "$0 items" }.
Conclusion
This approach seems to tick all the boxes, with minimal data transferred during SSR.
The net result is a /i18n folder containing C.json (an array of all encountered template strings) and per locale a locale.yaml file (containing the translations). The build step will generate the chunked JSON files from the YAML files under /public/_i18n, and the _ function will load the JSON files as needed.
This requires a Vite plugin that can detect and update all the calls done with the _ function, as well as maintaining the C.json file, adding missing keys to the locale.yaml files, and generating the chunked JSON files.
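To make the call rewriting concrete, here is a deliberately naive sketch that uses a regular expression. A real plugin would work on the AST, since a regex cannot handle nested braces or backticks inside expressions. `rewriteCalls` is a hypothetical name, and the output format follows the <I18n id={4} params={[name]} /> example above:

```typescript
// Rewrite _`...` template calls to <I18n /> elements and register new
// keys at the end of cKeys (mutated in place).
function rewriteCalls(source: string, cKeys: string[]): string {
  return source.replace(/\b_`((?:[^`\\]|\\.)*)`/g, (_match, body: string) => {
    const params: string[] = [];
    // Replace each ${expr} with $0, $1, ... and remember the expressions.
    const key = body.replace(/\$\{([^}]*)\}/g, (_sub, expr: string) => {
      params.push(expr.trim());
      return "$" + (params.length - 1);
    });
    let id = cKeys.indexOf(key);
    if (id === -1) id = cKeys.push(key) - 1;
    return `<I18n id={${id}} params={[${params.join(", ")}]} />`;
  });
}
```

Given four existing keys, the source text _`Hello, ${name}!` would be registered as "Hello, $0!" at index 4 and rewritten to <I18n id={4} params={[name]} />.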