Add support for translations

tronical commented 4 years ago

This ticket tracks the ability to translate a Slint user interface in a way that allows:

annotating translatable strings,
having the infrastructure and documentation for extracting said strings from .slint files,
using a third-party tool to translate them and feeding the translations back into the Slint build process.

Finally, the translation that's chosen at compile time should be embedded in the resulting compiled code and be visible at run-time.
This may include support for multiple languages and choosing between them at runtime (the language can be changed at runtime).

Proposal.

Use a @tr(...) macro with the same syntax as the Rust tr! macro from the tr crate. (cf https://github.com/woboq/tr/issues/1 )
Update the xtr tool on crates.io to also support .slint files.
Behind the scenes, use gettext.

Workaround until this is implemented.

Put all the strings in a global object and do the translation in native code. See https://github.com/slint-ui/slint/issues/33#issuecomment-1275856180

Plecra commented 4 years ago

Idea: Don't handle text at all in SixtyFPS, and only use symbols in .60 files, which are then (somehow) given to SixtyFPS by the user.

It gives the user more choice, but more importantly (imo) means that i18n can be handled by a library specifically designed for it.

tronical commented 4 years ago

In my experience, the process of translation usually involves some "local" tooling where the strings to be translated are placed in some third-party format (I've seen Excel spreadsheets!), sent to people typically external to the project, and the result is imported with the same "local" tooling. The library part of run-time injection of these translations is fairly straightforward.

In some scenarios it makes sense to just select one language at compile time but more commonly I think it makes sense to have "language packs" in the form of external files that can be mmap()'ed.

Do you have specific libraries in mind for translation?

I think the symbol approach is indeed something some users are preferring for UI translation. I personally prefer the literals in the source, but I think both are valid approaches.

ogoffart commented 3 years ago

For strings in Rust, we could use https://github.com/woboq/tr

For strings in .60, I'd like to also use a gettext-based approach, with an @-macro.

What would that look like? For a simple string that's easy: @tr("Simple string"). But how do we handle substitutions, plural forms, and so on?

Idea:

Text {
    // %0 %1, ....
    text: @tr("Hello %0, my name is %1", root.your_name, root.my_name);
    // same substitution language as Rust (this includes using {0} and {1})
    text: @tr("Hello {}, my name is {}", root.your_name, root.my_name);
    // Should we or should we not allow expressions like in normal strings?
    // Maybe limit to simple property access?
    text: @tr("Hello \{root.your_name}, my name is \{root.my_name}");
}

In fact, I'm tempted to use the same syntax as the tr! macro, although even that is not set in stone: https://github.com/woboq/tr/issues/1

Should we also offer a @format() macro or a format() function to do that at runtime with potentially the same syntax?

We need a code extractor. We could modify xtr, but the problem is that .60 has complex lexing rules (because of the \{ } in quotes), and I'm not sure if I should really add so much .60-related code to xtr. Or we could develop a separate extraction tool.

jacquetc commented 2 years ago

Hello,

I can't help much on the technical side, but I see a few points about localization:

On the code side, let's not forget support for developer notes and contexts, which are so useful for translators ;-) We all hate translating blindly.

On the translator side, which localization file format to use? Please choose a known file format: see the 45 formats in the Transifex doc (the list is in the menu on the left) and take your pick. It would be nice to support metadata like developer notes and contexts. I personally think that the ".po" file format (Gettext) is common enough, and its specs are simple enough.

@tronical I found OrbTk using Ron with an exotic translation file. I wouldn't use this for the reasons cited earlier.

I think that if too many liberties are given to the translation workflow, the library dedicated to translation would become difficult to maintain. Keep it simple and stupid. I'd love to see two ways:

one ":/l10n" folder, containing files with the format {app_name}_fr_FR.po or {app_name}_fr.po. Then, in main.rs, select any language code, and it will find the corresponding file.
a way to dynamically load a .po whenever we want. Like @tronical said, to easily add "language packs".
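The file-naming scheme suggested above can be sketched as a small lookup helper. This is only an illustration, not an existing Slint API: the function name `po_candidates` and the choice to normalize `-` to `_` in the language code are assumptions made for the example. It lists the file names to try, most specific first.

```rust
/// Candidate .po file names for a language code, most specific first.
/// `po_candidates` is a hypothetical helper, not part of Slint or gettext.
fn po_candidates(app_name: &str, lang: &str) -> Vec<String> {
    // Assumption: accept both "fr-FR" and "fr_FR" as input.
    let normalized = lang.replace('-', "_");

    // Always try the full code first, e.g. "myapp_fr_FR.po".
    let mut names = vec![format!("{}_{}.po", app_name, normalized)];

    // If there is a region part, fall back to the bare language, e.g. "myapp_fr.po".
    if let Some((language, _region)) = normalized.split_once('_') {
        names.push(format!("{}_{}.po", app_name, language));
    }
    names
}
```

The caller would then open the first candidate that exists in the l10n folder (or in the embedded resources).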

The first would need a system like Qt's resource files to statically include images, translation files, and other assets in the executable. I know Qt compiles them in a precompilation stage, so I'm not certain how to declare/compile them in Rust. A dedicated library pointing to a resource file in which the resources are listed? And finally, using this library to access these resources? Lots of rambling, sorry... It's probably already implemented or on the way. Having the C++/Qt mindset, I'm still struggling (a lot) with Rust :-)

Evrey commented 2 years ago

I’d suggest using the Qt L10N file format »TS«, which means we pretty much get a decent and OSS GUI L10N tool for free.

Also +1 for optional asset bundling and having a proper resource management system. Having a resource management system also allows for hot reloading of assets. This may even be attractive for L10N work if we used plain-text L10N file formats like Fluent’s .ftl. Imagine having the app open, editing the my_app.de.ftl file, and seeing the changes live in the running app.

tronical commented 2 years ago

Live reload/preview is a neat idea, in the spirit of the live preview when editing the markup files :)

Evrey commented 2 years ago

In that case i’d say all in on Project Fluent and .ftl files.

jacquetc commented 2 years ago

Hello,

I think dynamic load/unload/reload of assets is interesting for l10n files. It's a really different can of worms compared with static bundling of assets. Yet, both ways can coexist. There are already libs for static assets.

Maybe we can implement step by step:

decide an official/supported way for static bundling using a chosen existing 3rd party lib
implement whatever l10n file and tr mechanism you choose.
implement dynamic loading of assets

This way, there is something quick and not so dirty to begin quickly with the translation implementation. Then, an evolution with the dynamic loading of assets. Dynamic refresh can be a feature of dynamic loading, not specifically of the translation system.

If not existing, I suggest creating an issue specifically for static assets and dynamic assets.

Cheers

ogoffart commented 2 years ago

Currently, if one wants to do translation, it is possible with a global object like so:

global Strings := {
   property <string> text-foo: "This is the Text for Foo";
   property <string> button-cancel: "Cancel";
   //...
}

Then the other elements can use the Strings global object, and the native code can translate all the strings.

Example of project using that: (although it is using an old version that did not have global object accessible from rust at the time) https://github.com/getsentry/hackweek-rust-gui/blob/7b8b464de0b8e76219270316ef0ffa2f5669c6ab/sentry-sixty/src/main.rs#L154-L161

zbraniecki commented 1 year ago

Hi there. I'm the maintainer and one of the authors of the Fluent system, as well as a co-creator of ICU4X, and contributor to Unicode Message Format 2.

It's a bit concerning for me how far along Slint is without any notion of I18n/L10n in place. Such an approach often leads to trying to plaster i18n/l10n on top of a non-i18n system, leading to suboptimal architectural choices.

I've spent a lot of time building I18n UI architectures (including multi-modal for VUI/GUI combos) and I encourage you to look at Fluent or Message Format 2 as a basic l10n system and bind it deeply into your Widget model.

You can read more about my approach in https://github.com/raphlinus/crochet/issues/7 and https://github.com/unicode-org/message-format-wg/issues/118 . We're working on MF2 now which will allow for rich markup passthroughs and generation for deep GUI fragment integration.

In any case, I advise against treating l10n as something you can just button.label = formatString() into. It's going to limit your system as a globalization UI target.

tronical commented 1 year ago

Thank you for taking the time to look at Slint and commenting here!

I'm intrigued by the concept of a "localisation unit".

In the current model for translations in Slint that we have in mind, each translation is a binding that's automatically kept up-to-date. And based on our experience with KDE and Qt, we've seen this work. But it's still relatively hard for translators to get enough context about where the strings are really used, to create the best possible translation. I'm rather intrigued by the idea of enhancing the DSL (in our case) in a way that we could perhaps extract more structural and relational information for translators to see. That would indeed require a message format beyond the dumb { source_string, source_file, line_number, column, some_random_context_string_the_developer_came_up_with }.

I reckon we might do this in stages though.

Evrey commented 1 year ago

UMF2 looks very intriguing, indeed, even better than Fluent. I also assume it’ll pair well with ICU4X, for obvious reasons, so… guess that’s the only sane target to aim for regarding I18N.

I was quite surprised seeing a v1.0 release with this here and the accessibility issue still open, given how deeply proper I18N integrates into pretty much everything.

ogoffart commented 1 year ago

Prototype in #2662

We decided to go with a gettext approach, because we feel it is better to have the original English string in the .slint file.

Regarding the comments that suggest fluent, I think this is valid and it is entirely possible to use fluent with Slint, see https://github.com/slint-ui/slint/issues/33#issuecomment-1275856180

One way could be just to have a generic callback that forwards to Fluent:

export global Translator {
     callback translate(string, [{key: string, value: string}]) -> string;
}

or even some possibly auto-generated global:

export global Translations {
     callback foo-bar-greetings(user-name: string) -> string;
     // ...
}

Evrey commented 1 year ago

I don’t quite get it, to be honest.

Fluent and UMF2 have been designed to tackle ages old, well known, limitations and weaknesses of gettext-like libraries. Namely the limitations that real human text is dependent on the parameters you try to render in between. Things like auto pluralisation. And these two are existing standards with existing implementations, one directly by Unicode. You’re working on a new GUI framework that has the opportunity to »Do Things Right™« from the get go, unlike, say, Qt, which has been around since before Fluent existed.

And having mentioned Qt, it is not uncommon to pack the base translations right into the shipped executable, so there can’t be a case where default translations are missing. Slint can do the exact same, so where exactly is the value in having »original English« in the source code? If anything, that’s a downside, because in a professional setting, your editors (the people, not the software) now have to touch UI source code to fix mistakes in the default translation. And even worse, changing the default translation in source code now also means »fixing« all translation keys in all translation files. There’s a reason Qt at some point added a feature to generically look translations up by an unchanging key instead of plain English.

I’m surprised and confused.

ogoffart commented 1 year ago

@Evrey Thank you for sharing your concerns and insights!

But gettext remains a state-of-the-art solution widely adopted in the industry and is even integrated into glibc. BTW, it has addressed the plurals issue for quite some time.
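For readers unfamiliar with how gettext handles plurals: each catalog declares a plural rule that maps a count to an index into the translated forms. The sketch below hand-codes three such rules in Rust (the English, French, and Polish rules follow the usual gettext Plural-Forms expressions). The names `PluralRule` and `ngettext_like` are made up for this example; it is not gettext's or Slint's implementation.

```rust
/// Plural rule: maps a count to an index into the list of translated forms.
type PluralRule = fn(u64) -> usize;

/// English: plural=(n != 1)
fn english(n: u64) -> usize {
    if n != 1 { 1 } else { 0 }
}

/// French: plural=(n > 1), so 0 uses the singular form.
fn french(n: u64) -> usize {
    if n > 1 { 1 } else { 0 }
}

/// Polish: three forms (one, few, many).
fn polish(n: u64) -> usize {
    if n == 1 {
        0
    } else if n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) {
        1
    } else {
        2
    }
}

/// Pick the form for `n`; clamps the index if the catalog has too few forms.
/// Hypothetical helper, loosely modelled on what ngettext does after lookup.
fn ngettext_like<'a>(forms: &[&'a str], rule: PluralRule, n: u64) -> &'a str {
    let idx = rule(n).min(forms.len().saturating_sub(1));
    forms.get(idx).copied().unwrap_or("")
}
```

With the Polish forms `["plik", "pliki", "plików"]`, counts 1, 2, 5, 12 and 22 select the first, second, third, third and second form respectively.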

Regarding Slint's formatting layer, we are still in the process of finalizing it. For the MVP, we plan to use numbered placeholders like {0}, {1}, etc., allowing for later inclusion of more advanced formatting options such as named placeholders or additional formatting directives.
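To make the placeholder idea concrete, here is a minimal sketch of runtime substitution for numbered `{0}`, `{1}` placeholders, also accepting the bare `{}` form discussed earlier in the thread. The function name `format_placeholders` and the rule that invalid placeholders are left untouched are assumptions for this example, not Slint's actual formatting layer.

```rust
/// Substitute `{}` and `{N}` placeholders in `template` with `args`.
/// Unknown or malformed placeholders are left in the output unchanged.
/// Hypothetical helper; escaping of literal braces is not handled.
fn format_placeholders(template: &str, args: &[&str]) -> String {
    let mut out = String::with_capacity(template.len());
    let mut next_implicit = 0; // index used by the next bare `{}`
    let mut rest = template;

    while let Some(open) = rest.find('{') {
        out.push_str(&rest[..open]);
        let after_open = &rest[open + 1..];
        match after_open.find('}') {
            Some(close) => {
                let inner = &after_open[..close];
                let index = if inner.is_empty() {
                    let i = next_implicit;
                    next_implicit += 1;
                    Some(i)
                } else {
                    inner.parse::<usize>().ok()
                };
                match index.and_then(|i| args.get(i)) {
                    Some(arg) => out.push_str(arg),
                    // Not a valid placeholder: keep the text as written.
                    None => {
                        out.push('{');
                        out.push_str(inner);
                        out.push('}');
                    }
                }
                rest = &after_open[close + 1..];
            }
            None => {
                // Unterminated `{`: copy the remainder verbatim.
                out.push_str(&rest[open..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

Numbered placeholders matter for translators because a translation may need the arguments in a different order than the source string, e.g. `"{1} greets {0}"`.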

Slint aims to support MCU and no_std runtimes, so we will need an option to read the translations at compile time to embed them into the binary as well.

The main reason we want to use gettext is that we want the original text in the .slint files, since it offers convenience for UI developers who deal with numerous strings. The intention is to simplify the process by enabling developers to place strings within quotes without the need for extra message IDs or separate files.

The concerns about fixing typos invalidating the string are effectively addressed by existing gettext tooling. These tools handle scenarios where the original source changes, ensuring translations remain intact while marking the translation dirty as changes to the original source often require corresponding adjustments in the translations.

> There’s a reason Qt at some point added a feature to generically look translations up by an unchanging key instead of plain English.

I'm not aware of that feature.

Anyway, I have read https://github.com/projectfluent/fluent/wiki/Fluent-vs-gettext#social-contract and I disagree with it: Strings in the source code are easier to write and maintain (as they are in the right context). Remember that the person writing the .slint file is supposed to be the designer of the UI.

constituent commented 1 year ago

I personally prefer keys to plain English. A decent LSP plugin could help with the key vs. concrete text problem, e.g. hovering the mouse pointer over the key would show a popup with the English and/or the designer's preferred language. For a design team with members of different native languages (maybe rare, but who knows) this feels more friendly.

I have used Fluent at work, and found its flexibility and customizability unexpectedly necessary. Besides plural nouns, verb tense and many other features are a must. I even defined several custom functions: one to join a list of numbers into a string, with a different delimiter in each language; one to convert a discount number, e.g. 30% off becomes 7折 in Chinese, so something like (100 - discount)/10.
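The two custom functions described above can be sketched in plain Rust (rather than as Fluent custom functions, whose registration API is not shown here). The names `discount_zh`, `discount_en`, and `join_numbers` are hypothetical, and rendering 25% off as "7.5折" is an assumption about the desired output for non-round values.

```rust
/// Chinese-style discount: shows the share of the price still paid, in tenths.
/// 30% off -> (100 - 30) / 10 = 7 -> "7折".
fn discount_zh(percent_off: u32) -> String {
    let remaining = 100u32.saturating_sub(percent_off); // e.g. 70 for 30% off
    if remaining % 10 == 0 {
        format!("{}折", remaining / 10)
    } else {
        // Assumption: keep one decimal digit for values like 25% off.
        format!("{}.{}折", remaining / 10, remaining % 10)
    }
}

/// English rendering of the same discount, for comparison.
fn discount_en(percent_off: u32) -> String {
    format!("{}% off", percent_off)
}

/// Join a list of numbers using a language-specific delimiter.
fn join_numbers(numbers: &[i32], delimiter: &str) -> String {
    numbers
        .iter()
        .map(|n| n.to_string())
        .collect::<Vec<_>>()
        .join(delimiter)
}
```

The point of the example is that the same input value (30) is not just translated but transformed differently per language, which a plain string lookup cannot express.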

ogoffart commented 1 year ago

Released a blog post about this: https://slint-ui.com/blog/translation-infrastructure

ogoffart commented 1 year ago

Initial translation infrastructure was merged

zbraniecki commented 1 year ago

Hi @ogoffart - one correction for your blog post. Your snippet of "this is how it would look in Fluent" is architecturally wrong. As explained in the documents listed above, Fluent actively discourages the Label.text = format("key") model. Instead we build bindings between UI elements and l10n units and bind them with an attribute. It's a similar model to how you would bind a CSS class to a UI element with a "class" attribute. See the "l10n-id" attribute in Fluent DOM.

ogoffart commented 1 year ago

@zbraniecki I'm not sure how that would work with Slint though: there is no DOM in Slint, so using Fluent DOM is out of the question. Where do they discourage this usage? And if the snippet I used in the blog is wrong, what would you imagine the right syntax to be?

zbraniecki commented 1 year ago

I just meant that in the example you should have Label.l10nId and assign a translation unit to a UI element, not a translation string to a single attribute. (In Fluent a single message may have a value and multiple attributes: a binding of a single message with multiple attributes to a single element with multiple translatable attributes.)
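A rough sketch of that binding model, in plain Rust without the Fluent crates: a message carries an optional value plus named attributes, and a widget is bound to a message id rather than to individual strings. The types `Message` and `Button`, the `tooltip` attribute name, and the `localize` function are all hypothetical and only illustrate the shape of the idea.

```rust
use std::collections::HashMap;

/// One localisation unit: an optional main value plus named attributes.
struct Message {
    value: Option<String>,
    attributes: HashMap<String, String>,
}

/// A hypothetical widget with several translatable fields.
#[derive(Default)]
struct Button {
    l10n_id: String,
    label: String,
    tooltip: String,
}

/// Fill every translatable field of `button` from the message bound to its id.
/// Fields stay unchanged when the message or attribute is missing.
fn localize(button: &mut Button, bundle: &HashMap<String, Message>) {
    if let Some(msg) = bundle.get(&button.l10n_id) {
        if let Some(value) = &msg.value {
            button.label = value.clone();
        }
        if let Some(tip) = msg.attributes.get("tooltip") {
            button.tooltip = tip.clone();
        }
    }
}
```

The translator then works on one unit per UI element (label and tooltip together) instead of on unrelated strings.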

ArcticLampyrid commented 1 month ago

> Remember that the person writing the .slint file is supposed to be the designer of the UI.

But they are not translators! I can't imagine having a UI designer complete all the i18n work.

> Strings in the source code are easier to write and maintain (as they are in the right context)

I disagree with this. Multiple scenarios where the same vocabulary is used in one language may require the use of multiple words in another language. In this case, retaining the original text will lead to confusion. In addition, some brief original texts often lack the necessary context. When we use id, we can name it like dialog-logout-ok and dialog-remove-ok, but with the original text they will all be named Ok.

ogoffart commented 1 month ago

> But they are not translators! I can't imagine having a UI designer complete all the i18n work.

Indeed, I never said they are. They need to provide the right context and write decent strings, but that is indeed hard.

> Multiple scenarios where the same vocabulary is used in one language may require the use of multiple words in another language. In this case, retaining the original text will lead to confusion. In addition, some brief original texts often lack the necessary context. When we use id, we can name it like dialog-logout-ok and dialog-remove-ok, but with the original text they will all be named Ok.

This is taken care of by the msgctxt of gettext, which by default with Slint is the component name, but can also be customized with =>.

So the translators don't just see "Ok", they see ("LogoutDialog", "Ok") and ("RemoveDialog", "Ok")
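The effect of a context can be sketched as a lookup keyed by the pair (context, source string) instead of by the source string alone. This is a simplified in-memory model, not gettext's file format or Slint's runtime; the `Catalog` type and the German strings are made up for the example.

```rust
use std::collections::HashMap;

/// Translations keyed by (context, source string), like gettext's msgctxt + msgid.
struct Catalog {
    entries: HashMap<(String, String), String>,
}

impl Catalog {
    fn new() -> Self {
        Catalog { entries: HashMap::new() }
    }

    fn add(&mut self, context: &str, msgid: &str, translation: &str) {
        self.entries
            .insert((context.to_owned(), msgid.to_owned()), translation.to_owned());
    }

    /// Look up a translation; falls back to the source string when none exists.
    fn translate(&self, context: &str, msgid: &str) -> String {
        self.entries
            .get(&(context.to_owned(), msgid.to_owned()))
            .cloned()
            .unwrap_or_else(|| msgid.to_owned())
    }
}
```

With entries for ("LogoutDialog", "Ok") and ("RemoveDialog", "Ok"), the same source string "Ok" can map to two different translations, and any other context falls back to the original "Ok".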

slint-ui / slint

Add support for translations #33
