Closed MartinCerny-awin closed 6 years ago
Hey @MartinCerny-awin, sounds reasonable.
What are you actually trying to achieve? Test that your app can handle special characters? Test the UI with messages that might be longer in some languages? I just briefly checked the wiki...
I'm happy to review and merge the PR! I remember seeing this feature in Transifex but never understood the use case.
Mostly, I want to verify that all the strings in my app are localizable and extracted. It also helps with testing special characters and longer text, but my app is being translated into Chinese and it is actually shorter.
I would very much like this feature as well! Good one.
I have started thinking about pseudolocalization (PL) and would like to discuss the implementation options with you.
1) Locale - the PL locale would have to pass the test for a valid locale: https://github.com/lingui/js-lingui/blob/master/packages/cli/src/api/locales.js#L15-L22 This would allow adding the PL locale and extracting messages for it. There are two options:
a) Hardcoded locale - we can hardcode the locale for PL in the code and check directly against this hardcoded locale. I would suggest using pseudo-locale as the code.
b) Configuration file - add PL to the configuration file. We would check whether the locale for PL is set, and accept a newly added locale (or a folder with that name) when it equals the PL option.
2) Pseudolocalization - the actual transformation could happen during one of two CLI commands:
a) Compilation - This would give us the option to change the text that gets transformed, in a language we understand. The text could be changed by adding a translation in messages.json. If no translation is added, the default text would be transformed. The same is currently done for files with missing translations.
b) Extraction - This would transform the text immediately during extraction. The translation in messages.json would be filled with the PL text. This would give us the option to change the translation in PL.
a) Hardcoded locale - we can hardcode the locale for PL in the code and check directly against this hardcoded locale. I would suggest using pseudo-locale as the code.
Locale must be valid BCP-47 and there must be plurals defined for it. We could use en-pseudo or even *-pseudo (for any language).
On the other hand, Android uses xa for PL based on English and xb for right-to-left PL (it basically reverses messages). What do you think about xa and xb? I'm afraid that we need to make an exception for plurals, which won't be defined for these locales, but we could simply load plural rules for the base language (English in the default case).
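The plural-rule exception described above could be sketched roughly as follows. This is only an illustration: the `PSEUDO_LOCALE_BASES` table and both function names are hypothetical, and the built-in `Intl.PluralRules` is used as a stand-in for whatever plural data the CLI actually loads.

```javascript
// Hypothetical table: pseudo-locale code -> real language it is based on.
const PSEUDO_LOCALE_BASES = new Map([
  ["xa", "en"], // LTR pseudo-locale based on English
  ["xb", "en"], // RTL pseudo-locale based on English
]);

// Resolve the locale whose plural rules should be loaded (hypothetical helper).
function pluralLocaleFor(locale) {
  if (PSEUDO_LOCALE_BASES.has(locale)) return PSEUDO_LOCALE_BASES.get(locale);
  // "*-pseudo" convention: strip the suffix to get the base language.
  if (locale.endsWith("-pseudo")) return locale.slice(0, -"-pseudo".length);
  return locale;
}

// Pick the plural category using the base language's rules.
function pluralCategory(locale, n) {
  return new Intl.PluralRules(pluralLocaleFor(locale)).select(n);
}

console.log(pluralCategory("xa", 1), pluralCategory("en-pseudo", 2));
```

With this approach the pseudo-locale never needs plural data of its own; it only needs to know its base language.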
b) Configuration file - add PL to the configuration file. We would check whether the locale for PL is set, and accept a newly added locale (or a folder with that name) when it equals the PL option.
This would be great. Something like:
{
  "lingui": {
    "pseudolocale": "en"
  }
}
Which would generate the xa locale based on English.
2) Pseudolocalization - the actual transformation could happen during two CLI commands:
Hmm, I can't decide, because both options seem to be useful! What do you feel is the best option?
Actually, if we do pseudolocalization at compile time, we can simply enable it in i18n or I18nProvider, without any message catalogs in the filesystem or additional configuration.
This would enable PL based on English:
<I18nProvider language="en" pseudoLocalization>
  <App />
</I18nProvider>
This would work only in development.
What do you think?
Going further, the pseudolocalization could be done with a cloud translation API like Watson etc?
@tricoder42 I think that it is better to have PL in production as well. Sometimes it could be an acceptance criterion and it gets tested in production with feature toggles.
I would not strictly require that PL be generated from English. It would probably be best to use the default language. What I meant is that the configuration would specify which locale code is generated for PL. For example, we could specify:
{
  "lingui": {
    "pseudolocale": "xa-PL"
  }
}
This would generate PL in xa-PL.
It would probably be best to generate it during compilation. We could change the default message in a language we understand, and it would be transformed later.
@sedubois I do not have experience with the Watson API. What would be the benefit over using a Node library, for example this one: https://github.com/bunkat/pseudoloc ?
@MartinCerny-awin Fair enough.
Regarding the locale code, what about the other way round: en-XA would be an LTR pseudolocale based on English, en-XB would be RTL. I'm a bit worried about using PL as a country code, but I've just briefly checked BCP-47 locales and the only conflict is pl-PL (Poland).
Or we could simply use xa or xb and it would be generated either from sourceLocale or fallbackLocale.
We do not have to strictly specify the country code. It was just an example. It would be up to the user to choose the language code and country code. She could specify anything as the pseudolocale option.
Makes sense! Sorry for overengineering :)
Well, then it's settled:
1) Add pseudoLocale to config
2) lingui extract will create a directory for pseudoLocale automatically (no need to run lingui add-locale)
3) lingui compile generates pseudo-localised strings using pseudoloc. The only tricky part is wrapping HTML tags with a delimiter and then stripping the delimiters from the pseudo-localised strings, so that variables and rich text work as well.
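A rough sketch of the compile step in that plan, assuming a simplified catalog shape. The `compileCatalog` function, the `{ translation, defaults }` entry shape, and the bracket-adding `standInTransform` are all hypothetical; a real implementation would call pseudoloc instead of the stand-in.

```javascript
// Stand-in for a real pseudolocalization library such as pseudoloc.
const standInTransform = (text) => `[!! ${text} !!]`;

// catalog: { [messageId]: { translation?: string, defaults?: string } }
// (hypothetical shape, used here only for illustration)
function compileCatalog(catalog, locale, config, transform = standInTransform) {
  const compiled = {};
  for (const [id, entry] of Object.entries(catalog)) {
    // Prefer an explicit translation, then the default message, then the id.
    const text = entry.translation || entry.defaults || id;
    // Only the configured pseudo-locale gets transformed at compile time.
    compiled[id] = locale === config.pseudoLocale ? transform(text) : text;
  }
  return compiled;
}

const catalog = {
  "msg.hello": { translation: "", defaults: "Hello" },
  Save: { translation: "Store" },
};
console.log(compileCatalog(catalog, "xa-PL", { pseudoLocale: "xa-PL" }));
```

Because the transform runs on `translation || defaults`, a developer can still override the source text for the pseudo-locale in a language they understand, and anything left empty falls back to the default message.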
What do you think?
Yes, that sounds good. Just to confirm: the pseudoLocale code could be anything, even if it is not valid BCP-47, am I right?
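One way such an exception could look, as a sketch only: the configured pseudoLocale is accepted verbatim, and every other code goes through the normal check. Both function names are hypothetical, and `Intl.getCanonicalLocales` is used here merely as a stand-in structural check; the real CLI validation also requires plural rules to exist for the locale.

```javascript
// Stand-in structural check: Intl.getCanonicalLocales throws a RangeError
// for tags that are not well-formed language tags.
function isStructurallyValidLocale(locale) {
  try {
    Intl.getCanonicalLocales(locale);
    return true;
  } catch (error) {
    return false;
  }
}

// Hypothetical validation: the configured pseudoLocale bypasses the check.
function isAcceptedLocale(locale, config) {
  if (config.pseudoLocale && locale === config.pseudoLocale) return true;
  return isStructurallyValidLocale(locale);
}

console.log(isAcceptedLocale("pseudo_LOCALE", { pseudoLocale: "pseudo_LOCALE" }));
console.log(isAcceptedLocale("pseudo_LOCALE", {}));
```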
For the delimiter, I do not think that we have to wrap tags in delimiters. We can specify pseudoloc's startDelimiter as < and endDelimiter as />. This would leave variables and HTML tags untransformed.
@MartinCerny-awin I was just thinking that by using a machine translation service, the translated text would be even closer to the actual translation. Nowadays machine translation has become really good, at least between English and French. Google, for instance, has improved its deep learning algorithms a lot. Then, this machine-translated version could serve as a base for the translator.
It would be quite amazing if with just one ‘lingui extract’ command the whole app was entirely machine-translated.
Apps like https://github.com/OpenNewsLabs/autoEdit_2 (for video captioning) use a variety of machine translation APIs.
Anyway, please discard if it’s considered out of scope, just wanted to share what it made me think about ...
@MartinCerny-awin I think start/end delimiters should be { and }.
Consider this message: <em>Hello</em> <strong>{name}</strong>, only Hello should be pseudolocalised. I think $$<em>$$Hello$$</em>$$ $$<strong>$${name}$$</strong>$$ should be passed to pseudoloc and then $$ should be stripped away (or any better delimiter).
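The wrap-then-strip idea can be sketched without pseudoloc itself. In the illustration below, `protectMarkup`, `transformUnprotected` and the uppercase `shout` stand-in are hypothetical names; here simple {variables} are wrapped in $$ as well, rather than relying on pseudoloc's own start/end delimiters.

```javascript
const DELIMITER = "$$";

// Wrap HTML tags and simple {variables} in delimiters so they can be skipped.
function protectMarkup(message) {
  return message.replace(/<[^>]+>|\{\w+\}/g, (match) => DELIMITER + match + DELIMITER);
}

// Transform only the text outside delimiters; joining the segments with an
// empty string strips the delimiters again.
function transformUnprotected(protectedMessage, transform) {
  return protectedMessage
    .split(DELIMITER)
    // Even-indexed segments lie outside delimiters, odd-indexed ones inside.
    .map((segment, index) => (index % 2 === 0 ? transform(segment) : segment))
    .join("");
}

// Stand-in transform (not a real pseudolocalization): maps "" to "",
// which matters because empty segments occur between adjacent delimiters.
const shout = (text) => text.toUpperCase();

const message = "<em>Hello</em> <strong>{name}</strong>";
console.log(transformUnprotected(protectMarkup(message), shout));
```

Note that the `\{\w+\}` pattern only covers simple variables; an argument containing nested braces is not protected by it.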
Actually, thinking about it, plurals are tricky! In {value, plural, one {# book} other {# books}} only book and books are pseudolocalised, so it's not as easy as setting start/end delimiters.
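One possible way around this is to walk the message by brace depth instead of relying on delimiters. The sketch below is a simplified illustration, not the approach that was actually shipped: all function names are hypothetical, an uppercase transform stands in for real pseudolocalization, and ICU apostrophe escaping is not handled.

```javascript
// Find the index of the "}" that closes the "{" at position `open`.
function findMatchingBrace(text, open) {
  let depth = 0;
  for (let i = open; i < text.length; i++) {
    if (text[i] === "{") depth++;
    else if (text[i] === "}" && --depth === 0) return i;
  }
  throw new Error("Unbalanced braces in message");
}

// Transform literal text, leaving HTML tags and the "#" placeholder untouched.
function transformText(text, transform) {
  return text
    .split(/(<[^>]+>|#)/) // captured separators end up at odd indices
    .map((part, index) => (index % 2 === 0 ? transform(part) : part))
    .join("");
}

// Handle the inside of one {...} argument.
function transformArgument(inner, transform) {
  // Simple argument such as {name}: keep as-is.
  if (!inner.includes("{")) return "{" + inner + "}";
  // Complex argument (plural/select): keep header and keywords verbatim,
  // recurse into each {...} case body.
  let out = "{";
  let i = 0;
  while (i < inner.length) {
    if (inner[i] === "{") {
      const end = findMatchingBrace(inner, i);
      out += "{" + transformMessage(inner.slice(i + 1, end), transform) + "}";
      i = end + 1;
    } else {
      out += inner[i++];
    }
  }
  return out + "}";
}

function transformMessage(message, transform) {
  let out = "";
  let i = 0;
  while (i < message.length) {
    if (message[i] === "{") {
      const end = findMatchingBrace(message, i);
      out += transformArgument(message.slice(i + 1, end), transform);
      i = end + 1;
    } else {
      let j = i;
      while (j < message.length && message[j] !== "{") j++;
      out += transformText(message.slice(i, j), transform);
      i = j;
    }
  }
  return out;
}

const upper = (text) => text.toUpperCase(); // stand-in transform
console.log(transformMessage("{value, plural, one {# book} other {# books}}", upper));
```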
@sedubois That sounds good! Let's finish this first and then think about machine translations
Released in v2.7.0. Thanks @MartinCerny-awin!
Feature Request
I would like to have pseudolocalization done automatically when extracting messages. Pseudolocalization simplifies testing of the localization. When I am doing localization, I might not speak the languages I have to localize our app into. The translation is done by a professional translator. At the same time, I need to verify that all strings are extracted. In pseudolocalization, strings are replaced by special characters, which are still readable and easily testable.
Example:
Account Settings: [!!! Àççôûñţ Šéţţîñĝš !!!]
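A minimal sketch of such a transformation, assuming a tiny hand-written character map that covers only the letters in the example above. The `ACCENTED` table and `pseudolocalize` function are illustrative; a library such as pseudoloc ships a much fuller mapping.

```javascript
// Illustrative map covering only the letters used in "Account Settings".
const ACCENTED = new Map(Object.entries({
  A: "À", S: "Š",
  c: "ç", e: "é", g: "ĝ", i: "î", n: "ñ", o: "ô", s: "š", t: "ţ", u: "û",
}));

// Replace mapped letters, keep everything else, and wrap the result in
// markers so clipped or concatenated strings are easy to spot in the UI.
function pseudolocalize(text) {
  const body = [...text].map((ch) => ACCENTED.get(ch) ?? ch).join("");
  return `[!!! ${body} !!!]`;
}

console.log(pseudolocalize("Account Settings"));
```

Any string that still shows up in plain ASCII without the surrounding markers was never extracted, which is the main thing this feature is meant to reveal.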
Describe the solution you'd like
A CLI command with an appropriate parameter would create pseudolocalized translations from the default messages. These would be stored under a special locale.
Describe alternatives you've considered
Pseudolocalization can be run after the extraction.
Additional context
https://en.wikipedia.org/wiki/Pseudolocalization
I might be able to implement this functionality, if there is any chance of it being merged.