zbraniecki opened 6 years ago
Maybe just introducing some generic "post-processing" on messages in MessageContext, and then adding fluent-pseudolocale would work as the first step?
```js
let ctx = new MessageContext(['ar'], {
  process: fluent_pseudolocales.transform.bind('ar-XB')
});
let msg = ctx.formatValue('l10n-id');
```
> Maybe just introducing some generic "post-processing" on messages in MessageContext, and then adding fluent-pseudolocale would work as the first step?
That was my first thought as well. A few additional thoughts below. I'll try to have answers tomorrow.
We have to take into account how this will interact with the language negotiation. Would we expect the user to set their requested locale to a pseudolocale in order to enable it? Would we require that developers add pseudolocales to the list of available locales in their app?
Perhaps it would make sense to encode pseudolocales as Unicode extensions to BCP47? Something like `ab-CD-u-pseudo-accent` or `ab-CD-u-pseudo-rtl`. The language negotiation process would then still correctly pick the regular `ab-CD` for fetching translation resources. Some logic would then be responsible for transforming the fetched resource using the `fluent-pseudo` module.
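For illustration, here is how such a tag could be split into the base locale (for resource fetching) and the pseudo strategy. The `-u-pseudo-*` syntax and the `splitPseudo` helper are hypothetical, sketching the idea above:

```javascript
// Hypothetical helper: split "ab-CD-u-pseudo-accent" into the base
// locale used for fetching resources and the pseudo strategy name.
function splitPseudo(tag) {
  const match = tag.match(/^(.+?)-u-pseudo-(\w+)$/);
  if (match) {
    return { base: match[1], pseudo: match[2] };
  }
  return { base: tag, pseudo: null };
}

splitPseudo("ab-CD-u-pseudo-accent"); // → { base: "ab-CD", pseudo: "accent" }
splitPseudo("en-US");                 // → { base: "en-US", pseudo: null }
```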
What should be the outcome of formatting a date or a number in a pseudolocalized translation?
Also, the first step might be to only support build-time pseudolocalization.
Google just went for `en-XA` and `ar-XB` and added them to CLDR, so we can get internationalized date/time from CLDR 31 if we use those two.
Now, my problem with this is that because they used `en-XA` and not `fr-XA`, the numbers still look the same. I recommended `fr-XA` to them, but it may be too late.
Since we'd be doing runtime pseudo, maybe we don't need extensions (and they wouldn't be Unicode extensions, but rather variants; Google originally used `en-psaccent` and `ar-psaccentrtl` or something like that).
Maybe all we need is:

```js
let ctx = new MessageContext('pl', {
  process: pseudo
});
```

and it'll transform Polish strings? This way we could get a pseudo of the current locale, regardless of what it is.
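For illustration, a minimal `pseudo` function of the kind that could be passed as `process`. The accent map is an arbitrary sketch, not the actual fluent-pseudo implementation:

```javascript
// A toy accenting transform: replace some lowercase latin letters
// with accented lookalikes so untranslated strings stand out.
const ACCENTED = {
  a: "ȧ", c: "ƈ", e: "ḗ", i: "ī", n: "ƞ",
  o: "ǿ", r: "ř", s: "ş", t: "ŧ", u: "ŭ",
};

function pseudo(text) {
  return text.replace(/[a-z]/g, ch => ACCENTED[ch] ?? ch);
}

pseudo("hello"); // → "hḗllǿ"
```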
How would you decide when to turn pseudolocalization on? A different logic independent of the language negotiation?
> How would you decide when to turn pseudolocalization on? A different logic independent of the language negotiation?
Yeah! This way the user can either detect pseudo from a langtag (oh, you're using the XA region?) or by some checkbox (show me the pseudolocale).
Since we're on client-side at runtime, that would mean no rebuilding, restarting or anything. Just take the exact locale we use, whatever it is, and recompute for pseudo.
> Yeah! This way the user can either detect pseudo from a langtag (oh, you're using XA region?) or by some checkbox.
If the user sets their requested locales to `en-XA` and the available ones only have `en-US`, the result of the language negotiation will be `en-US`. At the moment when we'd create the MessageContext, we wouldn't know the region was XA.
On the other hand, if we add `en-XA` to the list of available locales and the files for it do not exist on disk, we will fail to fetch anything. We'd need to extend the IO logic to fetch them. This might mean moving the pseudolocalization to fluent-web. I think it would be better to have it on a lower level, though.
Does the language negotiation preserve extensions found on the requested locales?
Oh, you're right.
> I think it would be better to have it on a lower level though.
I agree.
> Does the language negotiation preserve extensions found on the requested locales?
yes.
So, maybe a private-use extension? `fr-FR-x-pseudo`? We just need to make sure that if we see a pseudo like this, we actually feed `en-XA` or `ar-XB` to the Intl API (so that CLDR picks it up).
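That collapsing could be a small shim in front of every Intl call. A sketch, assuming hypothetical `-x-pseudo` and `-x-pseudo-rtl` private-use tags (the helper name is made up):

```javascript
// Hypothetical: collapse private-use pseudo tags onto the CLDR
// pseudolocale codes before handing the tag to the Intl API.
function toIntlLocale(tag) {
  if (/-x-pseudo-rtl$/.test(tag)) return "ar-XB";
  if (/-x-pseudo$/.test(tag)) return "en-XA";
  return tag;
}

// new Intl.DateTimeFormat(toIntlLocale("fr-FR-x-pseudo")) would then
// pick up the en-XA data from CLDR.
```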
There's discussion in http://unicode.org/cldr/trac/ticket/3971 and http://unicode.org/cldr/trac/ticket/9819 on why CLDR didn't go for variant tags. It's mostly about compatibility with existing code. Also, since `en-XA` and `ar-XB` are now in CLDR, we should stick to these codes. I wish we hadn't missed the discussion when it happened.
I'm reconsidering my stance on where this logic should live. Having it higher up, e.g. in fluent-web, would allow multiple approaches:

- fluent-web can transform the result of `ctx.format()`.
- fluent-web can also transform translations in a way which preserves HTML for the overlay mechanic.
I'd think that the most accurate way to implement the actual pseudolocalization would be on the AST?
On buildtime or on runtime?
For both, I guess.
Transforming the runtime AST means doing the transformation inside of `MessageContext`. That still might be a viable option given my earlier comments: fluent-web could supply a markup-aware transform function to the `MessageContext` constructor. Compared to transforming the result of `ctx.format()`, this would have the advantage of only transforming `TextElements` in the translation rather than the whole string.
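To make the difference concrete, here is a toy sketch of applying a transform only to the text parts of a parsed pattern, leaving placeables untouched. The AST shape is deliberately simplified and is not the actual fluent runtime AST:

```javascript
// Simplified runtime AST: a pattern is an array of plain strings
// (TextElements) and placeable objects (e.g. variable references).
function transformPattern(pattern, transform) {
  return pattern.map(elem =>
    typeof elem === "string" ? transform(elem) : elem
  );
}

const pattern = ["Hello, ", { type: "var", name: "user" }, "!"];
transformPattern(pattern, s => s.toUpperCase());
// → ["HELLO, ", { type: "var", name: "user" }, "!"]
```

Formatted dates and numbers come from placeables, so they pass through the transform unchanged, which also answers the earlier question about formatting in a pseudolocalized translation.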
We still need to solve the problem of fetching valid locale files. Given that it looks like fluent-web (or fluent-react) would need to handle the pseudolocalization anyways (if only to be HTML-aware), I think it makes sense to special-case `en-XA` and `ar-XB` in their IO.
For example, given the following language negotiation:

- requested: `en-XA`, `de`
- available: `en-XA`, `en-US`, `de`
- defaultLocale: `en-US`
- negotiated: `en-XA`, `de`, `en-US`
…a developer using fluent-react will need to add a special case to the IO code which fetches `en-US` when `en-XA` is requested. This sounds okay to me since the same developer has already put `en-XA` among the available locales.
```js
let ctx = new MessageContext(negotiated, { pseudo: makeAccent });
ctx.addMessages(/* en-US translations to be transformed into en-XA */);
```
Or, if the build pipeline is capable of building pseudolocales up front, the IO code would simply fetch the pre-made `en-XA` files.
```js
let ctx = new MessageContext(negotiated);
ctx.addMessages(/* en-XA translations generated at build time */);
```
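The IO special case mentioned above could reduce to a small lookup before building the resource URL. A sketch with made-up names and a made-up URL scheme; mapping `ar-XB` to English sources is also an assumption:

```javascript
// Hypothetical: en-XA has no files on disk, so resolve it to the
// en-US sources and let the transform pseudolocalize them later.
const SOURCE_LOCALES = { "en-XA": "en-US", "ar-XB": "en-US" };

function resolveResourcePath(locale, resId) {
  const source = SOURCE_LOCALES[locale] ?? locale;
  return `/locales/${source}/${resId}`;
}

resolveResourcePath("en-XA", "main.ftl"); // → "/locales/en-US/main.ftl"
```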
I do not agree with Stas that we have to use `en-XA` and `ar-XB` here. I believe it's perfectly fine for us to use whatever mechanism we want to recognize pseudolocales, and then just make sure to collapse them onto `en-XA` and `ar-XB` in the Intl constructor for the Intl API / CLDR.
Note that the approach from my previous comment will work with any scheme of specifying pseudolocales. In my example I chose to put `en-XA` in `requested`, but it could also be an app-specific pref which handles that. This is also how I understood your comment from 5 days ago.
There's value in using `en-XA` and `ar-XB` now that they have been standardized in CLDR. They will become recognizable names for pseudolocales and with time will gain support in various tools and platforms.
@stasm - would you have time to draft a plan to get this into a POC state? I'm happy to commit to work on that, but would prefer to follow your vision.
Some POC prototyping gave me this: https://youtu.be/E3t8-u8e5D0
It's actually quite simple to get to that point, and even get Intl hooked in. There's going to be more work to be done to get complex messages handling.
I'm wondering if it's better to import the pseudo module for its side effects and allow it to hook itself into fluent:

```js
import "fluent-pseudo";

let cx = new MessageContext(locales, {
  usePseudo: true
});
```
or make people hook it up explicitly:

```js
// strategy1 - 30% longer via duplication of vowels, latin chars transformed, LTR
import { strategy1 } from "fluent-pseudo";

let cx = new MessageContext(locales, {
  transform: strategy1
});
```
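As a sketch of what such a strategy might do internally (this is not the actual fluent-pseudo code, just an illustration of the "longer via duplication of vowels" idea):

```javascript
// Toy strategy: duplicate vowels to stretch strings (catching
// truncation bugs) while keeping the text readable.
function stretchVowels(text) {
  return text.replace(/[aeiouAEIOU]/g, ch => ch + ch.toLowerCase());
}

stretchVowels("Save file"); // → "Saavee fiilee"
```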
Enough for now, will wait for stas :)
I find the explicit version easier to understand. It will also be easier to test.
From a developer point-of-view, I don't expect that any Firefox developer will be touching code at that abstraction level. We explicitly don't want these folks to know that MessageContext even exists.
Agreed. IIUC this issue is about the low-level API which `fluent-web` will completely hide.
@zbraniecki and I talked about this yesterday and today. We'd like to start simple with the approach from comment https://github.com/projectfluent/fluent.js/issues/83#issuecomment-337954325.

- The `MessageContext` constructor will accept a `process` or `transform` option whose value is a function to be invoked on all `TextElements`, during the `MessageContext.addMessages` call in the runtime parser.
- We'll provide `psaccent` and `psbidi` transforms. We'll discuss the exact strategies and implementations later.
- The `MessageContext` will be created for a real locale: `en-US`, `de`, etc. The transform function should only be passed to the constructor if the user has expressed interest in using pseudolocales. This should be handled outside of Fluent (e.g. apply `psaccent` if the current locale is `en-US`).
- In the future we'll have `Intl.Locale` (and `fluent-locale`) and it will be easy to recognize well-formed BCP47 variant tags, e.g. `en-US-psaccent` and `en-US-psbidi`.
- `en-XA` and `ar-XB` are called such mostly because of legacy code in Android which wouldn't handle language tags with variants. `MessageContext` could by default include transforms for known pseudolocales. Users are free to write their own transform functions. We encourage experimentation.
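Recognizing such variant tags does not even require `Intl.Locale`; a plain string check suffices. A sketch with a made-up helper name:

```javascript
const PSEUDO_VARIANTS = new Set(["psaccent", "psbidi"]);

// Split "en-US-psaccent" into the real locale and the pseudo variant.
function parsePseudoVariant(tag) {
  const parts = tag.split("-");
  const last = parts[parts.length - 1].toLowerCase();
  if (PSEUDO_VARIANTS.has(last)) {
    return { locale: parts.slice(0, -1).join("-"), pseudo: last };
  }
  return { locale: tag, pseudo: null };
}

parsePseudoVariant("en-US-psaccent"); // → { locale: "en-US", pseudo: "psaccent" }
```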
This issue uses the word "user" for a ton of things; I'm losing track.
Say, I'm a Firefox developer and I want to run my local build with psaccent on. How would I do that, which parts of our code stack are involved, and what would they need to do?
> This issue uses the word "user" for a ton of things; I'm losing track.
Good point. I meant the users of the library here. Elsewhere I meant the user of the app.
> Say, I'm a Firefox developer and I want to run my local build with psaccent on. How would I do that, which parts of our code stack are involved, and what would they need to do?
You would start by flipping a pref somewhere in the UI. The values of the pref could be `psaccent` or `psbidi`. `fluent-gecko` (which is `fluent-dom` packaged for Gecko privileged content) would observe this pref and use it in its `generateMessages`, which constructs `MessageContext` instances. `fluent-react` in Devtools would need to do the same.
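In code, the pref lookup could reduce to picking a transform from a map. The pref values follow the comment above; the transform bodies below are illustrative only, not the real psaccent/psbidi implementations:

```javascript
// Hypothetical: map the pref value to a transform passed to every
// MessageContext that generateMessages creates.
const TRANSFORMS = {
  psaccent: text => text.replace(/[aeiou]/g, ch => ch + ch),
  psbidi: text => `\u202e${text}\u202c`, // wrap in RTL override marks
};

function getTransform(prefValue) {
  return TRANSFORMS[prefValue] ?? null;
}

getTransform("psaccent")("foo"); // → "foooo"
```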
Can we close this issue? We have the capability for pseudolocalization since fluent 0.7 and we use it in Gecko. Or should we wait until we extract `fluent-pseudo` as a package? (I have that in Rust: https://github.com/projectfluent/fluent-rs/tree/master/fluent-pseudo)
Hey @zbraniecki, by chance would you have some guidance or documentation about how to use pseudolanguages with fluent.js/fluent-react in a plain web page (as opposed to in Firefox)? Thanks!
Hmm, I can tell you how to enable it in fluent.js, but not React. You need to extract the transform from an old L10nRegistry.jsm (https://hg.mozilla.org/mozilla-central/file/a1f74e8c8fb72390d22054d6b00c28b1a32f6c43/intl/l10n/L10nRegistry.jsm#l425) and then, when constructing FluentBundle, pass a method as `transform` (https://github.com/projectfluent/fluent.js/blob/master/fluent-bundle/src/bundle.ts#L61). I assume something similar happens for React, but I'm short on details. If you do spend the time, I'd accept that resurrected block of code as `fluent-pseudo` in the fluent.js repo, so we can maintain it!
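To make this concrete for plain fluent.js: the `transform` option takes a string-to-string function. The accent map below is a toy example; only the option name comes from the API linked above:

```javascript
// Toy accenting transform. Wire it up when constructing the bundle:
//   new FluentBundle(["en-US"], { transform: pseudoAccent });
const MAP = { a: "â", e: "ê", i: "î", o: "ô", u: "û" };

function pseudoAccent(text) {
  return text.replace(/[aeiou]/g, ch => MAP[ch]);
}

pseudoAccent("hello"); // → "hêllô"
```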
Thanks for the pointers! This is what I did to support pseudo locales in the profiler: https://github.com/firefox-devtools/profiler/pull/3188
We enable a pseudo locale by calling a function in the devtools console.
Would the file https://github.com/firefox-devtools/profiler/pull/3188/files#diff-ca1e6802f7be91e16b4123f89f090a2c40053a53e52b73ed3d69469619179d24 be suitable as `fluent-pseudo`? I'm not sure how "bidi" would set "rtl" with fluent-dom, do you know? Or maybe fluent-dom doesn't set it anyway, like fluent-react?
yeah, it looks good!
For a while we used a hardcoded list which is quite stable - https://github.com/mozilla-b2g/gaia/blob/master/shared/js/intl/l20n-client.js#L31-L35
Coming back from the Unicode Conference, there was a lot of chatter about pseudo-locales.
Fluent already had pretty good support for pseudo-locales in the past, and thanks to our client-side mode we can offer an exciting approach: runtime pseudolocalization.
I'd like to bring back this: https://github.com/l20n/l20n.js/blob/v3.x/src/lib/pseudo.js to modern fluent.
@stasm - do you have any thoughts on how you'd like it to work?