Open stasm opened 5 years ago
Probably all Intl formatters as well, since they may end up using a different locale in fallback scenarios, and even a single-locale scenario may require isolation. See https://github.com/tc39/ecma402/pull/290
Sounds reasonable, thanks.
Given our EBNF:
InlineExpression ::= StringLiteral
| NumberLiteral
| FunctionReference
| MessageReference
| TermReference
| VariableReference
| inline_placeable
it would appear that we need to wrap VariableReference and FunctionReference, while other expressions are probably safe to leave as they are?
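To make the proposal concrete, here is a minimal sketch of a per-kind wrapping policy. The names `ExpressionKind`, `ISOLATED_KINDS` and `wrapPlaceable` are made up for illustration and are not part of any Fluent implementation; the set of isolated kinds is just the one suggested above and would change depending on where this discussion lands.

```typescript
// Unicode bidi isolation controls.
const FSI = "\u2068"; // FIRST STRONG ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

// Expression kinds taken from the InlineExpression production in the EBNF.
type ExpressionKind =
  | "StringLiteral"
  | "NumberLiteral"
  | "FunctionReference"
  | "MessageReference"
  | "TermReference"
  | "VariableReference";

// Hypothetical policy: only these kinds get isolated.
const ISOLATED_KINDS: ReadonlySet<ExpressionKind> = new Set<ExpressionKind>([
  "VariableReference",
  "FunctionReference",
]);

// Wrap the already-resolved value of a placeable in FSI/PDI
// when the policy asks for it; otherwise return it unchanged.
function wrapPlaceable(kind: ExpressionKind, value: string): string {
  return ISOLATED_KINDS.has(kind) ? `${FSI}${value}${PDI}` : value;
}
```

With this policy, a variable value is isolated while a referenced message is inserted as-is.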
Yeah, that sounds right. @srl295 - do you have any backpointers or experience with interpolation and bidi? If we localize the NumberLiteral using Intl.NumberFormat to the same locale as the main string is in, is there any reason to put FSI/PDI around it?
Note that FSI/PDI (and other isolating controls) have weak support in browsers currently (see https://w3c.github.io/i18n-tests/results/bidi-algorithm#rli_etc). I get very poor results with isolating controls in Chrome, for example. Here's Chrome (left, incorrect) and FF (right):
HTML:
<p dir=rtl>ل⁧-1234.56⁩م</p> <!-- RLI/PDI -->
<p dir=rtl>ل⁦-1234.56⁩م</p> <!-- LRI/PDI -->
<p dir=rtl>ل⁨-1234.56⁩م</p> <!-- FSI/PDI -->
<p dir=rtl>ل-1234.56م</p> <!-- no controls -->
@zbraniecki Note that number strings often include leading/trailing punctuation (neutrals) and that digits are often left-to-right. The point of using isolating controls is that it establishes a separate base direction linked to the locale of the inserted string and that the resulting inclusion doesn't impact the containing string's layout (it eliminates "spillover effects" that can occur with the non-isolating controls).
You probably don't want to use first-strong heuristics (that is, FSI) when you know the direction of the placeable (i.e. it was made by a formatter) but instead want to use the direction of the formatter's locale (so RLI or LRI--which don't work any better in several major browsers). When you don't know the direction of the placeable you can use FSI in the absence of direction metadata (but it is better to have direction metadata or infer it from the language of the data if that's available). See String-Meta for more details.
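A rough sketch of that selection logic, assuming the caller can pass the formatter's locale when it is known. The `RTL_LANGUAGES` list is a small illustrative assumption, not an exhaustive or authoritative table, and `directionOfLocale` and `isolate` are hypothetical helper names.

```typescript
const LRI = "\u2066"; // LEFT-TO-RIGHT ISOLATE
const RLI = "\u2067"; // RIGHT-TO-LEFT ISOLATE
const FSI = "\u2068"; // FIRST STRONG ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

type Direction = "ltr" | "rtl";

// Assumption: a few well-known RTL language subtags, for illustration only.
const RTL_LANGUAGES: ReadonlySet<string> = new Set(["ar", "fa", "he", "ur"]);

// Infer a base direction from the primary language subtag of a locale tag.
function directionOfLocale(locale: string): Direction {
  const language = locale.toLowerCase().replace(/[-_].*$/, "");
  return RTL_LANGUAGES.has(language) ? "rtl" : "ltr";
}

// Use RLI or LRI when the direction is known, and fall back to FSI
// (first-strong heuristics) only when it is not.
function isolate(text: string, direction?: Direction): string {
  const opener = direction === "rtl" ? RLI : direction === "ltr" ? LRI : FSI;
  return `${opener}${text}${PDI}`;
}
```

A number formatted for `ar` would then be wrapped as `isolate(text, directionOfLocale("ar"))`, while a value of unknown origin would be wrapped as `isolate(text)`.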
Assuming we can get implementations fixed, then yes all placeables should be wrapped in isolating controls.
@spookylukey wrote a couple of interesting comments around isolation and attributes in https://github.com/django-ftl/python-fluent/blob/implement_escapers/fluent.runtime/docs/escaping.rst.
I've looked at that branch in particular because I think there's some conceptual overlap between the needs of bidi isolation and HTML escaping. I keep thinking that we might want to extract both algorithms to a post-format step, if format returned an iterable that provided enough metadata for these algorithms to do their respective jobs.
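As a thought experiment, such a post-format step could look roughly like the sketch below. Everything here is hypothetical: the `FormattedPart` shape, the `fromPlaceable` flag and the pass functions are invented for illustration, and an array stands in for the iterable. No existing Fluent API returns this structure.

```typescript
const FSI = "\u2068"; // FIRST STRONG ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

// Hypothetical shape of one part of a formatted pattern.
interface FormattedPart {
  value: string;
  // True when the part came from a placeable rather than literal pattern text.
  fromPlaceable: boolean;
}

// Minimal HTML escaping for text content (not for attribute values).
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Pass 1: escape only the parts that were interpolated.
function escapePass(parts: readonly FormattedPart[]): FormattedPart[] {
  return parts.map((part) =>
    part.fromPlaceable
      ? { value: escapeHtml(part.value), fromPlaceable: true }
      : part
  );
}

// Pass 2: wrap only the interpolated parts in FSI/PDI.
function isolatePass(parts: readonly FormattedPart[]): FormattedPart[] {
  return parts.map((part) =>
    part.fromPlaceable
      ? { value: `${FSI}${part.value}${PDI}`, fromPlaceable: true }
      : part
  );
}

// Join the parts back into the final string.
function joinParts(parts: readonly FormattedPart[]): string {
  return parts.map((part) => part.value).join("");
}
```

Both passes consume the same metadata, so either one could be swapped out or skipped by the caller without touching the formatter itself.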
Which also provides all my thoughts on #273.
Thanks @aphillips !
Assuming we can get implementations fixed, then yes all placeables should be wrapped in isolating controls.
Do you mean all placeables, or just the ones we listed?
Currently, we wrap all placeables in FSI/PDI, because we assume that directionality within the placeable may be different than the surrounding text.
For example, if my string is in ar, and I use Ecma402 Intl.NumberFormat to format the number, I still may end up with a different directionality (for example, if ar data is absent) for the number than for the surrounding text.
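For what it's worth, the fallback can be observed at runtime, because `Intl.NumberFormat.prototype.resolvedOptions()` reports the locale that was actually used. The sketch below compares primary language subtags only, which is a simplifying assumption; `fellBack` and `formatNumber` are hypothetical helper names.

```typescript
// Compare the primary language subtags of the requested and resolved locales.
// Assumption: a differing primary subtag means the formatter fell back.
function fellBack(requested: string, resolved: string): boolean {
  const primary = (tag: string): string =>
    tag.toLowerCase().replace(/[-_].*$/, "");
  return primary(requested) !== primary(resolved);
}

// Format a number and report which locale the formatter really used.
function formatNumber(
  value: number,
  locale: string
): { text: string; resolvedLocale: string; fellBack: boolean } {
  const formatter = new Intl.NumberFormat(locale);
  const resolvedLocale = formatter.resolvedOptions().locale;
  return {
    text: formatter.format(value),
    resolvedLocale,
    fellBack: fellBack(locale, resolvedLocale),
  };
}
```

The result depends on the locale data shipped with the runtime, which is exactly why the directionality of the formatted number cannot be assumed ahead of time.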
My current thinking is that we can skip the isolation for StringLiteral, MessageReference and TermReference. Those 3 are reliably guaranteed to match the directionality of the pattern.
For variables, functions and numbers (which are functions behind the scenes) I'd prefer to keep the FSI/PDI.
@zbraniecki Actually, I do mean all placeables--and especially the ones that involve placing strings inside of other strings--so precisely StringLiteral, MessageReference, and maybe TermReference. We try to illustrate the problems in String-Meta here.
Those 3 are reliably guaranteed to match the directionality of the pattern.
Why do you believe this to be the case?
It would be even better, of course, to replace FSI with LRI or RLI if the placeable's base direction is known (which in most cases it should be). This helps with placeables that have opposite direction initial sequences (the HTML و CSS example in String-Meta was chosen to help illustrate this).
Why do you believe this to be the case?
Because they should be in the same locale.
MessageReference in Fluent happens in such a case:
close-window = Close Window
close-window-command = Click { close-window } to close the window.
In such a case, I'd expect there to be a soft guarantee that both messages are in the same script and share directionality. A similar situation happens with terms.
As for StringLiteral, an example would be:
padded-text = { "      " } This phrase is padded with 6 spaces.
Some time ago, the decision was made to use string literals for start-padding of strings (otherwise Fluent will cut out the pre-padding). Since the translation and the literal come in the same locale and are bound together, I see a pretty good chance that they share directionality as well.
Formatting them to "\u2068 \u2069 This phrase is padded with 6 spaces." feels odd.
As to RLI/LRI. The cases where I want to use isolation are exactly where we don't know what directionality the placeable will take:
hello-world = Hello, { $user }!
Since the $user comes from the code (and maybe from the user itself), it may have any directionality. I want to wrap it in FSI/PDI to make layout recalculate the directionality of this fragment. Result: "Hello, \u2068فارص\u2069!"
Does it make sense?
Should all placeables be wrapped in bidi isolates? Perhaps just VariableReferences?