wassemgtk / pseudolocalization-tool

Fork: Automatically exported from code.google.com/p/pseudolocalization-tool
Apache License 2.0

fake bidi method can be improved by adding RLMs #10

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
The fake bidi method can produce output that more closely resembles real 
RTL text by adding an RLM (U+200F RIGHT-TO-LEFT MARK) before each RLO 
(U+202E RIGHT-TO-LEFT OVERRIDE) and after each PDF (U+202C POP DIRECTIONAL 
FORMATTING). For example, where it currently produces 
"\u202Ehello\u202C \u202Eworld\u202C" for "hello world", it would now produce 
"\u200F\u202Ehello\u202C\u200F \u200F\u202Eworld\u202C\u200F".
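
The proposed change can be sketched roughly as follows. This is only an illustration in Python, not the tool's actual code: the function name `fake_bidi`, the `add_rlm` flag, and the choice to split words on whitespace are all assumptions made for this example.

```python
import re

RLM = "\u200F"  # RIGHT-TO-LEFT MARK
RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def fake_bidi(text, add_rlm=True):
    """Wrap each whitespace-delimited word in RLO...PDF.

    With add_rlm=True, an RLM is also placed before each RLO and
    after each PDF, as suggested in this issue.
    """
    prefix = (RLM if add_rlm else "") + RLO
    suffix = PDF + (RLM if add_rlm else "")
    # The capturing group makes re.split keep the whitespace runs,
    # so the original spacing is preserved in the output.
    parts = re.split(r"(\s+)", text)
    return "".join(
        part if part == "" or part.isspace() else prefix + part + suffix
        for part in parts
    )


current = fake_bidi("hello world", add_rlm=False)
proposed = fake_bidi("hello world", add_rlm=True)
```

Here `current` matches the existing output quoted above and `proposed` matches the suggested output.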

While most of the time the visual output would be identical, adding the RLMs 
has two advantages:

1. The first-strong directionality estimation method, as specified in the 
Unicode Bidirectional Algorithm's rules P2 and P3 
(http://www.unicode.org/reports/tr9/#P2), would then decide that fake bidi text 
is RTL; currently it decides that it is LTR. As a result, fake bidi text 
currently does not behave in the same way as real RTL text (e.g. Hebrew or 
Arabic) in contexts like Android TextViews and HTML's dir="auto" attribute, 
which use the first-strong algorithm. Adding the RLM would fix this discrepancy.
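
To illustrate point 1, here is a simplified sketch of first-strong detection along the lines of rules P2 and P3, using Python's `unicodedata.bidirectional`. The function name `first_strong_direction` is made up for this example, and the first-strong implementations in Android or browsers may differ in detail.

```python
import unicodedata


def first_strong_direction(text):
    """Return "rtl" or "ltr" based on the first strong character.

    Characters inside directional isolates are skipped (P2), and
    text with no strong character falls back to "ltr" (P3).
    """
    isolate_depth = 0
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("LRI", "RLI", "FSI"):
            isolate_depth += 1
        elif bidi_class == "PDI":
            if isolate_depth > 0:
                isolate_depth -= 1
        elif isolate_depth == 0:
            if bidi_class == "L":
                return "ltr"
            if bidi_class in ("R", "AL"):
                return "rtl"
    return "ltr"


# RLO and PDF are not strong characters, so "h" decides: LTR.
without_rlm = first_strong_direction("\u202Ehello\u202C")
# RLM has bidi class R, so it decides first: RTL.
with_rlm = first_strong_direction("\u200F\u202Ehello\u202C\u200F")
```

In this sketch `without_rlm` is "ltr" and `with_rlm` is "rtl", which is the discrepancy described above.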

2. When a message contains a placeholder followed by a localizable text 
fragment that begins with a strong character (not a neutral character like a 
space or punctuation), and the placeholder ends in a number, the visual 
ordering that currently results for fake bidi localization is not equivalent to 
that resulting for a real RTL translation: in an RTL context, with fake bidi, 
the number appears to the left of the text fragment; with real RTL text, the 
number appears to the right. For example, let's say that the placeholder value 
is "12" and the localizable text fragment is "hello". Then, when fake bidi 
changes the "hello" into "\u202Ehello\u202C", the overall output is 
"12\u202Ehello\u202C". You can see the visual ordering specified for that by 
the Unicode Bidi Algorithm in an RTL paragraph here: 
http://unicode.org/cldr/utility/bidi.jsp?a=12%E2%80%AEhello%E2%80%AC&p=RTL; the 
number is on the left. However, if the text fragment were the Hebrew character 
alef, "\u05D0", and thus the whole string were "12\u05D0", the number would 
come out on the right: 
http://unicode.org/cldr/utility/bidi.jsp?a=12%D7%90&p=RTL. This is fixed by 
adding the RLMs to fake bidi: "12\u200F\u202Ehello\u202C\u200F" is displayed 
with the number on the right, as with real RTL text 
(http://unicode.org/cldr/utility/bidi.jsp?a=12%E2%80%8F%E2%80%AEhello%E2%80%AC%E2%80%8F&p=RTL). 
The same issue occurs when a placeholder follows a localizable text fragment 
that ends in a strong character; this is why I am suggesting putting an RLM 
not only before the RLO, but also after the PDF. It may seem strange for a 
placeholder to come immediately before or after strong text rather than a 
neutral like a space or punctuation; text like "hello: 12" or "12: hello" is 
a lot more common than "hello12" or "12hello". However, the same issue occurs 
(and is fixed by the RLMs) when the placeholder and the localizable text 
fragment are separated by a nonlocalizable text fragment containing markup 
that introduces visual spacing between the two, e.g. 
"<span style='padding: 5px'>", and this is unfortunately a fairly common 
practice in HTML.
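
The ordering difference in point 2 can be reproduced with a small sketch of the reordering step (rule L2 of the Unicode Bidirectional Algorithm). This is not a full bidi implementation: the embedding levels below were worked out by hand for an RTL paragraph and are an assumption of this example, the RLO and PDF characters are left out because they are not displayed, and the names `reorder_l2` and `visual` are made up for illustration.

```python
RLM = "\u200F"   # RIGHT-TO-LEFT MARK
ALEF = "\u05D0"  # HEBREW LETTER ALEF


def reorder_l2(chars, levels):
    """Rule L2: from the highest level down to the lowest odd level,
    reverse every contiguous run of characters at that level or higher."""
    items = list(zip(chars, levels))
    odd_levels = [lv for lv in levels if lv % 2 == 1]
    if not odd_levels:
        return "".join(chars)
    for level in range(max(levels), min(odd_levels) - 1, -1):
        i = 0
        while i < len(items):
            if items[i][1] >= level:
                j = i
                while j < len(items) and items[j][1] >= level:
                    j += 1
                items[i:j] = reversed(items[i:j])
                i = j
            else:
                i += 1
    return "".join(ch for ch, _ in items)


def visual(chars, levels):
    """Left-to-right visual order, with the invisible RLMs dropped."""
    return reorder_l2(chars, levels).replace(RLM, "")


# Current fake bidi: digits at level 2, overridden text at level 3.
current = visual("12hello", [2, 2, 3, 3, 3, 3, 3])  # "12olleh"
# Real RTL text: digits at level 2, alef at level 1.
hebrew = visual("12" + ALEF, [2, 2, 1])  # ALEF + "12"
# Fake bidi with RLMs: the RLMs sit at level 1 around the level 3 text.
proposed = visual("12" + RLM + "hello" + RLM,
                  [2, 2, 1, 3, 3, 3, 3, 3, 1])  # "olleh12"
```

Under these assumed levels, the number ends up on the left only for the current fake bidi output; with the RLMs added it lands on the right, as it does for the Hebrew string.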

Original issue reported on code.google.com by aha...@google.com on 7 Aug 2014 at 8:53

GoogleCodeExporter commented 8 years ago
> in contexts like Android TextViews and HTML's dir="auto" attribute

Add Java Swing to the list; see issue 9 for more info.

Original comment by aha...@google.com on 10 Aug 2014 at 6:36