w3c / i18n-discuss

A place to hold discussions on i18n topics, and to put documents that summarise, support or initiate those discussions.

[notes/string-base-direction] Using RLI...PDI wrapper #3

Open r12a opened 7 years ago

r12a commented 7 years ago

Mati Allouche said:

7) In "Paired formatting characters", we read "The Unicode bidi algorithm is unable to ascertain the base direction for a string that starts with RLI/LRI/FSI and ends with PDI,"

This is unclear to me. If the string starts with RLI/LRI/FSI and ends with PDI, there is nothing else besides it, so there is no base direction for the outer part, and the direction for the inner part is defined by the LRI/RLI/FSI.

If the string consists of the isolated sequence and some more text after it, the UBA first-strong heuristics must be applied to the part of text after the PDI. This is coherent with the requirement that the consumer be aware of the protocol used by the producer.

This may be what is meant in the next paragraphs of the same section, but the paragraph mentioned above comes first and is puzzling (for me at least).

r12a commented 7 years ago

Thinking out loud here...

Where there is nothing outside the RLI/LRI/FSI ... PDI, the UBA treats the formatting codes and the text between them as a single neutral character, so first-strong detection finds no strong character.

The text inside the RLI/LRI/FSI ... PDI stands a good chance of behaving correctly wrt base direction when it is consumed (as long as the consumer knows how to handle RLI etc, which is not necessarily the case at the moment), but the base direction of the string is also used for decisions about text alignment for the string as a whole.

If the string was inserted into a web page as a paragraph, even if the stuff inside was correctly ordered, the line would by default be left aligned in a LTR page and right aligned in a RTL page. That may actually be ok much of the time.

Problems would only arise if the paragraph was expected to align according to its content. In that case, if you used the normal UBA first-strong detection, there'd be nothing to indicate the expected alignment. The question is, how often would that occur?
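
To make the first-strong point concrete, here is a minimal Python sketch of the detection described in UBA rules P2 and P3, which skip everything between an isolate initiator and its matching PDI. The function name `first_strong_direction` and the `None` return value for "no strong character found" are assumptions made for illustration, not part of any specification or library.

```python
import unicodedata

# Isolate formatting characters (Unicode 6.3 and later).
ISOLATE_INITIATORS = {"\u2066", "\u2067", "\u2068"}  # LRI, RLI, FSI
PDI = "\u2069"


def first_strong_direction(text):
    """Return 'ltr', 'rtl', or None if no strong character is found.

    Simplified reading of UBA rules P2-P3: characters between an isolate
    initiator and its matching PDI are skipped. An unmatched initiator
    hides everything up to the end of the string.
    """
    depth = 0  # number of isolates currently open
    for ch in text:
        if ch in ISOLATE_INITIATORS:
            depth += 1
        elif ch == PDI:
            if depth > 0:
                depth -= 1
        elif depth == 0:
            bidi_class = unicodedata.bidirectional(ch)
            if bidi_class == "L":
                return "ltr"
            if bidi_class in ("R", "AL"):
                return "rtl"
    return None


hebrew = "\u05e9\u05dc\u05d5\u05dd"  # a Hebrew word, all strong R characters
wrapped = "\u2067" + hebrew + "\u2069"  # RLI ... PDI around the whole string

print(first_strong_direction(hebrew))              # rtl
print(first_strong_direction(wrapped))             # None
print(first_strong_direction(wrapped + " hello"))  # ltr
```

The fully wrapped string yields no direction at all, so a consumer that aligns by first-strong detection has nothing to go on, while text after the PDI is detected as usual.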

r12a commented 7 years ago

I think there's also a potential issue if the string looks like this: "RLI some text PDI RLI more text PDI". In this case I think the ordering of the two RLI...PDI sequences would depend on the overall base direction for the string. If the text was scraped from a LTR document and added to a RTL document, it may be important to retain the original order (although not always).
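
A rough Python sketch of that ordering effect, under heavy simplifying assumptions: each unit is the content of one RLI...PDI sequence containing only RTL letters, units are separated by a single space, and uppercase Latin letters stand in for RTL characters (the convention used in UAX #9 examples). The function name `display_order` is hypothetical. Embedding levels are assigned by hand and only the reversal step (rule L2) is applied, so this is not a full UBA implementation.

```python
def display_order(units, base_direction):
    """Return the left-to-right display order for isolated RTL units.

    units: contents of successive RLI...PDI sequences (RTL letters only).
    base_direction: 'ltr' or 'rtl', the base direction of the whole string.
    """
    base = {"ltr": 0, "rtl": 1}[base_direction]
    # RLI raises the embedding level to the next odd level above the base.
    inner = base + 1 if base % 2 == 0 else base + 2

    # Pair every character with its level. The space between isolates is
    # neutral and, with no strong text around it, stays at the base level.
    items = []
    for index, unit in enumerate(units):
        if index > 0:
            items.append((" ", base))
        items.extend((ch, inner) for ch in unit)

    odd_levels = [level for _, level in items if level % 2 == 1]
    if odd_levels:
        highest = max(level for _, level in items)
        # Rule L2: from the highest level down to the lowest odd level,
        # reverse every contiguous run of characters at that level or higher.
        for level in range(highest, min(odd_levels) - 1, -1):
            start = 0
            while start < len(items):
                if items[start][1] < level:
                    start += 1
                    continue
                end = start
                while end < len(items) and items[end][1] >= level:
                    end += 1
                items[start:end] = items[start:end][::-1]
                start = end
    return "".join(ch for ch, _ in items)


print(display_order(["ABC", "DEF"], "ltr"))  # CBA FED
print(display_order(["ABC", "DEF"], "rtl"))  # FED CBA
```

In both cases each unit is displayed right to left internally, but the first unit sits on the left under a LTR base direction and on the right under a RTL base direction.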