w3c / html

Deliverables of the HTML Working Group until October 2018
https://w3c.github.io/html/

make any element with an explicit lang attribute (and no dir attribute) bidi-isolated by default #226

Closed. travisleithead closed this issue 5 years ago.

travisleithead commented 8 years ago

Moved from Bugzilla: https://www.w3.org/Bugs/Public/show_bug.cgi?id=18490

This idea came to me while editing Wikipedia, which is a massively multilingual site.

An element with an explicitly defined lang attribute may well have a different directionality from its enclosing element, and an element with an explicit dir attribute even more so. An element whose directionality differs from its surroundings should most likely be bidi-isolated, using <bdi> or "unicode-bidi: isolate".

Therefore, bidi isolation should be the default for such elements. Of course, it should be possible to override it if that's what the user wants.
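A minimal TypeScript sketch of the default proposed here, assuming only the presence or absence of the two attributes matters. The names are hypothetical, and the sketch deliberately ignores any isolation that existing UA rules may already apply; it only models what the proposal would add, plus the author override.

```typescript
// Attributes relevant to the proposal; undefined means "attribute absent".
interface BidiAttrs {
  lang?: string;
  dir?: string;
}

// True only when the proposed default adds isolation: explicit lang, no dir.
// Elements with dir, or without lang, are left to existing rules, which this
// sketch does not model.
function isolatedByProposedDefault(attrs: BidiAttrs): boolean {
  return attrs.lang !== undefined && attrs.dir === undefined;
}

// Authors can still override the default (e.g. via unicode-bidi in their own
// CSS); an explicit override always wins over the proposed default.
function isolatedUnderProposal(attrs: BidiAttrs, authorOverride?: boolean): boolean {
  return authorOverride ?? isolatedByProposedDefault(attrs);
}

console.log(isolatedUnderProposal({ lang: "ar" }));             // true
console.log(isolatedUnderProposal({ lang: "ar", dir: "rtl" })); // false
console.log(isolatedUnderProposal({ lang: "ar" }, false));      // false
```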


IMO, this is the correct approach for new pages. (In fact, I would even say unicode-bidi:isolate for elements with the dir attribute, and unicode-bidi:plaintext for elements with the lang attribute but lacking dir.) And in fact it is easy to do that in your (new) page's own CSS. However, one has to keep in mind that isolation is not implemented in IE9, and may or may not be implemented in the final release of IE10. So, if you need to have your page work bidi-wise for a high percentage of users, this just isn't good enough, not for a long while yet. :-(
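A minimal sketch of the page-level CSS approach suggested in this comment, for new pages: the two rules as a stylesheet string, plus a hypothetical helper reporting which rule would match an element. Expressing "lang attribute but lacking dir" as `[lang]:not([dir])` is an assumption about how one might write it, not a quoted rule.

```typescript
// Stylesheet for a new page (not a UA default): isolate anything with dir,
// and use plaintext for elements that have lang but no dir.
const NEW_PAGE_BIDI_CSS: string = [
  "[dir] { unicode-bidi: isolate; }",
  "[lang]:not([dir]) { unicode-bidi: plaintext; }",
].join("\n");

type SuggestedBidi = "isolate" | "plaintext" | undefined;

// Which of the two rules above would match, given the element's attributes
// (undefined = attribute absent). An undefined result means neither applies.
function suggestedUnicodeBidi(attrs: { lang?: string; dir?: string }): SuggestedBidi {
  if (attrs.dir !== undefined) return "isolate";
  if (attrs.lang !== undefined) return "plaintext";
  return undefined;
}

console.log(suggestedUnicodeBidi({ lang: "fa" }));             // "plaintext"
console.log(suggestedUnicodeBidi({ lang: "fa", dir: "rtl" })); // "isolate"
```

The `:not([dir])` part matters: with a bare `[lang]` selector, both rules would have equal specificity on an element carrying both attributes, so the later `plaintext` rule would win over `isolate`.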

The other major concern here is backward compatibility. That is why I said above "new" pages. Applying bidi isolation to all elements with the dir attribute will break some existing pages.

On the other hand, my guess is that applying bidi isolation to something with the lang attribute (and lacking a dir attribute) is unlikely to break existing pages, so it is probably a good idea.

@amire80

tomerm commented 8 years ago

I personally like this idea very much. However, please take into account the following implementation hiccups:

1. Browser side: To enforce text direction based on language, one must find the natural text direction of the given language (e.g. LTR for English, RTL for Farsi). Languages can be written in different scripts, and natural direction is actually a property of the script. Some ambiguity exists because the same language (e.g. Uzbek) can be written in scripts with different natural directions (for Uzbek: Arabic, RTL; Latin or Cyrillic, LTR). In any case, we need to follow the chain [Language -> Script -> Natural text direction] to enforce text direction properly. This chain is well defined in CLDR (http://cldr.unicode.org/) and can even be traced programmatically, for example using ICU (http://site.icu-project.org/); see tickets http://bugs.icu-project.org/trac/ticket/8633 and http://bugs.icu-project.org/trac/ticket/10736.

2. Web app: Information about the content language is not always available to the web app, so to assign a proper lang value we first need to identify the language. This is not a new task, and for some languages it is easier than for others (encoding information alone may suffice), but in general it is a very hard problem, especially in a multilingual environment, and a computationally expensive one.
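Both items can be sketched together in TypeScript, under loudly stated assumptions. For item 1, a hypothetical, hard-coded subset of CLDR data stands in for a real likely-subtags lookup (in JavaScript, `new Intl.Locale(tag).maximize()` applies CLDR's likely-subtags data to fill in the script). For item 2, language identification is out of scope for a sketch, so the code swaps in a different and much cheaper technique that only estimates direction: the first-strong-character heuristic behind dir="auto" and, per paragraph, unicode-bidi: plaintext. The character ranges are approximations, not the Unicode bidi class table, and all names are illustrative.

```typescript
type Direction = "ltr" | "rtl";

// --- Item 1: language tag -> script -> natural direction ---

// Hypothetical subset for illustration; real code should use CLDR data.
const DEFAULT_SCRIPT: Record<string, string> = {
  en: "Latn",
  fa: "Arab",
  ar: "Arab",
  he: "Hebr",
  uz: "Latn", // CLDR's likely script for plain "uz"; Arab and Cyrl are also used
};

// A few right-to-left scripts (not an exhaustive list).
const RTL_SCRIPTS = ["Arab", "Hebr", "Syrc", "Thaa"];

// Simplified BCP 47 handling: only a four-letter script subtag directly after
// the language subtag is recognized (extlang subtags are ignored).
function scriptFor(tag: string): string | undefined {
  const subtags = tag.split(/[-_]/);
  const second = subtags[1];
  if (second !== undefined && /^[A-Za-z]{4}$/.test(second)) {
    return second.charAt(0).toUpperCase() + second.slice(1).toLowerCase();
  }
  const language = subtags[0].toLowerCase();
  return Object.prototype.hasOwnProperty.call(DEFAULT_SCRIPT, language)
    ? DEFAULT_SCRIPT[language]
    : undefined;
}

// undefined means the language (and so its script) is unknown to the table.
function directionForLang(tag: string): Direction | undefined {
  const script = scriptFor(tag);
  if (script === undefined) return undefined;
  return RTL_SCRIPTS.indexOf(script) >= 0 ? "rtl" : "ltr";
}

// --- Item 2: no usable lang, so estimate direction from the text itself ---

// Approximate RTL test (bidi classes R/AL) in the BMP: the Hebrew through
// Arabic Extended-A blocks and the Hebrew/Arabic presentation forms, minus
// Arabic-Indic digits, which are not strong characters.
function isStrongRtl(code: number): boolean {
  if ((code >= 0x0660 && code <= 0x0669) || (code >= 0x06f0 && code <= 0x06f9)) return false;
  return (code >= 0x0590 && code <= 0x08ff) ||
    (code >= 0xfb1d && code <= 0xfdff) ||
    (code >= 0xfe70 && code <= 0xfefc);
}

// Shortcut: BMP letters with distinct upper/lowercase forms (Latin, Greek,
// Cyrillic, ...) are LTR. Uncased LTR scripts (CJK, Devanagari, ...) are NOT
// detected by this shortcut.
function isStrongLtr(ch: string): boolean {
  return ch.toLowerCase() !== ch.toUpperCase();
}

// Direction of the first strong character, or undefined if there is none.
function firstStrongDirection(text: string): Direction | undefined {
  for (let i = 0; i < text.length; i++) {
    if (isStrongRtl(text.charCodeAt(i))) return "rtl";
    if (isStrongLtr(text.charAt(i))) return "ltr";
  }
  return undefined;
}

// Prefer the declared language when known; otherwise fall back to the text.
function guessDirection(text: string, lang?: string): Direction | undefined {
  const fromLang = lang !== undefined ? directionForLang(lang) : undefined;
  return fromLang !== undefined ? fromLang : firstStrongDirection(text);
}

console.log(directionForLang("uz"));       // "ltr"
console.log(directionForLang("uz-Arab"));  // "rtl"
console.log(guessDirection("123 שלום"));   // "rtl"
console.log(guessDirection("Hello", "he")); // "rtl"
```

The language is preferred over the text when both are available, following item 1; the text heuristic only covers the case item 2 describes, where no lang value is known.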

chaals commented 6 years ago

maintain assignment, milestone set.

siusin commented 5 years ago

We're closing this issue on the W3C HTML specification because the W3C and WHATWG are now working together on HTML, and all issues are being discussed on the WHATWG repository.

If you filed this issue and you still think it is relevant, please open a new issue on the WHATWG repository and reference this issue (if there is useful information here). Before you open a new issue, please check for existing issues on the WHATWG repository to avoid duplication.

If you have questions about this, please open an issue on the W3C HTML WG repository or send an email to public-html@w3.org.