w3c / alreq

Documenting gaps and requirements for support of Arabic and Persian on the Web and in eBooks.

What tools can be used for writing HTML source code with bidi text? #194

Closed r12a closed 4 years ago

r12a commented 5 years ago

Writing HTML source code with bidi text is difficult, since the angle brackets, ampersands, quote marks, element/attribute names, etc. get mixed up in an RTL environment. (See https://www.w3.org/TR/2014/NOTE-i18n-html-tech-bidi-20140603/#bidisource)

Here are some solutions that may make sense for just short amounts of bidi text.

  1. Many people writing bidi source code put the RTL content on a separate line from the start tag, eg.
    <p>
    text goes here
    </p>

One thing to watch here, however, is that you shouldn't have a closing tag on the next line if the opening tag has dir=rtl, since there can be problems with spaces (see https://www.w3.org/International/questions/qa-bidi-space)

  2. At least one person I know uses an editor that doesn't know anything about the Unicode bidi algorithm or RTL support. He writes short lengths of text in the normal Unicode order, but of course has to read them backwards (ie. LTR) in the source code itself.

  3. You could also use character escapes, such as in

<p dir="rtl">&#x0646;&#x0634;&#x0627;&#x0637; &#x0627;&#x0644;&#x062A;&#x062F;&#x0648;&#x064A;&#x0644;&#x060C; W3C</p>

(https://r12a.github.io/app-conversion/ may help there)
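For the character-escape option, a minimal Python sketch of the conversion is shown below. The function name `to_hex_ncrs` is made up for illustration and is not taken from the converter linked above. It assumes you only want non-ASCII characters replaced, and it does not escape `&` or `<` in the ASCII part.

```python
def to_hex_ncrs(text: str) -> str:
    """Replace each non-ASCII character with a hexadecimal numeric character reference."""
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):04X};"
        for ch in text
    )


# The Arabic text from the example above, written with \u escapes so that
# this source file itself contains no RTL characters.
arabic = "\u0646\u0634\u0627\u0637 \u0627\u0644\u062A\u062F\u0648\u064A\u0644\u060C W3C"
print(f'<p dir="rtl">{to_hex_ncrs(arabic)}</p>')
```

The escaped source is pure ASCII, so no editor will reorder it, at the cost of the text no longer being readable in the source.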

My main question is: What editors support HTML source code editing, so that the markup doesn't get scrambled?

ntounsi commented 5 years ago

> 1. At least one person i know uses an editor that doesn't know anything about the Unicode bidi algorithm or RTL support. He writes short lengths of text in the normal Unicode order, but of course has to read them backwards (ie. LTR) in the source code itself.

I too sometimes use that kind of editor, for example "Sublime Text", to edit a small part of a source and make very fine, localized corrections, which are difficult in WYSIWYG or in HTML source with bidi applied.

Example:

  1. HTML source without Bidi (real memory source) source2
  2. Same source with Bidi applied source1
  3. Same as above without &nbsp; source1-bis

Rendering is page

It is not easy to add, say, &nbsp; in the third case above. It is easier to do it in source-1 than in source-2.

That said, my normal HTML editor is a WYSIWYG one, BlueGriffon in this case. I use it even to touch the source when that is easy (most of the time).

Of course, putting bidi text content on a separate line from the markup is sometimes the tip that makes editing easy.

behnam commented 5 years ago

Two general matters:

Basically, IMHO, without specifically managing Bidi Context in every single component (like an HTML tag and text nodes), there's no way to get anything working properly.

Based on that, I'm not a fan of recommendations like "add an LRM here" or "drop an ALM there".
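Where such marks have already been sprinkled into a source file, they are invisible in most editors. The following is a rough Python sketch for locating them. The helper name `find_bidi_controls` and the exact set of characters checked are assumptions for illustration, not something proposed in this thread.

```python
import unicodedata

# Invisible directional characters: LRM, RLM, ALM,
# the embeddings/overrides (U+202A..U+202E) and the isolates (U+2066..U+2069).
BIDI_CONTROLS = {0x200E, 0x200F, 0x061C, *range(0x202A, 0x202F), *range(0x2066, 0x206A)}


def find_bidi_controls(source: str):
    """Return (line, column, character name) for each bidi control in source."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) in BIDI_CONTROLS:
                hits.append((line_no, col, unicodedata.name(ch)))
    return hits


sample = "<p>abc\u200e</p>\n<p>\u061cx</p>"
for hit in find_bidi_controls(sample):
    print(hit)
```

Columns are counted in code points in memory order, which may not match what a bidi-aware editor shows visually.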

khaledhosny commented 5 years ago

Totally agree with @behnam here. As a data point, I know one XML editor that seems to get this right by separating the tags from the content, though I have only tried it once: https://www.oxygenxml.com/xml_editor/unicode_and_internationalization.html. I think this approach can even be used without turning XML tags into widgets.
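To illustrate the idea of separating tags from content without any widgets, here is a minimal Python sketch. It is not how the editor linked above works internally. `split_markup` is a hypothetical helper, and the regular expression is a simplification that assumes no `>` inside attribute values, comments or scripts.

```python
import re

# Simplified tag pattern: "<", anything except ">", then ">".
TAG = re.compile(r"(<[^>]+>)")


def split_markup(source: str):
    """Split source into ("tag", ...) and ("text", ...) segments, in source order."""
    segments = []
    for piece in TAG.split(source):
        if not piece:
            continue  # re.split yields empty strings at the edges
        kind = "tag" if TAG.fullmatch(piece) else "text"
        segments.append((kind, piece))
    return segments


for kind, piece in split_markup('<p dir="rtl">\u0646\u0634\u0627\u0637 W3C</p>'):
    # Each segment goes on its own line, so a tag never shares
    # a bidi paragraph with RTL text.
    print(f"{kind}: {piece}")
```

A tool built this way could give every tag segment an LTR base direction and every text segment the direction of its content, which is the per-component handling described above.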

r12a commented 5 years ago

> I was under the impression that we already decided to not get involved in the Developer Tools kind of areas in ALReq

I wasn't thinking of writing this up in alreq at all. Just asking a question. This group being an obvious place to get some answers.

ntounsi commented 5 years ago

> I wasn't thinking of writing this up in alreq at all. Just asking a question.

That was my understanding of the question.

BTW, presenting source without Bidi raises a "hidden" question: which form of an Arabic character should be used when the text is presented in memory order? Isolated or joined shape?

twocaseslogical

In this case, the first line above might seem more "acceptable" than the second one. The problem with the latter case is that it may make use of the presentation-form of the letters. (Could it be otherwise?)
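On the presentation-form concern, one way to check whether a piece of source text stores presentation forms rather than the base letters is sketched below in Python. The helper names are hypothetical. The ranges are the Arabic Presentation Forms-A and -B blocks. Note that NFKC normalization also changes other compatibility characters, so it is shown only as an illustration.

```python
import unicodedata


def uses_presentation_forms(text: str) -> bool:
    """True if text contains code points from Arabic Presentation Forms-A or -B."""
    return any(
        0xFB50 <= ord(ch) <= 0xFDFF or 0xFE70 <= ord(ch) <= 0xFEFC
        for ch in text
    )


def to_base_letters(text: str) -> str:
    """Map presentation forms back to base letters via compatibility normalization."""
    return unicodedata.normalize("NFKC", text)


# NOON initial, SHEEN medial, ALEF final, TAH isolated (presentation forms)
shaped = "\uFEE7\uFEB8\uFE8E\uFEC1"
print(uses_presentation_forms(shaped))
print([f"U+{ord(ch):04X}" for ch in to_base_letters(shaped)])
```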

r12a commented 4 years ago

Closing this since the discussion is no longer active. Feel free to add further comments, or reopen as needed.