unixwork / xnedit

A fast and classic X11 text editor, based on NEdit, with full unicode support and antialiased text rendering.

Right to left language text is rendered incorrectly. #131

Open dpeterc opened 9 months ago

dpeterc commented 9 months ago

Xnedit handles UTF-8 characters well, but it does not render right-to-left text correctly. Some other text editors, such as Kate, decide based on the first character of a line and then render that line either left to right or right to left. Here is an example of a two-line text in Persian (in English: "I use Xnedit every day. Xnedit is the best!"):

من هر روز از Xnedit استفاده می کنم.
Xnedit بهترین است!

On the web and in Kate, the first line is rendered correctly, while the second line has its Persian text mirrored. Xnedit has all of the text mirrored. The text in Google Translate on the right is rendered correctly in the browser.

[Screenshot: PersianNedit]

A first step would be to add right-to-left rendering on a per-line basis, the same as in the Kate editor.
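As a rough illustration of the per-line idea, here is a minimal sketch that picks a direction from the first strongly directional character of a line. The helper name `firstStrongDirection` and the `fallback` parameter are hypothetical, and the character classes are adapted from the regular expressions quoted further down in this thread; they only approximate the real Unicode bidi classes. It is also only an assumption that Kate works exactly this way.

```javascript
// Approximate character classes (an assumption, adapted from the regular
// expressions quoted later in this thread, not the full Unicode bidi data).
const RTL_CHAR = /[\u0591-\u07FF\uFB1D-\uFDFD\uFE70-\uFEFC]/;
const LTR_CHAR = /[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02B8\u0300-\u0590\u0800-\u1FFF\u2C00-\uFB1C\uFDFE-\uFE6F\uFEFD-\uFFFF]/;

// Hypothetical helper: returns "rtl" or "ltr" for a single line of text.
// Digits, spaces, and punctuation are skipped because they match neither class.
function firstStrongDirection(line, fallback = "ltr") {
  for (const ch of line) {
    if (RTL_CHAR.test(ch)) return "rtl";
    if (LTR_CHAR.test(ch)) return "ltr";
  }
  // No directional character found (empty line, digits only, ...).
  return fallback;
}

// A line that starts with a Persian word, and one that starts with "Xnedit".
console.log(firstStrongDirection("\u0645\u0646 Xnedit")); // rtl
console.log(firstStrongDirection("Xnedit \u0628\u0647\u062A\u0631\u06CC\u0646")); // ltr
```

With this rule, a line that begins with a Latin word is laid out left to right even when most of it is Persian, which is consistent with the second sample line appearing wrong in an editor that decides on the first character.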

The final solution would be to support proper bidi rendering by honoring the invisible Unicode right-to-left mark (U+200F) and left-to-right mark (U+200E). https://en.wikipedia.org/wiki/Right-to-left_mark https://en.wikipedia.org/wiki/Left-to-right_mark For text without the marks, some heuristic must still be used, and it will inevitably fail in some cases.
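To make the "marks first, heuristic second" idea concrete, here is a small sketch under simplifying assumptions: it resolves one direction per line, lets an explicit mark override everything else, and otherwise falls back to counting characters. The function name `resolveLineDirection` is hypothetical, and the character ranges are deliberately coarse. A real bidi implementation resolves direction per run of text rather than per line, so this is not a substitute for it.

```javascript
const LRM = "\u200E"; // LEFT-TO-RIGHT MARK
const RLM = "\u200F"; // RIGHT-TO-LEFT MARK

// Coarse fallback classes (an assumption for this sketch only).
const RTL_CHARS = /[\u0591-\u07FF\uFB1D-\uFDFD\uFE70-\uFEFC]/g;
const LATIN_CHARS = /[A-Za-z]/g;

// Hypothetical helper: returns "rtl" or "ltr" for a single line of text.
function resolveLineDirection(line) {
  const lrm = line.indexOf(LRM);
  const rlm = line.indexOf(RLM);

  // An explicit mark wins; if both occur, the earlier one decides.
  if (lrm !== -1 || rlm !== -1) {
    if (rlm === -1) return "ltr";
    if (lrm === -1) return "rtl";
    return rlm < lrm ? "rtl" : "ltr";
  }

  // No marks: fall back to a majority count, which can guess wrong.
  const rtl = (line.match(RTL_CHARS) || []).length;
  const ltr = (line.match(LATIN_CHARS) || []).length;
  return rtl > ltr ? "rtl" : "ltr";
}

// A leading LRM forces left-to-right even for mostly Persian text.
console.log(resolveLineDirection("\u200E\u0628\u0647\u062A\u0631\u06CC\u0646 Xnedit")); // ltr
// Without marks, nine Persian letters outnumber six Latin ones.
console.log(resolveLineDirection("Xnedit \u0628\u0647\u062A\u0631\u06CC\u0646 \u0627\u0633\u062A!")); // rtl
```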

unixwork commented 8 months ago

Yes, this is completely unimplemented. The NEdit text widget was never designed for that. It will be challenging to implement, but I will look into it.

dpeterc commented 8 months ago

I understand that full bidi support is almost impossible, but left-to-right and right-to-left rendering on a per-line basis should be doable. After all, Motif supports a layout direction through XmNlayoutDirection, which can be set globally or for individual widgets or widget classes. Here is xnedit with a right-to-left user interface:

xnedit -xrm "*.layoutDirection: RIGHT_TO_LEFT"

[Screenshot: neditRightToLeft]

The last solution on this page, https://stackoverflow.com/questions/12006095/javascript-how-to-check-if-character-is-rtl, offers a relatively simple check for string direction.

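// Guess the direction of a string by counting its RTL and LTR characters.
function isRTL(text) {
  // Hebrew, Arabic, and neighboring right-to-left ranges, plus presentation forms.
  let rtl_count = (text.match(/[\u0591-\u07FF\uFB1D-\uFDFD\uFE70-\uFEFC]/g) || []).length;
  // Latin letters and the remaining left-to-right ranges.
  let ltr_count = (text.match(/[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02B8\u0300-\u0590\u0800-\u1FFF\u2C00-\uFB1C\uFDFE-\uFE6F\uFEFD-\uFFFF]/g) || []).length;

  return (rtl_count > ltr_count);
}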

But I also understand that it is far from the intended use, and that you might wish to concentrate on features useful for programmers, not for translators.