wez / wezterm

A GPU-accelerated cross-platform terminal emulator and multiplexer written by @wez and implemented in Rust
https://wezfurlong.org/wezterm/

Add support for RTL languages #784

Open CIAvash opened 3 years ago

CIAvash commented 3 years ago

Is your feature request related to a problem? Please describe. wezterm cannot display right-to-left languages correctly. RTL text is not rendered right to left, and characters that need to be combined are not.

Describe the solution you'd like Support RTL text. Probably needs bidirectional text handling and text shaping.

Related projects: harfbuzz, FriBidi

Describe alternatives you've considered Konsole and gnome-terminal support RTL languages.

Additional context Image, although the image is comparing Konsole with Alacritty, wezterm works just like Alacritty.

wez commented 3 years ago

https://terminal-wg.pages.freedesktop.org/bidi/ has some excellent notes on how to model bidi in terminal emulators.

To make progress, I need to better understand:

CIAvash commented 3 years ago

Can pango be an option? Although it has some gtk dependencies, it uses harfbuzz and fribidi. That's probably how gnome-terminal supports RTL.

For testing, let me know if I can help by providing text content.

Also, if you are using harfbuzz, shouldn't Arabic script characters get combined? Currently they don't.

CIAvash commented 3 years ago

I forgot to say that (I think) @behdad (creator of harfbuzz) is responsive, if you have questions.

CIAvash commented 3 years ago

There is also servo's unicode-bidi. Mentioned in alacritty/alacritty#663.

wez commented 3 years ago

There's discussion on https://github.com/kas-gui/kas-text/issues/20 about bidi implementations for Rust.

My impression right now is that the state of bidi in Rust is young and that the easiest path will result in a relatively slow bidi implementation, which isn't ideal: shaping already costs perf in wezterm today. Putting in more work on the promising alternative mentioned in that thread will likely be a better end-state, but will take more effort and that shouldn't be owned by wezterm.

The main constraint I have right now is time: if someone has time and wants to drive this forward, I'm very receptive to seeing wezterm support bidi and helping that person figure out how to integrate it into wezterm.

wez commented 2 years ago

I've pushed a commit with what is probably the bare minimum level of support: I'm sure it's wrong in a number of cases, but with this as my test case (borrowed from https://github.com/microsoft/terminal/issues/538#issuecomment-677017322)

Starting wezterm like this to start with the default config, then make the font bigger and turn on bidi mode:

wezterm -n --config font_size=36 --config initial_rows=5 --config initial_cols=30 \
    --config experimental_bidi=true

that's equivalent to running with this config:

return {
   font_size = 36,
   initial_rows = 5,
   initial_cols = 30,
   experimental_bidi = true, -- this is the bit you want to use to try this out
}

Pasting: This is RTL -> عربي فارسی into the terminal:

image

TODO:

j4james commented 2 years ago

Note that the terminal-wg bidi document, while giving the impression of being well researched, makes no mention of the DEC RTL sequences from the VT5xx terminals (e.g. DECRLM) and related modes supported by Hebrew terminal emulators like Hterm. IMO those existing modes were much more useful for anyone doing serious RTL development than any of the modern proposals.

wez commented 2 years ago

Thanks James; I'll queue up some more reading/research!

wez commented 2 years ago

@behdad I don't mean to pounce, but I wonder if you have suggestions specifically on handling the narrower glyphs in فا in a monospace/terminal context; the x_advance in this case is approx. half the monospace cell width. wezterm uses harfbuzz under the covers, but has some logic to override x_advance to make cells line up. Is this particular case best solved simply by using a different font that has wider versions of these glyphs? Or are there some recommended flags/modes for harfbuzz that I should consider?

This is how that same sequence renders in Terminal.app: image

Even if I use the same font (which I think is the SF Arabic font), I still have gaps in my presentation. It feels like something in Terminal.app knows to stretch those ligatures and I wonder if harfbuzz has some way to express that? Or is this just deep magic in Apple's shaper/typography implementation?

(Maybe sort of related: #1333 is a feature request for Devanagari support, which also has some challenging glyph widths for a terminal. Would love to hear your thoughts on that as well!)

I'd also love to hear if you have other recommendations on bidi/rtl support in the context of a terminal?

CIAvash commented 2 years ago

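Currently I can report 2 issues:

  • If you put a number or LTR letter after an RTL letter (with or without space), it becomes LTR. On VTE based terminal, numbers work fine, but if you put an LTR letter, it becomes LTR.

  • Moving cursor position doesn't follow the RTL letter positions, so you can't tell where you're typing (or changing) a letter.

The other one you mentioned yourself, a space between glyphs that are combined together; I see this problem in VTE-based terminals as well, if a non-monospace font is used. If I use a monospace font (DejaVu Sans Mono) it shows correctly (in wezterm and VTE-based terminals).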

behdad commented 2 years ago

Is this particular case best solved simply by using a different font that has wider versions of these glyphs?

Yes.

It feels like something in Terminal.app knows to stretch those ligatures and I wonder if harfbuzz has some way to express that? Or is this just deep magic in Apple's shaper/typography implementation?

HarfBuzz doesn't know that. I haven't checked Terminal.app. It might be a geometric stretch. You sure it's using the same font?

behdad commented 2 years ago

This is how that same sequence renders in Terminal.app: image

Looks obviously a different font.

wez commented 2 years ago

I didn't find exactly the font that Terminal.app is using, but I found that updating my local copy of Cascadia Code and using that looked better: I'll stop chasing that particular dragon :)

wez commented 2 years ago

Currently I can report 2 issues:

  • If you put a number or LTR letter after an RTL letter(with or without space), it becomes LTR. On VTE based terminal, numbers work fine, but if you put an LTR letter, it becomes LTR.

Could you run: wezterm ls-fonts --text "EXAMPLE" where example is the text sequence you're trying, so that I can see exactly what sequence you mean and also what wezterm thinks it is doing?

  • Moving cursor position doesn't follow the RTL letter positions, so you can't tell where you're typing (or changing) a letter.

I haven't done anything about cursor positioning or input so far. I don't know how to type this script into the terminal; could you run through how you do that? I'm assuming that you have a particular keyboard/IME configured. Could you walk me through typing a short bit of text (a couple of letters/glyphs) that mixes LTR and RTL so that I can try this for myself and not produce nonsense?

The other one you mentioned yourself, a space between glyphs that are combined together; I see this problem in VTE based terminal as well, if a non-monospace font is used. If I use a monospace(DejaVu Sans Mono) font it shows correctly(in wezterm and VTE based terminal).

I think part of the docs to write up around this will be to suggest a good monospace font. Cascadia Code is another option that at least is monospace, but for which I am not equipped to comment on legibility/usability vs. other Arabic fonts!

CIAvash commented 2 years ago

Could you run: wezterm ls-fonts --text "EXAMPLE" where example is the text sequence you're trying, so that I can see exactly what sequence you mean and also what wezterm thinks it is doing?

Only letters(متن=text, فارسی=Persian=Farsi), which works fine:

متن فارسی

wezterm ls-fonts --text "متن فارسی"
RightToLeft
15 ی    \u{6cc}      x_adv=10 glyph=3113 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
13 س    \u{633}      x_adv=10 glyph=3182 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
11 ر    \u{631}      x_adv=10 glyph=1127 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 9 ا    \u{627}      x_adv=10 glyph=3145 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 7 ف    \u{641}      x_adv=10 glyph=3214 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 6      \u{20}       x_adv=10 glyph=1    wezterm.font("Roboto Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/RobotoMono-Regular.ttf, FontConfig
 4 ن    \u{646}      x_adv=10 glyph=3233 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 2 ت    \u{62a}      x_adv=10 glyph=3155 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 0 م    \u{645}      x_adv=10 glyph=3230 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig

Same text with spaces and a number(۲=2) between the words:

متن ۲ فارسی

wezterm ls-fonts --text "متن ۲ فارسی"
RightToLeft
 6      \u{20}       x_adv=10 glyph=1    wezterm.font("Roboto Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/RobotoMono-Regular.ttf, FontConfig
 4 ن    \u{646}      x_adv=10 glyph=3233 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 2 ت    \u{62a}      x_adv=10 glyph=3155 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 0 م    \u{645}      x_adv=10 glyph=3230 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
LeftToRight
 0 ۲    \u{6f2}      x_adv=10 glyph=1194 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
RightToLeft
 9 ی    \u{6cc}      x_adv=10 glyph=3113 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 7 س    \u{633}      x_adv=10 glyph=3182 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 5 ر    \u{631}      x_adv=10 glyph=1127 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 3 ا    \u{627}      x_adv=10 glyph=3145 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 1 ف    \u{641}      x_adv=10 glyph=3214 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 0      \u{20}       x_adv=10 glyph=1    wezterm.font("Roboto Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/RobotoMono-Regular.ttf, FontConfig

Same text with spaces and a number(2) between the words:

متن 2 فارسی

wezterm ls-fonts --text "متن 2 فارسی"
RightToLeft
 6      \u{20}       x_adv=10 glyph=1    wezterm.font("Roboto Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/RobotoMono-Regular.ttf, FontConfig
 4 ن    \u{646}      x_adv=10 glyph=3233 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 2 ت    \u{62a}      x_adv=10 glyph=3155 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 0 م    \u{645}      x_adv=10 glyph=3230 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
LeftToRight
 0 2    \u{32}       x_adv=10 glyph=56   wezterm.font("Roboto Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/RobotoMono-Regular.ttf, FontConfig
RightToLeft
 9 ی    \u{6cc}      x_adv=10 glyph=3113 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 7 س    \u{633}      x_adv=10 glyph=3182 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 5 ر    \u{631}      x_adv=10 glyph=1127 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 3 ا    \u{627}      x_adv=10 glyph=3145 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 1 ف    \u{641}      x_adv=10 glyph=3214 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 0      \u{20}       x_adv=10 glyph=1    wezterm.font("Roboto Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/RobotoMono-Regular.ttf, FontConfig

With double quotes the spaces are also misplaced.

Without double quotes: wezterm ls-fonts --text (echo -n متن 2 فارسی)

متن 2 فارسی
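To see why the digit ends up in its own LeftToRight run in the listings above, it can help to look at the Unicode bidirectional class of each character. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `bidi_classes` is made up for illustration and this is not how wezterm itself computes runs.

```python
import unicodedata
from typing import List, Tuple


def bidi_classes(text: str) -> List[Tuple[str, str]]:
    """Pair each character's code point with its Unicode bidirectional class."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in text]


# Three Arabic-script letters, a space, an ASCII digit, a space,
# and an Extended Arabic-Indic digit.
for codepoint, bidi_class in bidi_classes("متن 2 ۲"):
    print(codepoint, bidi_class)
```

The letters report `AL` (Arabic letter), the spaces `WS`, and both digits `EN` (European number). Under the Unicode Bidirectional Algorithm, a number inside right-to-left text resolves to an even, left-to-right embedding level, which is consistent with the separate LeftToRight run shown by `ls-fonts`.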

I haven't done anything about cursor positioning or input so far. I don't know how to type this script into the terminal; could you run through how you do that? I'm assuming that you have a particular keyboard/IME configured. Could you walk me through typing a short bit of text (a couple of letters/glyphs) that mixes LTR and RTL so that I can try this for myself and not produce nonsense?

I set keyboard layouts in Sway window manager like this:

input * {
    xkb_layout "us,ir"
    xkb_options "grp:shifts_toggle,compose:caps"
}

And toggle between English and Persian.

In X, I think it's with this command: setxkbmap -layout us,ir -option grp:shifts_toggle or xorg config:

    Option "XkbLayout" "us,ir"
    Option "XkbOptions" "grp:shifts_toggle"

You can use online virtual keyboards:

https://www.branah.com/farsi - With this you can switch between Persian and English

https://www.lexilogos.com/keyboard/persian.htm - This one has the pronunciation of letters

So for typing "متن ۱ فارسی" in the Persian keyboard layout, you would hit these keys: l j k SPACE 1 SPACE t h v s d

For "متن RTL و متن LTR": l j k SPACE R T L SPACE , SPACE l j k SPACE L T R

Last text on its own (beginning with RTL letters):

متن RTL و متن LTR

Some random text samples: From Persian alphabet:

الفبای فارسی یا الفبای فارسی-عربی شاملِ ۳۲ حرف است که از الفبای عربی اقتباس‌شده است.

From English language:

اِنگلیسی (به انگلیسی: English، ‎/ˈɪŋɡlɪʃ/‎) یک زبان طبیعی از خانواده زبانی زبان‌های هندواروپایی از شاخه زبان‌های ژرمنی غربی است که اولین بار در انگلستان در عهد آنگلوساکسون‌ها مورد تکلم قرار گرفت و انگلیسی باستان شکل گرفت.

From Persian language

There are several letters generally only used in Arabic loanwords. These letters are pronounced the same as similar Persian letters. For example, there are four functionally identical letters for /z/ (ز ذ ض ظ), three letters for /s/ (س ص ث), two letters for /t/ (ط ت), two letters for /h/ (ح ه). On the other hand, there are four letters that don't exist in Arabic پ چ ژ گ.

I think part of the docs to write up around this will be to suggest a good monospace font. Cascadia Code is another option that at least is monospace, but for which I am not equipped to comment on legibility/usability vs. other Arabic fonts!

I took a look at Cascadia Code; it seems to be the font Microsoft uses for Windows Terminal. In my opinion it doesn't look good: letters get stretched and are sometimes hard to read. There may be better fonts, but I haven't searched for one.

CIAvash commented 2 years ago

There are also the Vazir Code fonts; Vazir Code Hack seems to look better.

wez commented 2 years ago

Thanks for this: it gives me something to play with and reason about!

CIAvash commented 2 years ago

Thank you for working on this.

wez commented 2 years ago

@behdad At the moment, I use the UBA to produce runs of the various embedding levels (to determine the direction) and feed each of those to harfbuzz without any bidi reordering. https://harfbuzz.github.io/what-harfbuzz-doesnt-do.html doesn't explicitly say which parts of the bidi algorithm should be applied pre/post shaping. Do you have recommendations about this?

I'm trying to figure out what I'm doing wrong for this example; the first grouping results in the space being reordered to the left and the last grouping has it reordered to the right. When wezterm renders these, it will render them starting from x=0 in the order they are listed below, incrementing x by the x_advance. The result is that there is no space between these runs, only around the edges.

; wezterm ls-fonts --text "متن ۲ فارسی"
RightToLeft
 6      \u{20}       x_adv=10 glyph=1    wezterm.font("Roboto Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/RobotoMono-Regular.ttf, FontConfig
 4 ن    \u{646}      x_adv=10 glyph=3233 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 2 ت    \u{62a}      x_adv=10 glyph=3155 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 0 م    \u{645}      x_adv=10 glyph=3230 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
LeftToRight
 0 ۲    \u{6f2}      x_adv=10 glyph=1194 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
RightToLeft
 9 ی    \u{6cc}      x_adv=10 glyph=3113 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 7 س    \u{633}      x_adv=10 glyph=3182 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 5 ر    \u{631}      x_adv=10 glyph=1127 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 3 ا    \u{627}      x_adv=10 glyph=3145 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 1 ف    \u{641}      x_adv=10 glyph=3214 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 0      \u{20}       x_adv=10 glyph=1    wezterm.font("Roboto Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/RobotoMono-Regular.ttf, FontConfig

behdad commented 2 years ago

I'm far removed from bidi algorithm right now to know what the expected output is.

khaledhosny commented 2 years ago

I didn't find exactly the font that Terminal.app is using,

The Arabic text in that screenshot is set in Courier New.

khaledhosny commented 2 years ago

without any bidi reordering

You need to reorder the runs, but without reversing the characters in RTL runs.
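A rough illustration of what reordering the runs means, based on rule L2 of the Unicode Bidirectional Algorithm applied to whole runs rather than to individual characters. This is only a sketch: the function name `reorder_runs`, the run labels and the embedding levels are assumptions made for the example, and it is not wezterm's implementation. The shaper is assumed to have already laid out the glyphs inside each run.

```python
from typing import List, Tuple

# A run is (embedding level, label); the label stands in for the shaped glyphs.
Run = Tuple[int, str]


def reorder_runs(runs: List[Run]) -> List[Run]:
    """Return runs in visual (left-to-right) order using UBA rule L2 on whole runs."""
    order = list(runs)
    odd_levels = [level for level, _ in order if level % 2 == 1]
    if not odd_levels:
        return order  # nothing is right-to-left, so logical order is visual order
    highest = max(level for level, _ in order)
    # From the highest level down to the lowest odd level, reverse every
    # contiguous sequence of runs that sit at that level or higher.
    for level in range(highest, min(odd_levels) - 1, -1):
        i = 0
        while i < len(order):
            if order[i][0] < level:
                i += 1
                continue
            j = i
            while j < len(order) and order[j][0] >= level:
                j += 1
            order[i:j] = order[i:j][::-1]
            i = j
    return order


# Hypothetical levels for the three runs of the sample text, in logical order.
logical = [(1, "MTN+space"), (2, "digit"), (1, "space+FARSI")]
print([label for _, label in reorder_runs(logical)])
```

With these assumed levels the visual order is the last run, then the digit, then the first run, so each space lands between a word and the digit instead of at the outer edges.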

wez commented 2 years ago

Thanks @khaledhosny! OK, the current state of main appears to be at a similar level of support as Terminal.app to my untrained eye for the case of typing متن ۱ فارسی using the directions provided above.

Beyond the selection/copy/paste stuff, I think the next level of support would be to consider actual bidi-aware programs and see where wezterm has gaps; eg: supporting DECRLM and BIDI related escape sequences. I don't know what software has support for those things, or how it might be configured so I would love to hear about that.

I'm also unsure about eg: cating a text file with RTL content; would you expect that to right-justify RTL lines by default?

CIAvash commented 2 years ago

I'm also unsure about eg: cating a text file with RTL content; would you expect that to right-justify RTL lines by default?

In VTE-based terminals the text is left-justified. If you pipe text to the fribidi command line tool, it right-justifies the text, but with a fixed width.

I personally think it should be right-justified, because that feels right and that's how it is in GUI software: the width of the text is the same as the width of the widget/window. But each program seems to do it differently (per line, per paragraph or whole text).

A closer software to terminal is Emacs, it aligns the text to the right, I think only in text modes. And it does so differently, it doesn't do it per line, but per paragraph. So if paragraph starts with LTR letters, the whole paragraph is LTR and vice versa.

And I just tried to see how Emacs behaves in terminal, the text alignment behavior is the same, but it seems it tries to make it RTL again, resulting in a change of direction! So the RTL text becomes LTR(the letters themselves go from left to right!).

j4james commented 2 years ago

And I just tried to see how Emacs behaves in terminal, the text alignment behavior is the same, but it seems it tries to make it RTL again, resulting in a change of direction! So the RTL text becomes LTR(the letters themselves go from left to right!).

This is because it's designed to work in a standard terminal which doesn't reorder the display of RTL characters. Once the terminal decides to take responsibility for the character ordering, it makes it impossible for RTL/bidi-aware software to work.

You need to pick a side: you can't support actual RTL software and non-RTL software at the same time. Best you can probably do is provide an option that lets the user choose.

Some terminals that do RTL reordering will also have an escape sequence to disable that functionality, which apps like Emacs could potentially use. But I don't think there's a standard for that.
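The double reordering can be pictured with a toy model. This is only a sketch: it treats a line made entirely of RTL letters, where "reordering" is a plain reversal, and uses ASCII capitals as stand-ins for RTL letters. The function name is hypothetical and real bidi handling is far more involved.

```python
def reorder_rtl_line(logical: str) -> str:
    """Toy reordering for a line that is entirely RTL: visual order is the reverse."""
    return logical[::-1]


logical = "ABC"  # stand-in for three RTL letters in typing (logical) order

# A bidi-aware application reorders the text itself before writing cells.
once = reorder_rtl_line(logical)

# A terminal that also reorders treats those cells as logical order again.
twice = reorder_rtl_line(once)

print(once, twice)
```

After the second pass the letters are back in typing order from left to right, which matches the observation above that the letters themselves end up going from left to right.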

CIAvash commented 2 years ago

Emacs has some variables and functions which let you customize and change things. If I set bidi-display-reordering to nil then the text is displayed correctly, but it will be left-justified. I can also call set-justification-right to make it right-justified, but it seems to follow a limited width.

Emacs also has a variable called bidi-directional-controls-chars with value "\x202a-\x202e\x2066-\x2069".
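For reference, the two ranges in that value cover the explicit directional formatting characters. A small sketch that lists them by name using Python's standard `unicodedata` module (the constant name `BIDI_CONTROLS` is made up for this example):

```python
import unicodedata

# Code points covered by the ranges U+202A-U+202E and U+2066-U+2069.
BIDI_CONTROLS = [chr(cp) for cp in (*range(0x202A, 0x202F), *range(0x2066, 0x206A))]

for ch in BIDI_CONTROLS:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The first range holds the embedding and override controls plus POP DIRECTIONAL FORMATTING; the second holds the isolate controls.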

wez commented 2 years ago

I just played a little bit with mlterm and it magically swaps my shell prompt to RTL when typing in the farsi text from above. It's pretty cool but definitely seems like it would be fraught with problems for compatibility.

https://terminal-wg.pages.freedesktop.org/bidi/recommendation/escape-sequences.html mentions a couple of escape sequences that are present in ECMA 48; BDSM (!), SCP and SPD that influence this behavior.

The main thing that has noticeable effect appears to be the SCP sequence: CSI 2 SPACE k to set to RTL or CSI 1 SPACE k to set to LTR. Emitting CSI 2 SPACE k causes the shell and all subsequent output to mirror similar to how mlterm looks when it sees RTL text in the line, but does it regardless of whether there is RTL text.

As James noted, it doesn't mention DECRLM. In that RTL mirrored mode, it doesn't appear necessary to manipulate the cursor movement, as it is effectively automatically flipped.
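For anyone who wants to poke at this, here is a minimal sketch that builds the two SCP sequences quoted above. The helper name `scp` is an assumption for illustration; only the byte sequences come from the discussion, and whether a given terminal honours them has to be tested.

```python
CSI = "\x1b["  # Control Sequence Introducer: ESC followed by "["


def scp(direction: str) -> str:
    """Build the SCP sequence: CSI 1 SPACE k for LTR, CSI 2 SPACE k for RTL."""
    param = {"ltr": "1", "rtl": "2"}[direction]
    return f"{CSI}{param} k"


# Print the escaped form instead of the raw bytes so the current
# terminal's state is left untouched.
print(repr(scp("rtl")), repr(scp("ltr")))
```

Writing the raw string to the terminal, for example with `sys.stdout.write(scp("rtl"))`, is what would actually switch the direction in a terminal that supports SCP.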

This is a screenshot of VTE: image

wez commented 2 years ago

mlterm doesn't support those SCP sequences (CSI 2 SPACE k)

j4james commented 2 years ago

Some of the other RTL escape sequences I'm aware of include:

I don't know the details of how they worked though.

wez commented 2 years ago

I found this gist with a summary of bidi support in various apps: https://gist.github.com/XVilka/a0e49e1c65370ba11c17

wez commented 2 years ago

Mintty has a nice succinct summary of its bidi related controls here: https://github.com/mintty/mintty/wiki/CtrlSeqs#bidirectional-rendering

ninjalj commented 2 years ago

In the case of konsole it takes a quite lazy approach. The BidiRenderingEnabled profile setting (normally changed through an UI configuration dialog) should be named something like ComplexTextLayoutEnabled, and does two things:

This works reasonably well for displaying text (RTL and Indic), but fails horribly for cursor movement over RTL text, non-monospaced text, ...

wez commented 2 years ago

Current state of main:

Config options:

Escape sequences:

These are primarily for bidi-aware applications to cooperate with the terminal. These are defined by ECMA-48 and adopted by VTE and mintty.

Stuff that still needs work:

My recommendation if anyone wanted to try this stuff in the nightly would be to run with bidi_enabled = true and just leave bidi_direction at its default LeftToRight value.

CIAvash commented 2 years ago

right-justified rendering seems wonky to me. I think something in there needs to be iterated in a different order, but I haven't nailed down quite what that is.

I tried AutoRightToLeft, but it made everything(the prompt as well) RTL and right-justified, sometimes not everything, even though there was no RTL text.

Also AutoLeftToRight had some misplaced spaces.

But yeah LeftToRight is working properly.

mostafaqanbaryan commented 2 years ago

Just wanted to say that, the work you're doing here is really awesome. Now support for RTL in wezterm, is much better than lots of other terminals. Thank you.

mostafaqanbaryan commented 2 years ago

@wez Besides the cursor problem (which you are aware of), there is something else as well. When I open a file in Vim that has RTL lines in it, only the lines that are visible have correct formatting. RTL correction doesn't work for the other lines in the file that are not in view. You have to reload the wezterm config (restart the terminal in some way, or bring up those lines and open the file again) to correct it.

Example: when I open a file in vim, lines 1 to 30 are visible, but lines 30 to EOF that have RTL content look like this: image And when using tail or cat, all the lines look like this too.

wez commented 2 years ago

I don't quite understand what you mean when you say "not in view". Can you expand on what you're trying and what you're seeing?

mostafaqanbaryan commented 2 years ago

I don't quite understand what you mean when you say "not in view". Can you expand on what you're trying and what you're seeing?

Yes, of course. I have a (test) file with 11 lines in it (you can generate Persian content with this site). When I open vim, this is my terminal window: Screenshot from 2022-05-24 08-08-06 But when I go down to see the other lines: Screenshot from 2022-05-24 08-08-40 Lines below the view (after line #6) are messed up.

Oddly enough, when I don't use a fullscreen terminal (using ToggleFullScreen), this problem doesn't occur: Screenshot from 2022-05-24 08-10-30 So now I think that in fullscreen mode, RTL rendering isn't triggered.

(Also, when you have a big file with only Persian content in it, about 14 KB, the terminal/vim gets really slow. But that's not important right now.)

CIAvash commented 2 years ago

@wez It's not a problem for me, but something I observed; when a tab title contains RTL text, the text is not shaped and is not rendered as BiDi. But you probably already know that 'cause you probably did not apply BiDi rendering there.

mostafaqanbaryan commented 1 year ago

I think the last time, I couldn't fully understand the problem. The problem is, when I scroll in vim, terminal won't re-render and because of that, if some new text comes to visible part of screen, it would be messy. But the text that was already on screen, has no problem. If I use F11 and toggle fullscreen twice (go to fullscreen and back to floating mode), new visible texts would be fixed as well.

ninjalj commented 1 year ago

FYI, there is a new Unicode Working Group for Terminal Complex Script Support (TCSS). The initial proposal for the creation of the WG can be found at https://gist.github.com/XVilka/a0e49e1c65370ba11c17?permalink_comment_id=4615679#gistcomment-4615679

yarons commented 1 year ago

Hebrew looks great BTW, Conjoined RTLed alphabets are more complicated.

anonimo0-0 commented 1 year ago

Rendering RTL languages is pretty nice right now with bidi_enabled = true; however, bidi_direction = "AutoLeftToRight" isn't that complete yet, I think. I assume it defaults to LTR direction unless a character of an RTL language is detected before other characters? But it doesn't seem to be working. For example, in this screenshot from nano (image) I expected the second line after Lorem ipsum to have a right-to-left direction, but it didn't.

Thanks a lot for your work, dealing with bidi stuff must be a headache!

anonimo0-0 commented 1 year ago

To be clearer, here is an attached image of how mlterm does it: image When the line starts with a character that belongs to an RTL language, the line begins from the right side.

For those wondering why this matters, consider the following scenario of typing some words and pay attention to the order of how we typed the words:

Scenario 1

  1. start
  2. test
  3. نهاية
  4. الإختبار

Scenario 2

  1. نهاية
  2. الإختبار
  3. start
  4. test

If you look here, you will notice that wezterm renders these two lines in the same exact way, although they were typed in different order. First line is correct as it starts with English, first words inserted into nano in this case. Second line is wrong, as it should start with Arabic words first as they were typed before the English ones in this case. If the second line begins from the right side (as the case in mlterm above) this issue would be fixed. image
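The behaviour being asked for resembles the "first strong character" rule of the Unicode Bidirectional Algorithm (rules P2 and P3). A simplified sketch follows, assuming a hypothetical helper name and ignoring directional isolates; it is not taken from wezterm or mlterm.

```python
import unicodedata


def first_strong_direction(line: str, default: str = "ltr") -> str:
    """Guess a line's base direction from its first strongly directional character."""
    for ch in line:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return default  # no strong character found, fall back to the configured default


print(first_strong_direction("start test نهاية الإختبار"))
print(first_strong_direction("نهاية الإختبار start test"))
```

For the two scenarios above, the first line resolves to left-to-right and the second to right-to-left, so the second line would begin from the right side.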

CIAvash commented 9 months ago

cosmic-term is using cosmic-text(which uses the rustybuzz, swash and unicode-bidi crates, ATM, I think) for its text shaping, rendering and RTL and bidirectional rendering support.

I don't know the details or how good it is, but thought it wouldn't hurt to mention it.

thisismygitrepo commented 6 months ago

With large language models able to parse 20+ human languages, I think the support is becoming more important than ever before. I read the thread and I couldn't really understand the solutions supported so far. I tried --config experimental_bidi=true @wez but that gave me an error saying it's an invalid config.

CIAvash commented 6 months ago

@thisismygitrepo try wezterm --config bidi_enabled=true

sajadspeed commented 3 months ago

I tried --config bidi_enabled=true with the Vazir Code font and it displays correctly:

Screenshot_20240723_114719

But there are still some problems, like it not working in vim/neovim: image

MoSal commented 3 months ago

But there are still some problems like doesn't work in vim/neovim:

:set noarabicshape

sajadspeed commented 3 months ago

:set noarabicshape

Yes it worked thank you.

There is only one more problem, with the ZERO WIDTH NON-JOINER character (U+200C). In some places, like bash, when I press Shift+Space, it doesn't insert the character at all: image

But in zsh: image

In vim: image

And in neovim: image

I tried with every font and the problem was still there.

I don't think that the problem is exactly with the programs themselves, such as zsh or vim, because it behaves differently with the same font in Konsole.

It works fine in bash with Konsole: image

And in Vim: image
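Because U+200C has no visible glyph, it can be hard to tell from a screenshot whether the character was actually inserted. A small sketch for making it visible when inspecting text; the helper name `reveal_zwnj` and the `<ZWNJ>` marker are assumptions made for this example.

```python
import unicodedata

ZWNJ = "\u200c"


def reveal_zwnj(text: str) -> str:
    """Replace every ZERO WIDTH NON-JOINER with a visible marker."""
    return text.replace(ZWNJ, "<ZWNJ>")


# The character is a format control (category Cf) with bidi class BN.
print(unicodedata.name(ZWNJ), unicodedata.category(ZWNJ), unicodedata.bidirectional(ZWNJ))

# A Persian word typed with a ZWNJ between its two parts.
print(reveal_zwnj("می\u200cخواهم"))
```

Running captured shell or editor input through a helper like this shows whether the application received the character at all, which separates an input problem from a rendering problem.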