Closed electroly closed 3 years ago
This is a bug in Konsole's bi-directional text rendering support and there's nothing Turbo Vision can do about it. The problem goes away if you disable this feature from Settings > Edit Current Profile > Advanced > Uncheck Bi-Directional text rendering.
Change that setting and share the results, because there may still be something wrong with these apparently double-width characters.
It was checked, but unchecking it didn't seem to fix it for me. I unchecked it, hit OK, then closed and reopened Konsole just in case.
Disabling bi-directional text rendering at least improved the cursor movement issue for me. But in your case it looks like Konsole is additionally rendering certain characters as double-width when they are not, and I don't know why.
Can you please run the following program and share the output?
#include <locale.h>
#include <wchar.h>
#include <stdio.h>

struct UChar { const char *mbc; wchar_t wc; };

const UChar chars[] =
{
    {"a", L'a'},
    {"❶", L'❶'},
    {"⑪", L'⑪'},
    {"🤡", L'🤡'},
};

int main()
{
    setlocale(LC_ALL, "");
    for (const auto &ch : chars)
        printf("wcwidth(%s) = %d\n", ch.mbc, wcwidth(ch.wc));
}
I get the following:
wcwidth(a) = 1
wcwidth(❶) = 1
wcwidth(⑪) = 1
wcwidth(🤡) = 2
Actually, please share a screenshot of the result so that we can see whether wcwidth returns a different result for you or Konsole is simply not respecting it.
Here's mine:
Here's what I get.
They look like 1.5-width to me. It's no longer aligned with the character grid after the ❶ character.
In my earlier screenshot of the character picker, I notice the scrollbar is in the correct spot and it cuts off the characters inside the list, rather than itself being shifted to the right. That makes me think that if we had a list of the affected characters, this could be worked around by always explicitly moving to the expected screen position after writing one of those characters. That would hopefully cut off the right side of the over-wide character and allow the rest of the line to be in the right spot.
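To make the idea concrete, here is a minimal sketch of that workaround in C++. It is not Turbo Vision code: the `Cell` type, the `isOverwide` list and `renderRow` are hypothetical names made up for illustration, the list only contains two of the characters mentioned in this thread, and every cell is assumed to be one column wide. It relies on the standard CHA sequence (`ESC [ n G`, move the cursor to column n), and whether Konsole would then actually clip the over-wide glyph is untested.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One screen cell: the UTF-8 text drawn in it (assumed to be one column wide).
struct Cell { std::string text; };

// Hypothetical list of characters that the terminal draws wider than one cell.
// Only two of the characters from this thread are listed, as examples.
static bool isOverwide(const std::string &s)
{
    return s == "❶" || s == "⑪";
}

// Builds the bytes for one row, starting at column 1. After each affected
// character, the cursor is moved explicitly to where the next cell should be.
std::string renderRow(const std::vector<Cell> &cells)
{
    std::string out = "\r";
    for (std::size_t i = 0; i < cells.size(); ++i)
    {
        out += cells[i].text;
        if (isOverwide(cells[i].text))
            // CHA columns are 1-based: the cell at index i + 1 is column i + 2.
            out += "\x1b[" + std::to_string(i + 2) + "G";
    }
    return out;
}
```

Writing the result of `renderRow` to the terminal would print the row with a cursor jump after each listed character, at the cost of a few extra bytes per affected cell.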
Well, it's clear that there's something wrong with how Konsole chooses to render these characters. I suggest you try another terminal emulator (alacritty, gnome-terminal, kitty, or even xterm). If the problem persists in any of these, then the problem may lie in the font rendering libraries. Otherwise, it may be an issue unique to Konsole (which version are you using, BTW?).
> In my earlier screenshot of the character picker, I notice the scrollbar is in the correct spot and it cuts off the characters inside the list, rather than itself being shifted to the right.
This suggests to me that Konsole is aware these characters are actually just one cell wide, but for some reason they are rendered wider than they should.
> That makes me think that if we had a list of the affected characters, this could be worked around by always explicitly moving to the expected screen position after writing one of those characters. That would hopefully cut off the right side of the over-wide character and allow the rest of the line to be in the right spot.
I have never experienced this issue before, so it can be assumed that not all Konsole users suffer from it. Then, how would Turbo Vision detect whether this issue is happening or not? I don't think enabling this workaround unconditionally would be very comfortable.
Cheers.
This is Konsole version 17.12.3. Changing the font does fix it. The fonts preinstalled on Ubuntu, "DejaVu Sans Mono", "Courier 10 Pitch", "Nimbus Mono L", and "Noto Mono", all produce the over-wide characters. My favorite third party font (Iosevka) looks correct. I wonder if this is some kind of font fallback issue, maybe ❶ doesn't exist in any of the built-in mono fonts.
A workable solution for me is to simply omit these characters from the symbol picker, but they are handy-looking glyphs that I'd like to salvage if I can. Another workable solution for me is to ignore the problem and just let it be broken on Konsole. These characters look good in every other system and terminal combination I've tried.
> Another workable solution for me is to ignore the problem and just let it be broken on Konsole. These characters look good in every other system and terminal combination I've tried.
That's what I would do. At the most, this issue can be documented somewhere so that in the unlikely case a user runs across it, they can fix it themselves.
Works for me. Thanks!
wcwidth() lies in a huge number of cases. The only reliable way to determine the actual character width is as it is done for Windows, by outputting the character and measuring the cursor offset. By the way, dividing a line into grapheme clusters is possible using the same method. An example in Python is here: https://github.com/elfmz/far2l/issues/2378#issuecomment-2336818193
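For readers unfamiliar with the technique, this is roughly what "output the character and measure the cursor offset" means on a Unix terminal. A minimal sketch in C++, assuming the terminal answers the standard DSR 6 request (`ESC [ 6 n`) with a cursor position report of the form `ESC [ row ; col R`, and that the text does not wrap to the next line. The function names are made up for illustration, and the part that puts the terminal in raw mode and reads the reply from stdin is left out.

```cpp
#include <string>

// Bytes to send: return to column 1, print the text, then request the
// cursor position with DSR 6 (ESC [ 6 n).
std::string buildWidthProbe(const std::string &text)
{
    return "\r" + text + "\x1b[6n";
}

// True if s is a short, non-empty run of ASCII digits.
static bool isNumber(const std::string &s)
{
    if (s.empty() || s.size() > 5)
        return false;
    for (char c : s)
        if (c < '0' || c > '9')
            return false;
    return true;
}

// Parses a cursor position report (ESC [ row ; col R) and returns the
// 1-based column, or -1 if the reply is malformed.
int parseCursorColumn(const std::string &reply)
{
    // The shortest valid reply is ESC [ 1 ; 1 R, which is 6 bytes long.
    if (reply.size() < 6 || reply.compare(0, 2, "\x1b[") != 0 || reply.back() != 'R')
        return -1;
    const std::string::size_type semi = reply.find(';');
    if (semi == std::string::npos)
        return -1;
    const std::string row = reply.substr(2, semi - 2);
    const std::string col = reply.substr(semi + 1, reply.size() - semi - 2);
    if (!isNumber(row) || !isNumber(col))
        return -1;
    return std::stoi(col);
}

// Since the probe starts at column 1, the text occupied (column - 1) cells.
int widthFromReport(const std::string &reply)
{
    const int col = parseCursorColumn(reply);
    return col < 0 ? -1 : col - 1;
}
```

A real program would write `buildWidthProbe(text)` to the terminal, read bytes until the final `R`, and pass them to `widthFromReport`. Any keystrokes that arrive in between would be mixed into the same input stream and would have to be separated from the reply.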
Hi @unxed!
I'm afraid that solution is only feasible on Windows. In order to do that in a Unix terminal:
So, in my opinion, you would end up with a poor experience for both the user and the programmer.
Why not output chars outside the visible area, above it?
I haven't actually tried that. But I suspect that drawing outside the visible area only makes sense if there is a scrollback area. Turbo Vision uses the alternate screen buffer, which results in scrollback being disabled in most terminal emulators. At this point, I expect that the terminal won't allow the cursor to be moved out of bounds. So, it looks to me that such a strategy is likely not to work in many terminal emulators, and therefore it would not be portable. Besides the fact that it would tackle just the first of the problems I mentioned.
Could those problems be solved using the atomic updates proposal described here: https://gitlab.com/gnachman/iterm2/-/wikis/synchronized-updates-spec?
Synchronized updates allow you to avoid having the characters you are trying to measure shown on screen. But that just solves the first issue I mentioned. In addition, it introduces more steps to the process of measuring a character's width, so the performance may be even worse depending on how you use it.
As absurd as it may sound, I think the only way around the issues I mentioned is to tackle this issue the other way around, and have the client application tell the terminal whether the characters it is printing should be displayed as single- or double-width. This way the application's expectations would match the actual display, so there could be no screen garbling because of width mismatches. And since this would just require a one-way communication from the application to the terminal, it would avoid the performance penalty of having to wait for replies from the terminal, and it would also avoid any conflicts with user input.
That sounds pretty reasonable! Could you offer a draft standard? It could be implemented, for example, in far2l built-in terminal or maybe in kitty if the author is willing to do so.
By the way, since we are talking about Unicode support. Could you please tell me if grapheme clusters can have varying displayed widths depending on neighboring grapheme clusters? Or is the width of a grapheme cluster a fixed value? I haven't been able to figure this out yet, maybe you know? Thank you!
Hi @unxed. Regarding your question on grapheme clusters, I don't know much about these details of the Unicode specification, so I can't help you. All I know is that I cannot expect the average terminal emulator to be fully compliant with Unicode. Not just because some terminals never intended to be compliant in the first place, but also because the specification evolves over time and implementations may become outdated (e.g. the internet is full of different implementations of wcwidth, each of which adheres to a different version of Unicode).
A clear example of this was the behaviour of Kate's embedded terminal widget by the time I wrote the following comment: https://github.com/magiblot/tvision/issues/26#issuecomment-719964250
Does Turbo Vision need to delegate Unicode processing to an external library? Actually, it doesn't. Turbo Vision is not a text editing component. What it needs to know is how text is displayed on the terminal, and this is platform-dependent, while the Unicode standard is not. So it doesn't help me at all to know that "👨👩👧👦" is a grapheme cluster if the terminal will display it differently:
That's why I think that attempting to solve the issue of character widths by focusing on Unicode standard compliance is not the best idea. For this to work, both the application and the terminal emulator would have to either implement this complex Unicode logic themselves or rely on third-party dependencies that implement it. Even if they did so, their Unicode support would inevitably be at risk of becoming outdated, unless both these programs and the systems they run on kept receiving updates.
Considering that one of the main points of text-based applications is portability (e.g. being able to run in a remote host), it seems to me that tackling this issue in this way would be senseless.
Having the client application ask the terminal for the width of text is a possible solution, but it will only perform well in specific scenarios with very low latency. It takes at least one write operation and one read operation to measure the width of one character; the total time is proportional to the latency of the connection between the client and the terminal and to the number of characters you need to measure, so this is clearly not viable in many cases.
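To put a rough number on that, here is a back-of-the-envelope estimate in C++. It is only a sketch, assuming one full round trip per measured character with no batching or caching; the function name and the figures in the example are made up for illustration.

```cpp
// Estimated total time, in milliseconds, to measure `count` characters one at
// a time when each write + read pair costs one round trip of `roundTripMs`.
long long estimateMeasureTimeMs(long long count, long long roundTripMs)
{
    return count * roundTripMs;
}
```

For example, `estimateMeasureTimeMs(2000, 50)` gives 100000 ms: measuring every cell of an 80x25 screen over a connection with a 50 ms round trip would take about 100 seconds.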
A serious proposal for the solution I mentioned in my previous comment would require taking a lot of things into account, since this is not just about single characters.
For example, a client application may want to ensure that displaying "👨👩👧👦" (consisting of 5 Unicode codepoints) will occupy just two screen columns. The terminal may not know how to render this grapheme cluster properly (as in the previous example of Kate's embedded terminal), so inevitably these characters won't be displayed the way the application expected, but they should still occupy exactly two screen columns, so that the application's layout does not get messed up.
For a standard proposal to be effective in solving this issue, it should provide clear hints for terminal emulator developers on how to handle this situation and many other ones. But I have never developed a terminal emulator and I am not familiar with font rendering, so I have no idea what it makes sense to ask the terminal emulator to do and what it doesn't.
Taking into account everything we've discussed, the only solution that comes to mind is to pass a set of rules (describing how to split a string into grapheme clusters and determine the width of these clusters) from the terminal to the application (or vice versa) at the app start. Because it doesn’t seem like the Unicode standard logic is now actively changing between versions, but just new characters are being added.
Currently, such rules are usually statically compiled into the application. If they are made dynamically loadable, this could solve the issue, although it would result in a slight delay when launching the application. As for terminal support, we could experiment with this in the built-in far2l terminal, and if we find a sustainable solution, we could propose it to other developers.
What do you think of this approach?
Here's another idea. Perhaps we could develop a protocol that allows the terminal and the application to agree on the highest Unicode standard version they both support and then operate using that version. If this protocol isn't supported, we could fall back to the current approach.
The point of my suggestion was that things should be made as simple as possible for both the client application and the terminal emulator.
Turbo Vision currently uses the system-provided wcwidth function on Unix systems (except on the Linux console, which works differently). Thus, Turbo Vision cannot know what version of Unicode is being taken into consideration (if any, because the implementation of wcwidth may be arbitrary in some systems), and the protocol you suggested in https://github.com/magiblot/tvision/issues/51#issuecomment-2360970698 for negotiating Unicode versions would not help. It could work if it was reasonable to expect the average text-based application to be fully aware of the Unicode version it's using when deciding the width of its characters, and then the terminal emulator should be up-to-date and support many different Unicode versions. I think this would be very difficult.
Similarly, I think that having the client application and the terminal emulator talk to each other about rules describing how to split a string into grapheme clusters and determine the width of these clusters does not sound much simpler. What would those rules be like? How much code would it take in the client application to support that?
When writing about my suggestion, I was thinking of something like this:
This wouldn't ensure that all grapheme clusters are rendered properly, but it would prevent the client application's layout from messing up.
In addition, the terminal may have to reply to some of the escape sequences so that the client knows this feature is supported.
But, as I said, maybe implementing this in a terminal emulator is very complex and inconvenient. I don't know.
> The point of my suggestion was that things should be made as simple as possible for both the client application and the terminal emulator.
@elfmz can you please look into this? Can we support this experimental approach in far2l's VT?
> But, as I said, maybe implementing this in a terminal emulator is very complex and inconvenient. I don't know.
I tried to implement an approach in which the application informs the terminal about the size of the grapheme cluster on a per-cluster basis. And the terminal simply has to fit the grapheme cluster into the required matrix of cells.
You can play with it using the vtm built-in terminal (vtm -r term) on Windows (X11 support is not implemented yet).
Example 1. Output 3x1 character.
pwsh:
    "👩👩👧👧`u{D0033}"
wsl/bash:
    printf "👩👩👧👧\UD0033\n"
Example 2. Output 6x2 character.
pwsh:
    "👩👩👧👧`u{D00C9}`n👩👩👧👧`u{D00F6}"
wsl/bash:
    printf "👩👩👧👧\UD00C9\n👩👩👧👧\UD00F6\n"
Output:
The explicitly specified codepoint (joining modifier) is taken from the Unicode codepoint range 0xD0000-0xD02A2 (a range that is not yet allocated), and its value is encoded by the "wh_xy" literal value enumeration:
If you dive deeper, you can get the following things with rotation, mirroring and halves:
I've updated the draft: Unicode Character Geometry Modifiers
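For a program that writes to the terminal directly instead of going through a shell, the modifier has to be appended as raw UTF-8. A minimal sketch in C++, assuming only what the examples above show: the modifier is an ordinary code point (e.g. U+D0033) placed right after the cluster. The helper names are hypothetical, and the mapping from a desired geometry to a code point is not reproduced here, so the caller has to supply the code point.

```cpp
#include <string>

// Encodes one Unicode code point as UTF-8. No validation is done for
// surrogates or values above U+10FFFF.
std::string encodeUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80)
        out += static_cast<char>(cp);
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Appends a geometry modifier code point to a UTF-8 encoded cluster.
std::string withGeometry(const std::string &cluster, char32_t modifier)
{
    return cluster + encodeUtf8(modifier);
}
```

With this, `withGeometry(cluster, 0xD0033)` produces the same bytes as the `printf` line in Example 1, minus the trailing newline.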
Btw, iTerm2 has ESC sequence to specify Unicode version for characters with detection: https://iterm2.com/documentation-escape-codes.html#:~:text=Unicode%20Version
Unicode Version
iTerm2 by default uses Unicode 9's width tables. The user can opt to use Unicode 8's tables with a preference (for backward compatibility with older locale databases). Since not all apps will be updated at the same time, you can tell iTerm2 to use a particular set of width tables with:
OSC 1337 ; UnicodeVersion=[n] ST
Where [n] is 8 or 9
You can push the current value on a stack and pop it off to return to the previous value by setting n to push or pop. Optionally, you may affix a label after push by setting n to something like push mylabel. This attaches a label to that stack entry. When you pop the same label, entries will be popped until that one is found. Set n to pop mylabel to effect this. This is useful if a program crashes or an ssh session ends unexpectedly.
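As an illustration of the sequence quoted above, here is a small C++ helper that builds it. This is only a sketch: the function name is made up, OSC is taken to be `ESC ]` and ST to be ESC followed by a backslash, and the helper does not check that the argument is one of the values the iTerm2 documentation allows.

```cpp
#include <string>

// Builds "OSC 1337 ; UnicodeVersion=[n] ST" for the given value of n,
// e.g. "8", "9", "push", "pop", "push mylabel" or "pop mylabel".
std::string unicodeVersionSequence(const std::string &n)
{
    // OSC is ESC ']' and ST is ESC followed by a backslash.
    return "\x1b]1337;UnicodeVersion=" + n + "\x1b\\";
}
```

An application could write `unicodeVersionSequence("push mylabel")` followed by `unicodeVersionSequence("9")` at startup, and `unicodeVersionSequence("pop mylabel")` on exit to restore the previous setting.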
I'm trying to find a solution to this rendering issue in Konsole.
This is a TEditor example showing how the line formatting gets shifted around due to those ❺ symbols. Also, the title of this window is clipped (there's no close parenthesis). These symbols are rendered slightly wider than a single cell, leading to the rest of the printed line being shifted out of alignment. Other terminals show this symbol in a single terminal cell, but Konsole is painting the line with variable character widths with chaotic effects.
Here's my evolution of the ASCII table from tvdemo. On the right you can see some extra-wide characters that mess up the whole line. ⑪-⑯ and ❽-❿ are clipped. When you click on one, it selects the "wrong" character due to the rendering discrepancy.
Open to suggestions on this. Konsole is the only terminal I've tested so far with this issue.