Weird sizes of icon depending on number of spaces

bersace commented 4 years ago

Describe the bug

The size of the git icon from icons-in-terminal varies a lot, from tiny to big, depending on the number of trailing spaces.

To Reproduce Steps to reproduce the behavior:

Install kitty 0.17.4
Install icons-in-terminal
Type echo $'\uEDCE°\uEDCE °\uEDCE °\uEDCE °\uEDCE °'
This looks like :

Expected behavior I expect that a single space is enough to have a full-size icon.

Screenshots

With --config NONE

Environment details OS: Debian GNU/Linux 10 (buster)

kitty 0.17.4 (3d32202b3a) created by Kovid Goyal
Linux gilwell.virt 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64
Debian GNU/Linux 10 \n \l
Loaded config files: /home/bersace/.config/kitty/kitty.conf
Running under: X11

Config options different from defaults:
active_tab_background   Color(red=59, green=59, blue=59)
active_tab_foreground   Color(red=222, green=222, blue=222)
background              Color(red=24, green=24, blue=24)
color0                  Color(red=37, green=37, blue=37)
color1                  Color(red=237, green=74, blue=70)
color10                 Color(red=131, green=199, blue=70)
color11                 Color(red=239, green=197, blue=65)
color12                 Color(red=79, green=156, blue=254)
color13                 Color(red=255, green=129, blue=202)
color14                 Color(red=86, green=216, blue=201)
color15                 Color(red=222, green=222, blue=222)
color2                  Color(red=112, green=180, blue=51)
color3                  Color(red=219, green=179, blue=45)
color4                  Color(red=54, green=138, blue=235)
color5                  Color(red=235, green=110, blue=183)
color6                  Color(red=63, green=197, blue=183)
color7                  Color(red=119, green=119, blue=119)
color8                  Color(red=59, green=59, blue=59)
color9                  Color(red=255, green=94, blue=86)
dim_opacity             0.625
disable_ligatures       2
font_family             JetBrains Mono
font_size               13.0
foreground              Color(red=185, green=185, blue=185)
inactive_tab_background Color(red=24, green=24, blue=24)
inactive_tab_foreground Color(red=119, green=119, blue=119)
selection_background    Color(red=59, green=59, blue=59)
selection_foreground    None
tab_bar_background      Color(red=24, green=24, blue=24)
update_check_interval   0.0

kovidgoyal commented 4 years ago

That is by design. Those icons are using PUA unicode characters. These have to render in a single cell (they have width 1 in the unicode standard). However, to accommodate the use of these code points for icons, kitty will use any trailing spaces to render them, if available.

bersace commented 4 years ago

@kovidgoyal thanks for the reply. Yes, I know the case of extra space for icons; that makes sense.

However, I don't understand why this specific icon needs up to 4 trailing spaces to have proper size instead of one trailing space. Is it a bug in the font? How does kitty change icon size depending on the number of spaces? Regards.

kovidgoyal commented 4 years ago

kitty scales glyphs to fit in the available space. So if a glyph is wider than it is tall, it will be reduced in height so that its width fits in the available width. Similarly for height.
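The scaling rule described here can be sketched as a uniform "fit inside a box" computation. This is only an illustration of the stated behavior, not kitty's actual code: the function name `fit_glyph`, its parameters, and the pixel sizes are all assumptions chosen for the example.

```python
def fit_glyph(glyph_w: float, glyph_h: float,
              cell_w: float, cell_h: float,
              num_cells: int = 1) -> tuple[float, float]:
    """Scale a glyph uniformly so it fits inside num_cells adjacent cells.

    Hypothetical helper: the available box is num_cells * cell_w wide and
    cell_h tall, and the aspect ratio of the glyph is preserved.
    """
    avail_w = cell_w * num_cells
    # The tighter of the two constraints (width or height) decides the scale.
    scale = min(avail_w / glyph_w, cell_h / glyph_h)
    return glyph_w * scale, glyph_h * scale


# Assumed sizes: a cell of 8x16 pixels and a glyph twice as wide as tall.
for cells in range(1, 6):
    print(cells, fit_glyph(128, 64, 8, 16, cells))
```

Under these assumed sizes the glyph only reaches the full cell height once four cells are available, and a fifth cell changes nothing because the height becomes the limiting dimension.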

loveencounterflow commented 4 years ago

That is by design. Those icons are using PUA unicode characters. These have to render in a single cell (they have width 1 in the unicode standard).

This does not seem to be the case, see Unicode UAX#11, 6.1 Unassigned and Private-Use Characters:

All private-use characters are by default classified as Ambiguous, because their definition depends on context.

And about ambiguous PUA characters:

Private-use character codes and the replacement character have ambiguous width, because they may stand in for characters of any width.
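For reference, the classification quoted above can be inspected with Python's standard `unicodedata` module. This is a small sketch; the helper name `describe` is made up for the example, and the code point is the one from the reproduction steps.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str]:
    """Return (general category, East Asian Width property) for one character."""
    return unicodedata.category(ch), unicodedata.east_asian_width(ch)


icon = "\uedce"  # the private-use code point from the bug report

# 'Co' is the general category for private-use characters;
# 'A' is the Ambiguous East Asian Width class.
print("icon:", describe(icon))
print("a:   ", describe("a"))
print("中:  ", describe("中"))
```

The property data alone does not say how many cells a terminal should allot; it only reports that the width is context-dependent.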

This would necessitate a longer discussion because it is such a complicated topic; let me just say that my impression is that:

1) the Unicode rules should be followed where they unambiguously define a codepoint as either narrow or as wide.

2) in contexts like terminal emulators and text editors that rely on a fixed grid, it is probably advantageous to not follow the Unicode recommendations to adjust the width of a given codepoint according to whether it appears before, between, or after wide or narrow characters; first, that would be surprising, second, it's hard to get right and will (in my experience) probably fail in a lot of edge cases; third, it considerably complicates the implementation without bringing any real benefits.

3) Instead, the decision whether to treat ambiguous and neutral codepoints as wide or narrow should be left to the user, with sensible defaults to ensure most users are satisfied with out-of-the-box behavior.

4) In the more general case, even narrow/wide is not general enough. One example, and please bear with me, is Cuneiform, which has some truly huge signs. Now we most often do not program in Cuneiform but then there are also triple-codepoint ligatures and U+FDFD ﷽‎ ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM. The latter is also not likely to appear in programming but my point is that I think if we drop the narrow/wide dichotomy and instead adopt the idea that each codepoint (or ligatured sequence of codepoints) can take up an integer number of adjacent unit character cells, we can actually both simplify the implementation and provide an improved user experience.

At least that's what I think right now. I have some additional thoughts on how to test for the amount of space a given character should take up and how to display the respective glyph.

kitty scales glyphs to fit in the available space.

Yes but I find this is the least attractive solution. Other possible treatments would be:

do not scale and align the glyph so its left edge coincides with the left edge of the character cell;
the same with the right edge;
the same but center the glyph;
apply a custom transform consisting of scaling and translation.

What I find particularly unsatisfactory is that the appearance of a glyph in a newly opened terminal may depend on whether the first occurrence of that codepoint happened to be in front of spaces or not. That is utterly surprising and brings a modicum of randomness into the picture that further complicates matters.

kovidgoyal commented 4 years ago

The part you forget is that terminal emulators don't exist in isolation. They are one half of a duo, the other half being the program running in the emulator. The program and the emulator have to agree on the width of characters, and the only way to do that is to standardize the width. Making it user configurable is a total no-go. You can certainly contemplate a user option to control how kitty renders symbols in the available space, but actually changing the number of cells a PUA character takes is simply not going to happen.
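To make the "duo" argument concrete, here is a simplified sketch of how a program might compute column widths before padding text into a table. The helpers `display_width` and `pad_to` are hypothetical, and the rule used (Wide and Fullwidth count as 2 cells, everything else including Ambiguous as 1) is an assumption made for illustration, not a complete width implementation.

```python
import unicodedata


def display_width(text: str) -> int:
    """Number of terminal cells the program assumes the text will occupy."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks share the cell of their base character
        wide = unicodedata.east_asian_width(ch) in ("W", "F")
        width += 2 if wide else 1
    return width


def pad_to(text: str, columns: int) -> str:
    """Right-pad text with spaces so it fills the given number of cells."""
    return text + " " * max(0, columns - display_width(text))


for label in ("abc", "中文", "\uedce git"):
    print("|" + pad_to(label, 8) + "|")
```

If the emulator drew the private-use character in two cells while the program counted one, the closing `|` on that row would land one column too far to the right.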

loveencounterflow commented 4 years ago

actually changing the number of cells a PUA character takes is simply not going to happen

Based on my understanding of what Unicode UAX#11 states, this is exactly what should happen (OTOH I don't think I would want to implement UAX#11 fully in a terminal emulator, but that doesn't take away from this specific issue).

My whole point in this is that I believe there is a point of view which gives the user more control over visual appearances, makes life easier for the developer, and leads to more predictable results.

You don't have to believe me, just look at how many text editors treat PUA codepoints; my impression is that they do look at how wide a glyph is in whatever font turns out to be responsible for that codepoint and allot space accordingly.

FYI & FWIW two links from similar discussions (Kitty does get mentioned, too):

bersace commented 4 years ago

I'm maintaining a powerline prompt for bash, so I'm the other half of the duo.

Having inconsistent icon size is weird. I can't add a variable number of spaces depending on terminal in my prompt.

I expect icon size to be deterministic. For example, all icons are rendered with the same height, centered. If there are not enough leading spaces for the width of the symbol, I accept that kitty truncates it (is it possible?) or something else, but not resizing. I think that resizing glyphs does not fit the icon case.

Output in kitty :

Output with vte:

Vte looks more consistent. The Python icon is smaller and the git icon is bigger than in kitty.

@kovidgoyal Can you point us to the rationale behind resizing glyphs? Would there be another solution for icons without trailing space?

kovidgoyal commented 4 years ago

On Tue, May 26, 2020 at 04:50:36AM -0700, loveencounterflow wrote:

actually changing the number of cells a PUA character takes is simply not going to happen

Based on my understanding of what Unicode UAX#11 states, this is exactly what should happen (OTOH I don't think I would want to implement UAX#11 fully in a terminal emulator, but that doesn't take away from this specific issue).

My whole point in this is that I believe there is a point of view which gives the user more control over visual appearances, makes life easier for the developer, and leads to more predictable results.

An editor is not a terminal emulator.

Again, if character width is font dependent there is no way for programs to know what that width is. If you can address this sticking point, I am all ears, otherwise your proposal is a non-starter.

kovidgoyal commented 4 years ago

On Tue, May 26, 2020 at 05:23:50AM -0700, Étienne BERSAC wrote:

I'm maintaining a powerline prompt for bash, so I'm the other half of the duo.

Having inconsistent icon size is weird. I can't add a variable number of spaces depending on terminal in my prompt.

I don't follow. You don't need to add a variable number of spaces. You add the same number of spaces, for all terminals, that matches your symbols' aspect ratio.

@kovidgoyal Can you point us to the rationale behind resizing glyphs? Would there be another solution for icons without trailing space?
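One way to read this advice is as a small calculation a prompt author could do once per icon. The sketch below is an assumption-laden illustration: `cells_for_full_height` and `padded_icon` are hypothetical helpers, and the cell is assumed to be twice as tall as wide, which varies with the terminal font.

```python
import math


def cells_for_full_height(glyph_w: float, glyph_h: float,
                          cell_w: float = 1.0, cell_h: float = 2.0) -> int:
    """Smallest number of cells wide enough to show the glyph at full cell height.

    The default cell proportions (1 wide, 2 tall) are an assumption.
    """
    cells = math.ceil((glyph_w * cell_h) / (glyph_h * cell_w))
    return max(1, cells)


def padded_icon(icon: str, glyph_w: float, glyph_h: float) -> str:
    """Append the trailing spaces matching the icon's aspect ratio."""
    return icon + " " * (cells_for_full_height(glyph_w, glyph_h) - 1)


# A square icon needs 2 cells (one trailing space) under the assumed cell shape;
# an icon twice as wide as tall needs 4 cells (three trailing spaces).
print(repr(padded_icon("\uedce", 64, 64)))
print(repr(padded_icon("\uedce", 128, 64)))
```

The glyph dimensions would have to be read from the font, for example with a glyph inspection tool.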

Because I prefer seeing resized but whole symbols to chopped off ones. If you prefer seeing chopped off ones, feel free to send a PR to add an option for it.

bersace commented 4 years ago

@kovidgoyal in the screenshot of the description, the git icon does not fill the width of the trailing spaces.

Do you know what is wrong with the git icon? Does it actually have such blank space after it? Can you suggest a tool to examine font glyphs?

kovidgoyal commented 4 years ago

It depends on the aspect ratio. So if there is more width available than the height, the icon will be expanded to fill the height and use whatever part of the width corresponds to that height while preserving aspect ratio.

kovidgoyal commented 4 years ago

And to examine glyphs, I usually use fontforge.

loveencounterflow commented 4 years ago

An editor is not a terminal emulator.

I see a commonality in these three applications:

terminal emulators,
text editors, and
general monospaced text representations (in browsers and so on)

all aim to represent text in a readable, sensible manner, based on each character taking up an integer number of grid cells, the single unit being used for e.g. Basic Latin, and two units e.g. for CJK characters.

Again, if character width is font dependent there is no way for programs to know what that width is.

Yes, I agree (although a sufficiently smart program—but yes, it's true). But, do tell me: what programs do have to know that? And which of those do assume that all PUA characters will be rendered as single-width?

The entire reason we today have a (huge) Private Use Area in Unicode at all is East Asian encodings that used to reserve space for users and applications to store their own custom-made glyphs. Therefore, if anything, software should assume all PUA codepoints to be double-width.

I do agree there is a difficulty when you have software that draws tabular gridlines with characters in between that may contain PUA codepoints. But frankly, that problem does not go away with downscaling glyphs so that they become illegible, nor can the solution be to force all software to add a space after each PUA.

Maybe there's another solution to this conundrum. What if out of the ~1 million unassigned Unicode codepoints we could pick and agree on a few custom control sequences to indicate 'make the next character narrow', 'make it wide', maybe even 'shift it' or 'scale it' or 'don't scale the next'? This would be similar to the ANSI sequences of old, and could be emitted, if need be, by a third program (a line of sed in the simplest case) in a pipeline? This way, Kitty could keep working the way it works now, most software would not show any differences, but users would be free to add control sequences for improved output. Not much different from the current advice to add an extra space character, really, but with a bit more of controllable behavior. Could even think of making that behavior configurable... what do you think?

kovidgoyal commented 4 years ago

On Tue, May 26, 2020 at 07:41:27AM -0700, loveencounterflow wrote:

An editor is not a terminal emulator.

I see a commonality in these three applications:

terminal emulators,

text editors, and

general monospaced text representations (in browsers and so on)

all aim to represent text in a readable, sensible manner, based on each character taking up an integer number of grid cells, the single unit being used for e.g. Basic Latin, and two units e.g. for CJK characters.

Sure, there is a commonality, but there is a critical difference: one of those two hosts a vast ecosystem of unrelated programs outputting to it that require knowledge of character widths.

Again, if character width is font dependent there is no way for programs to know what that width is.

Yes, I agree (although a sufficiently smart program—but yes, it's true). But, do tell me: what programs do have to know that? And which of those do assume that all PUA characters will be rendered as single-width?

Any program that wants to render lined-up text, your favorite text editor, for example. In theory they can query the terminal emulator for its cursor position after printing out an ambiguous-width character, but in practice that is too slow/cumbersome and no one does it.
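For readers unfamiliar with the query mentioned here, a rough sketch follows. It assumes the standard cursor position report exchange (the program writes `ESC [ 6 n`, the terminal answers `ESC [ row ; col R`) and a Unix terminal; the function names are hypothetical, and line wrapping and timeouts are ignored.

```python
import os
import re
import termios
import tty

CPR_REQUEST = b"\x1b[6n"  # ask the terminal where the cursor is
_CPR_REPLY = re.compile(rb"\x1b\[(\d+);(\d+)R")


def parse_cursor_report(reply: bytes) -> tuple[int, int]:
    """Extract (row, column) from a cursor position report."""
    match = _CPR_REPLY.search(reply)
    if match is None:
        raise ValueError(f"not a cursor position report: {reply!r}")
    return int(match.group(1)), int(match.group(2))


def query_cursor(fd_in: int = 0, fd_out: int = 1) -> tuple[int, int]:
    """Round trip to the terminal; only meaningful on a real tty."""
    saved = termios.tcgetattr(fd_in)
    try:
        tty.setcbreak(fd_in)  # read the reply without waiting for Enter
        os.write(fd_out, CPR_REQUEST)
        reply = b""
        while not reply.endswith(b"R"):
            chunk = os.read(fd_in, 1)
            if not chunk:
                break  # input closed before a full reply arrived
            reply += chunk
    finally:
        termios.tcsetattr(fd_in, termios.TCSADRAIN, saved)
    return parse_cursor_report(reply)


def measure_width(text: str) -> int:
    """Cells the terminal advanced for text (assumes no line wrap)."""
    _, before = query_cursor()
    os.write(1, text.encode())
    _, after = query_cursor()
    return after - before
```

Each measurement costs two blocking round trips and has to take over terminal input while waiting, which is consistent with the point that programs avoid doing this per character.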

The entire reason we today have a (huge) Private Use Area in Unicode at all is East Asian encodings that used to reserve space for users and applications to store their own custom-made glyphs. Therefore, if anything, software should assume all PUA codepoints to be double-width.

I do agree there is a difficulty when you have software that draws tabular gridlines with characters in between that may contain PUA codepoints. But frankly, that problem does not go away with downscaling glyphs so that they become illegible, nor can the solution be to force all software to add a space after each PUA.

Maybe there's another solution to this conundrum. What if out of the ~1 million unassigned Unicode codepoints we could pick and agree on a few custom control sequences to indicate 'make the next character narrow', 'make it wide', maybe even 'shift it' or 'scale it' or 'don't scale the next'? This would be similar to the ANSI sequences of old, and could be emitted, if need be, by a third program (a line of sed in the simplest case) in a pipeline? This way, Kitty could keep working the way it works now, most software would not show any differences, but users would be free to add control sequences for improved output. Not much different from the current advice to add an extra space character, really, but with a bit more of controllable behavior. Could even think of making that behavior configurable... what do you think?

Well, feel free to come up with such a scheme and get it accepted by the world at large. I spent a few months tilting at the windmill of getting just terminal developers to agree to anything, rather than the wider world, and gave up. I sincerely wish you all the best.

Spaces are definitely not as good as dedicated markup/escape codes, but they have the extremely important practical advantage of not requiring buy-in from anybody other than the person using them.

kovidgoyal / kitty

Weird sizes of icon depending on number of spaces #2672