hpjansson / chafa

📺🗿 Terminal graphics for the 21st century.
https://hpjansson.org/chafa/
GNU Lesser General Public License v3.0
2.97k stars 64 forks

Unicode 16.0 support #201

Closed hpjansson closed 2 weeks ago

hpjansson commented 6 months ago

We need to add support for new Unicode 16.0 legacy symbols, chiefly:

Large type builtins will probably require manual definitions, but those for octants can be generated at runtime. We need new tags CHAFA_SYMBOL_TAG_OCTANT/octant and CHAFA_SYMBOL_TAG_LARGETYPE/largetype.

We may want to extend the coverage of the legacy tag, but it may be wise to hold off on this until terminal/font support is more widespread.

It'd also be a good idea to test our total coverage with Cascadia Code and look for any obvious gaps.
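As a rough illustration of generating octant bitmaps at runtime, here is a minimal Python sketch. It expands each of the 256 possible 2×4 patterns into an 8×8 coverage bitmap. The bit order, the function names and the 8×8 target size are assumptions made for this example and are not taken from Chafa's source. The mapping from pattern to code point is deliberately left out, since some patterns (such as the empty cell and the full block) are covered by pre-existing characters.

```python
# Sketch: expand a 2x4 octant pattern into an 8x8 coverage bitmap.
# The bit order used here is an assumption made for this example only.

OCTANT_COLS = 2
OCTANT_ROWS = 4
CELL_SIZE = 8  # the bitmap is CELL_SIZE x CELL_SIZE pixels


def octant_bitmap(pattern: int) -> list[list[int]]:
    """Return an 8x8 grid of 0/1 values for an 8-bit octant pattern.

    Bit i of `pattern` covers sub-block (row i // 2, column i % 2),
    so bit 0 is the top-left block and bit 7 the bottom-right one.
    """
    if not 0 <= pattern < 256:
        raise ValueError("pattern must fit in 8 bits")
    block_w = CELL_SIZE // OCTANT_COLS  # 4 pixels wide
    block_h = CELL_SIZE // OCTANT_ROWS  # 2 pixels tall
    bitmap = [[0] * CELL_SIZE for _ in range(CELL_SIZE)]
    for bit in range(OCTANT_COLS * OCTANT_ROWS):
        if pattern & (1 << bit):
            row, col = divmod(bit, OCTANT_COLS)
            for y in range(row * block_h, (row + 1) * block_h):
                for x in range(col * block_w, (col + 1) * block_w):
                    bitmap[y][x] = 1
    return bitmap


def all_octant_bitmaps() -> dict[int, list[list[int]]]:
    """Generate the bitmaps for all 256 patterns."""
    return {pattern: octant_bitmap(pattern) for pattern in range(256)}
```

A caller would still need a separate table that maps each pattern to its Unicode code point.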

PhMajerus commented 6 months ago

I don't think the Large Type Pieces make sense for your project, unless I missed some text rendering feature besides the image conversions. The large type are really designed to build large text and their weight and exact design may differ from one font to another, so I really wouldn't use them in a bitmap to ANSI/VT converter.

Try the following if you want more details than in my Cascadia feature request, the following document is more complete:

curl https://raw.githubusercontent.com/PhMajerus/Documents/main/HowTos/HowTo%20Large%20Type%20Pieces.txt

(from a terminal using a font that supports large type pieces).

On the other hand, octants are definitely something you'll want to support in a bitmap to ANSI/VT converter. Here is a comparison of all the pseudo-pixels mosaics:

image

These are half-blocks, quadrants, sextants, octants, separated quadrants, separated sextants, and braille.

Another set of characters coming in Unicode 16.0 that you'll want to take advantage of and are predictable regardless of the font are the sedecimants and eights sets, they add some 4×4 and 8×8 patterns. They don't provide all the patterns possible, but adding them to improve the resolution would be great:

image

curl https://raw.githubusercontent.com/PhMajerus/Documents/main/CheatSheets/More%20blocks%20tables.txt

hpjansson commented 6 months ago

> I don't think the Large Type Pieces make sense for your project, unless I missed some text rendering feature besides the image conversions. The large type are really designed to build large text and their weight and exact design may differ from one font to another, so I really wouldn't use them in a bitmap to ANSI/VT converter.

Well - Chafa is kind of two-pronged. On one hand (and by default) it does straight MSE minimization to code points that look similar across terminals and fonts. On the other, it also supports alternative symbol sets and escape sequences for more traditional/artistic flavors, e.g. ASCII, some CJK, and custom fonts. I'd like to push further in both directions.
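To make the idea of MSE minimization concrete, here is a small Python sketch of an exhaustive search over glyph bitmaps. It is a simplified illustration, not Chafa's implementation: it works on grayscale values, uses a toy 2×2 glyph set, and the names `GLYPHS`, `mean` and `best_glyph` are made up for this example. For each candidate glyph, the foreground and background levels are set to the mean of the pixels they cover, and the glyph with the lowest mean squared error wins.

```python
# Toy glyph set on a 2x2 grid (row-major, 1 = foreground). Illustrative only.
GLYPHS = {
    " ": (0, 0, 0, 0),
    "\u2580": (1, 1, 0, 0),  # upper half block
    "\u258c": (1, 0, 1, 0),  # left half block
    "\u2588": (1, 1, 1, 1),  # full block
}


def mean(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def best_glyph(cell, glyphs=GLYPHS):
    """Exhaustively pick the glyph and fg/bg gray levels with the lowest MSE.

    cell: flat, row-major gray values, same length as each glyph bitmap.
    Returns (char, fg, bg, mse).
    """
    best = None
    for char, bitmap in glyphs.items():
        # The error-minimizing color for each region is the mean of its pixels.
        fg = mean([v for v, bit in zip(cell, bitmap) if bit])
        bg = mean([v for v, bit in zip(cell, bitmap) if not bit])
        mse = mean([(v - (fg if bit else bg)) ** 2 for v, bit in zip(cell, bitmap)])
        if best is None or mse < best[3]:
            best = (char, fg, bg, mse)
    return best
```

For a cell that is bright on top and dark below, such as `best_glyph([200, 200, 10, 10])`, the upper half block is selected with a foreground of 200 and a background of 10.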

There's a lot of interesting research on structural character art rendering - see https://github.com/hpjansson/chafa/issues/150 for examples and ideas. Some of those could benefit from a greater selection of "imperfect" connective glyphs.

> Try the following if you want more details than in my Cascadia feature request, the following document is more complete:

> On the other hand, octants are definitely something you'll want to support in a bitmap to ANSI/VT converter. Here is a comparison of all the pseudo-pixels mosaics:

> Another set of characters coming in Unicode 16.0 that you'll want to take advantage of and are predictable regardless of the font are the sedecimants and eights sets, they add some 4×4 and 8×8 patterns. They don't provide all the patterns possible, but adding them to improve the resolution would be great:

Brilliant - definitely adding support for these!

PhMajerus commented 6 months ago

I've been thinking about your idea of using all possible characters by loading fonts and analyzing the glyphs. Did you already include color emojis in your renderers? This could work like those pieces of art creating a large picture using a patchwork of smaller pictures. Emojis could provide some shape and colors contributing to a larger image.

This example only uses hearts emojis for their colors as pseudo-pixels: image

It doesn't provide any benefit over VT colors in a terminal, but works in plain-text:

🧡🧡🧡🧡🧡🧡🧡🧡🧡🧡🧡🖤🖤🖤🖤🖤🖤🧡🧡🧡
🧡🧡🧡🧡🧡🧡🧡🧡🧡🖤🖤🖤🖤🖤🖤🖤🖤🖤🧡🧡
🧡🧡🧡🧡🧡🧡🖤🖤🖤🖤🖤🖤🖤🖤🖤🖤🖤🖤🧡🧡
🧡🧡🧡🧡🧡🖤🖤🖤💙💙🖤🖤🖤🖤🖤🖤🖤🖤🖤🧡
🧡🧡🧡🖤🖤🖤🖤🖤💙🩵💚🩶💙🖤🖤🖤🖤🖤🖤🧡
🧡🧡🖤🖤🖤🖤🖤🖤💙🩵🩵💛💚🩵💙🖤🖤🖤🖤🧡
🧡🧡🖤🖤💜🖤🖤🖤🖤🩵🩵🩵🩵💙🩶🩵💙🖤🤎🧡
🧡🖤🖤🖤🖤🖤🖤🖤🖤🖤💙🩵🩵💜💜🩷🩵💙🧡🧡
🧡🖤🖤🖤💜💜💜💜💜💜🖤🖤💙💜💙💜🩶🩵🧡🧡
🧡💜🖤🖤💜💜💜💜💜💜🩷🩷🖤🖤🖤💜🩵🩵🧡🧡
🤎💜💜🖤💜💜💜🩷💜🩷🩷🩷💜💜🩷💜🖤💙🧡🧡
🖤🩷🖤💜💜💜💜🩷💜🩷🩷🩷🖤🖤🩷🩷🩶🧡🧡🧡
🤎💜💜💜💜💜💜💜💜💜🩷🩷🩷🩷🩷🩷🩷🧡🧡🧡
🧡💜💜💜💜💜💜💜💜💜💜🩷🩷💜🖤🩷🩷🧡🧡🧡
🧡🖤💜🖤💜💜💜💜💜💜💜🖤🩷🩷🩷🩷🩷🧡🧡🧡
🧡🤎🖤💜💜💜💜💜💜💜🖤🖤💜🖤💜🩷🩷🧡🧡🧡
🧡🧡🖤💜💜💜💜💜🖤🖤🖤🖤🖤🖤🖤💜🩶🧡🧡🧡
🧡🧡🤎🩷💜💜💜💜💜💜🖤🖤🖤🖤🖤🤎🧡🧡🧡🧡
🧡🧡🧡🩷💜💜💜💜💜💜🖤🖤🖤🖤🤎🧡🧡🧡🧡🧡
🧡🧡🖤🩷💜💜💜💜💜💜🖤💜💜💜🧡🧡🧡🧡🧡🧡

You could achieve something more detailed by using all the emoji patterns and colors to do this at a higher resolution, and it would still work in plain-text.
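A minimal Python sketch of the hearts-as-pseudo-pixels idea follows: each pixel is replaced by the heart whose nominal color is nearest in RGB space. The RGB values in `HEART_COLORS` are rough assumptions chosen for this example, since the actual shades depend on the emoji font. The function names are likewise hypothetical.

```python
# Nominal RGB values are assumptions for this example; real emoji fonts
# render each heart in a somewhat different shade.
HEART_COLORS = {
    "\u2764\ufe0f": (220, 40, 60),     # red heart
    "\U0001F9E1": (245, 145, 10),      # orange heart
    "\U0001F49B": (255, 205, 90),      # yellow heart
    "\U0001F49A": (120, 180, 90),      # green heart
    "\U0001F499": (90, 170, 235),      # blue heart
    "\U0001F49C": (170, 140, 215),     # purple heart
    "\U0001F5A4": (50, 55, 60),        # black heart
    "\U0001F90D": (230, 230, 230),     # white heart
    "\U0001F90E": (190, 105, 80),      # brown heart
    "\U0001FA77": (245, 130, 175),     # pink heart
    "\U0001FA75": (135, 205, 250),     # light blue heart
    "\U0001FA76": (155, 160, 165),     # grey heart
}


def nearest_heart(rgb):
    """Return the heart whose assumed color is closest to `rgb`."""
    def distance(item):
        _, color = item
        return sum((a - b) ** 2 for a, b in zip(rgb, color))
    return min(HEART_COLORS.items(), key=distance)[0]


def image_to_hearts(rows):
    """Convert rows of (r, g, b) pixels to lines of heart emoji."""
    return "\n".join("".join(nearest_heart(px) for px in row) for row in rows)
```

Plain squared RGB distance is the simplest possible metric; a perceptual color space would likely match better, at the cost of more code.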

hpjansson commented 6 months ago

Yes - I kept the door open to this in the API, so when implemented, you will be able to add multicolor glyphs while remaining backwards compatible.

See chafa_symbol_map_add_glyph() - it takes a number of pixel formats, although currently it's rendered to mono bitmaps internally. The internals will need some work.

oshaboy commented 2 months ago

@PhMajerus I've tried doing that before. The main problem is different emoji fonts use slightly different colors for the emoji. So it's almost impossible to get a consistent shade. Also Emoji hearts get the terminal really confused because the red one is traditionally half width while the rest are traditionally full width. Though this isn't a problem with modern font rendering.

Though I guess the same problem exists with the 8 and 16 bit colors as shown here https://en.wikipedia.org/wiki/ANSI_escape_code#3-bit_and_4-bit. Still at least the colors are way closer in ANSI than they are with emoji.

PhMajerus commented 2 months ago

> @PhMajerus I've tried doing that before. The main problem is different emoji fonts use slightly different colors for the emoji. So it's almost impossible to get a consistent shade.

Of course images using colored hearts will not be exact, but a red stays a red and a yellow stays a yellow. It still provides some color information:

image (A 125×125 colored hearts image)

Of course, that is at the expense of resolution, as we can show 4×4 pseudo-pixels for each colored heart if we use octants:

image (A 256×125 octants image)

> Though I guess the same problem exists with the 8 and 16 bit colors as shown here https://en.wikipedia.org/wiki/ANSI_escape_code#3-bit_and_4-bit. Still at least the colors are way closer in ANSI than they are with emoji.

I don't know of a 16-bit color ANSI mode; AFAIK there are the 16 base colors (4-bit), 256 colors (8-bit), and RGB (24-bit). Note that 24-bit should be reliable and 8-bit slightly less so, because although the 6×6×6 color cube and grayscale ramp are supposed to be consistent, they can differ by terminal or be modified by users or other apps. The ANSI 16 colors palette I'd argue is worse than the colored hearts, because they differ between the standard DOS colors and the Windows legacy console (used for about 30 years), and even between two CGA systems depending on the attached RGBI monitor (see the whole dark yellow vs brown/ochre issue):
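For reference, the layout of the 256-color palette can be sketched in Python as below. The channel levels and gray ramp are the xterm defaults. As noted above, a given terminal may use different values or let users override them, so treat the output as nominal. The function name is made up for this example.

```python
# Channel levels of the 6x6x6 color cube in xterm's default palette.
CUBE_LEVELS = (0, 95, 135, 175, 215, 255)


def xterm256_to_rgb(index):
    """Map a 256-color palette index to its nominal (r, g, b) value.

    Returns None for indices 0-15, because the 16 base colors have no
    single agreed-upon RGB definition.
    """
    if not 0 <= index <= 255:
        raise ValueError("index must be in 0..255")
    if index < 16:
        return None
    if index < 232:
        # Indices 16..231 form the 6x6x6 cube: 16 + 36*r + 6*g + b.
        offset = index - 16
        r, g, b = offset // 36, (offset // 6) % 6, offset % 6
        return (CUBE_LEVELS[r], CUBE_LEVELS[g], CUBE_LEVELS[b])
    # Indices 232..255 form a 24-step grayscale ramp.
    gray = 8 + 10 * (index - 232)
    return (gray, gray, gray)
```

For example, index 196 falls in the cube at r=5, g=0, b=0, which is pure red under the default levels.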

image

hpjansson commented 2 months ago

> The ANSI 16 colors palette I'd argue is worse than the colored hearts, because they differ between the standard DOS colors and the Windows legacy console (used for about 30 years), and even between two CGA systems depending on the attached RGBI monitor (see the whole dark yellow vs brown/ochre issue):

Even worse - many terminal emulators (TEs) have configurable presets for these. A common one on Linux is Tango (GNOME default):

term-pal-tango

And here's Solarized (dark):

term-pal-solarized-dark

oshaboy commented 2 months ago

At least 16 color ANSI has an ad-hoc standard that has the 16 colors specified. Most people who do chafa style stuff won't have their terminal set to a funky color scheme.

Meanwhile with emoji a quick glance at emojipedia will tell you how unspecified emoji actually are. Especially green, blue and purple.

This issue is somewhat solvable by targeting a specific font, by making the font selectable and keeping a table of each font's colors, or by creating an emoji font with well-specified colors specifically for this purpose. But this feels beyond the scope of chafa. The way I solved it was to first clamp all the colors to 3-level RGB and then use a lookup table to approximate the right color. This is far from an ideal or even good solution.

hpjansson commented 2 months ago

I think the point is that color matching for 16-color ANSI and emojis (if/when implemented) are both approximate. Yes - emoji (and other colored glyph) output would target a specific font, cf. --glyph-file, potentially with built-ins for emojis that are similar between many fonts (I think hearts could qualify, but I wouldn't mind looking at counterexamples of common fonts where their representations are wildly different).

I hope everyone is having a great day :-)

acxz commented 3 weeks ago

I want to share @mafik's ansi-art, as it uses the font to generate "24-bit, Unicode-capable" terminal output. See this reddit post: https://www.reddit.com/r/unixporn/comments/wgpxu3/oc_ive_been_working_on_extending_ansiart_with/

Also see mafik's website, where he hosts an interactive version: https://mrogalski.eu/ansi-art/ He uses JuliaMono due to the font's large coverage of Unicode characters (the largest that I'm aware of).

It is the highest quality terminal art I've seen (excluding sixels)

hpjansson commented 3 weeks ago

That's pretty sweet! At a glance, we use the same algorithm at -w 9; MSE exhaustive search. However, ansi-art stores the font glyphs with higher fidelity (15pt -> a bit more than 10x20 with 256 gray levels vs. our 8x8 bitmap). It may be able to capture more detail that way.

That said, I had a branch at one point where I experimented with 16x16 bitmaps, but didn't see enough of an improvement to justify the added complexity of multiple glyph resolutions -- and variable glyph resolution is a big performance hit, since iteration counts wouldn't be known at compile time anymore, and you couldn't fit the bitmaps into an exact multiple of CPU registers.
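To illustrate the point about register-sized bitmaps: an 8×8 mono bitmap is exactly 64 bits, so two glyph shapes can be compared with a single XOR followed by a population count. The Python sketch below only demonstrates the layout. Python integers are arbitrary precision, so it says nothing about performance, and the helper names are hypothetical rather than taken from Chafa.

```python
BITMAP_SIZE = 8  # 8 rows x 8 columns = 64 bits


def pack_bitmap(rows):
    """Pack 8 strings of 8 characters ('#' = set pixel) into one integer.

    The top-left pixel ends up in the most significant of the 64 bits.
    """
    if len(rows) != BITMAP_SIZE or any(len(row) != BITMAP_SIZE for row in rows):
        raise ValueError("expected an 8x8 bitmap")
    bits = 0
    for row in rows:
        for ch in row:
            bits = (bits << 1) | (1 if ch == "#" else 0)
    return bits


def hamming_distance(a, b):
    """Number of pixels that differ between two packed bitmaps."""
    return bin(a ^ b).count("1")


UPPER_HALF = pack_bitmap(["########"] * 4 + ["........"] * 4)
FULL_BLOCK = pack_bitmap(["########"] * 8)
```

Here `hamming_distance(UPPER_HALF, FULL_BLOCK)` is 32, since the two shapes differ in the lower half of the cell. With a fixed 8×8 size, a compiled implementation can do this comparison in a constant number of operations, which is the property that variable glyph resolutions would give up.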

Maybe I'm wrong about the quality gap; I'd love to look at side-by-side comparisons using -w 9 --symbols all (preferably in a new issue).

hpjansson commented 2 weeks ago

I added support for octants in https://github.com/hpjansson/chafa/commit/c23d8bc8d49919a4c78a564298d74b3e62f3b3e6 . Haven't decided what to do about the rest yet.

Edit: Actually, I did decide to include the sedecimants and eights. Just gotta do it.

hpjansson commented 2 weeks ago

I went over the remaining block symbols just now and added those we were missing. Had to skip some of the sedecimants, because VTE (and thus likely other common TEs) is confused as to their halfwidth/fullwidth status.

See 2af2e04054eb25bacb013c27dd0b69a516b3969f and 52904441303f24d393aaefcf30f1838f467ec90d. I'll address multicolor glyphs in a separate issue. Thanks again!