hpjansson / chafa

📺🗿 Terminal graphics for the 21st century.
https://hpjansson.org/chafa/
GNU Lesser General Public License v3.0
2.78k stars 58 forks

[Proposal] Optimal custom fonts using machine learning #20

Open cdluminate opened 5 years ago

cdluminate commented 5 years ago

The fonts we use day to day are not specifically designed for printing images. A custom font is a workable way to improve the result of character art, as the BE256 and BE512 fonts have shown.

The highest resolution that chafa supports is 8x8, judging from its symbol bitmaps. That gives 2^64 possible patterns, far more than the whole Unicode table can hold. However, my bold assumption is that most of the patterns in this space of 2^64 combinations are useless.
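To put rough numbers on that claim, here is a minimal sketch comparing the pattern count with what Unicode can address. The private-use range starting at 0x100000 is the one the generated font ends up using later in this thread; the constant names are only illustrative.

```python
# Size of the 8x8 two-colour pattern space versus what Unicode can address.
PATTERNS_8X8 = 2 ** 64                  # every on/off assignment of 64 cells
UNICODE_CODE_POINTS = 0x10FFFF + 1      # U+0000..U+10FFFF inclusive

# Supplementary Private Use Area-B (U+100000..U+10FFFD).
PUA_B_FIRST, PUA_B_LAST = 0x100000, 0x10FFFD
PUA_B_SIZE = PUA_B_LAST - PUA_B_FIRST + 1

print(PATTERNS_8X8)                         # 18446744073709551616
print(UNICODE_CODE_POINTS)                  # 1114112
print(PUA_B_SIZE)                           # 65534
print(PATTERNS_8X8 // UNICODE_CODE_POINTS)  # patterns per available code point
```

Even if every code point were free to use, each one would have to stand in for trillions of patterns, which is why picking a small representative subset matters.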

To find the most useful N patterns in this space, we can take advantage of machine learning.

The proposed procedure for creating such a custom font looks like this:

  1. sample M random crops (in ratio w=1:h=2) from an image dataset

  2. turn the M crops into binarized bitmaps using histogram and downsample to 8x8

  3. find N cluster centers in this space with the k-means algorithm, using the M binarized (64x1) vectors as the dataset.

  4. convert the N vectors into a C bitmap header and SVG plots.
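The steps above can be sketched roughly as follows. This is a dependency-free illustration, not the actual generator: the script discussed later in this thread imports KMeans and MiniBatchKMeans from sklearn.cluster, whereas this sketch uses a hand-written Lloyd's loop. The mean threshold in binarize, the synthetic random tiles standing in for image crops, and all function names are assumptions made for the example.

```python
import random

def binarize(tile):
    """Threshold an 8x8 grayscale tile (64 ints, 0..255) at its mean (assumed rule)."""
    mean = sum(tile) / len(tile)
    return [1 if v > mean else 0 for v in tile]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, n_clusters, n_iter=10, seed=0):
    """Plain Lloyd's k-means; returns float-valued cluster centers."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, n_clusters)]
    dim = len(vectors[0])
    for _ in range(n_iter):
        sums = [[0.0] * dim for _ in range(n_clusters)]
        counts = [0] * n_clusters
        for v in vectors:
            # Assign the vector to its nearest center.
            best = min(range(n_clusters), key=lambda k: sq_dist(v, centers[k]))
            counts[best] += 1
            for i, x in enumerate(v):
                sums[best][i] += x
        for k in range(n_clusters):
            if counts[k]:  # keep the old center if a cluster went empty
                centers[k] = [s / counts[k] for s in sums[k]]
    return centers

def to_glyph(center):
    """Round a 64-value center to bits and lay it out as 8 rows of 'X' / ' '."""
    return ["".join("X" if center[r * 8 + c] >= 0.5 else " " for c in range(8))
            for r in range(8)]

# Stand-in for steps 1-2: random tiles instead of downsampled image crops.
tile_rng = random.Random(1)
tiles = [[tile_rng.randrange(256) for _ in range(64)] for _ in range(200)]
dataset = [binarize(t) for t in tiles]

# Steps 3-4: cluster, then turn each center into a printable glyph.
glyphs = [to_glyph(c) for c in kmeans(dataset, n_clusters=4)]
for row in glyphs[0]:
    print(repr(row))
```

With real image crops instead of noise, the rounded centers would be expected to show edges and gradients rather than speckle; the number of clusters N then directly sets the number of glyphs in the font.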

Highlights:

  1. Best resolution.

  2. Easy to code. Automatic font and C code generation.

  3. I don't know what glyph is good for printing character art, but the algorithm can figure it out.

Assignee: myself

hpjansson commented 5 years ago

Sounds like an interesting project. We could definitely ship such a font with Chafa and support it with --symbols.

cdluminate commented 5 years ago

The goals of this issue have now become:

  1. find the optimal set of glyphs
  2. documentation
  3. autotools: add option to enable this feature

cdluminate commented 5 years ago

Let's move forward a bit: update the font generator and really ship a (basically working) font file?

hpjansson commented 5 years ago

Let's move forward a bit: update the font generator and really ship a (basically working) font file?

Sounds good to me.

I don't like to keep generated blobs in git, but it's important to keep the build and installation process simple, so here's what I think we should do:

We may also have to keep the generated C source (also gzipped), but I'd prefer to load it dynamically, either from the JSON or the TTF. I'll have to see what's the best solution here, maybe we should be using a more compact format instead of JSON.

How does that sound?

cdluminate commented 5 years ago

Sounds good to me.

maybe we should be using a more compact format instead of JSON.

Any suggestion? I think JSON is just the format of best compatibility, even if not the most compact one.

Besides, we can add a configure flag called, e.g. --enable-kmeans-font which triggers the TTF file generation, and defines a C macro like CHAFA_HAS_KMEANS_FONT:

#ifdef CHAFA_HAS_KMEANS_FONT
#include "auto-generated.c"
#endif

hpjansson commented 5 years ago

Yeah, I was just thinking about the overhead of parsing about a megabyte of JSON on startup. However, I think we could go for a better solution where Chafa just loads the glyphs right out of the font file. That would be optimal, since we could adapt to any font and get better results in e.g. ascii mode too. It should be enough to link with FreeType or Harfbuzz, which are very common and low in the stack.

hpjansson commented 5 years ago

Chafa would then have a switch, e.g. --font-glyphs fontfile.ttf which would import glyphs from a font and map them to their respective Unicode code points. You could even specify it multiple times. Then there would be --symbols range where you could specify that you want to allow custom code point ranges, e.g. the one used in the k-means font. Using those two switches together you would get the desired result.

cdluminate commented 5 years ago

Cool! I like this idea.

hpjansson commented 4 years ago

It's in master now. Here's how to use it:

chafa --glyph-file chafa8x8.ttf --symbols 0x100000..0x101000

The font loading is a little bit slow, and I need to fine tune the bitmap generator, but I'm already getting improved output with e.g. chafa --glyph-file ter-x12n.pcf --symbols all where the font file corresponds to the Terminus font I'm using in the terminal.

cdluminate commented 4 years ago

Nice. Now I think the C code generation part can be safely removed from fontgen. I will submit a PR to overhaul fontgen when I have enough time to work on it.

clort81 commented 1 year ago

This looks fun. Grabbed a bunch of images and put them in ~/coco

$ ./chafa8x8.py CreateDataset --glob "coco/*.jpg"
Traceback (most recent call last):
  File "/media/sd/Projects/TermFun/chafedit/tools/fontgen/./chafa8x8.py", line 15, in <module>
    from sklearn.cluster import KMeans, MiniBatchKMeans
ModuleNotFoundError: No module named 'sklearn'

alright then

$ pip3 install sklearn
Requirement already satisfied: sklearn in /usr/local/lib/python3.10/dist-packages (0.0.post1)
$ pip3 install KMeans
Requirement already satisfied: KMeans in /usr/local/lib/python3.10/dist-packages (1.0.2)
$ pip3 install MiniBatchKMeans
ERROR: Could not find a version that satisfies the requirement MiniBatchKMeans (from versions: none)
ERROR: No matching distribution found for MiniBatchKMeans

Ah yes, python is satan. nevermind.

FWIW for my textart, braille already gives a 2x4 matrix that resolves everything quite well. Particularly if your font uses 'full block' braille glyphs.

Beyond that resolution there are few gains due to the two-color limitation.

hpjansson commented 1 year ago

I think one could further develop this code to generate wedge shapes and such. It's been a while since I tried it, though. Maybe the dependencies are out of date (or the required packages are only available on Debian?).

cdluminate commented 1 year ago

I think one could further develop this code to generate wedge shapes and such. It's been a while since I tried it, though. Maybe the dependencies are out of date (or the required packages are only available on Debian?).

"It's been a while since I tried it" -- same for me. I once considered rewriting the code, but did not find a good reason to do so, given the good sixel support in some modern terminals. But I still like the idea and it's fun. My code uses the standard libraries commonly seen in the machine learning community (scikit-learn). It's just a little bit tricky for someone not familiar with machine learning packages to discover that import sklearn in fact refers to the scikit-learn package: https://scikit-learn.org/stable/

Maybe I should write a requirements.txt file for dependencies?
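The import-name versus package-name mismatch can be made explicit with a small check like the one below. It is only a sketch: the sklearn and numpy rows reflect imports visible in the tracebacks in this thread, while the PIL row is a generic illustration of the same mismatch and is not known to be a dependency of the script. The function name is made up for the example.

```python
import importlib.util

# Import name -> name to pass to pip.
IMPORT_TO_PIP = {
    "sklearn": "scikit-learn",  # mismatch discussed in this thread
    "numpy": "numpy",           # import name and package name agree
    "PIL": "Pillow",            # illustration only, not a known dependency
}

def missing_packages(import_to_pip):
    """Return the pip names of every import name that cannot be found."""
    return [pip_name
            for module, pip_name in import_to_pip.items()
            if importlib.util.find_spec(module) is None]

missing = missing_packages(IMPORT_TO_PIP)
if missing:
    print("pip install " + " ".join(missing))
else:
    print("all imports resolvable")
```

A requirements.txt would list the right-hand names, one per line, so that nobody has to guess them from the import statements.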

hpjansson commented 1 year ago

Maybe I should write a requirements.txt file for dependencies?

Oh, that would be great! Or maybe expand its README.md a little?

clort81 commented 1 year ago

Thanks cdluminate! Installing scikit-learn as user worked.

I got this far

>  ./chafa8x8.py CreateDataset --glob ./coco/*.jpg

This gives a long file list to stderr but doesn't create a file.

./chafa8x8.py Clustering
=> loading dataset from chafa8x8.npz
Traceback (most recent call last):
  File "/media/sd/Projects/TermFun/chafa/tools/fontgen/./chafa8x8.py", line 232, in <module>
    eval(f'main{sys.argv[1]}')(sys.argv[2:])
  File "/media/sd/Projects/TermFun/chafa/tools/fontgen/./chafa8x8.py", line 95, in mainClustering
    dataset = np.load(ag.dataset)['dataset']
              ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/numpy/lib/npyio.py", line 405, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'chafa8x8.npz'

I see the Python script has a --save option, but using that didn't create a .npz file. How do I generate the .npz?

cdluminate commented 1 year ago

Note, don't let your shell expand the wildcard *.jpg. The correct command is as follows in your case

>  ./chafa8x8.py CreateDataset --glob './coco/*.jpg'

The output will look like this image
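Why the quotes matter can be shown with a small sketch. The parser below is a hypothetical stand-in, not the script's real option handling; it only assumes that --glob takes a single pattern string which the script expands itself.

```python
import argparse
import glob
import os
import tempfile

def parse(argv):
    """Hypothetical stand-in for the script's option parsing."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--glob", type=str, required=True)
    # parse_known_args returns leftovers instead of exiting, which makes
    # the effect of shell expansion visible.
    return parser.parse_known_args(argv)

with tempfile.TemporaryDirectory() as d:
    for name in ("a.jpg", "b.jpg", "c.jpg"):
        open(os.path.join(d, name), "w").close()
    pattern = os.path.join(d, "*.jpg")

    # Quoted: the script receives the pattern itself and expands it.
    args, extra = parse(["--glob", pattern])
    quoted_files = sorted(glob.glob(args.glob))

    # Unquoted: the shell has already replaced the pattern with file names,
    # so --glob only captures the first one and the rest are leftovers.
    expanded = sorted(glob.glob(pattern))
    args2, extra2 = parse(["--glob"] + expanded)
    unquoted_files = sorted(glob.glob(args2.glob))

print(len(quoted_files), len(unquoted_files), len(extra2))  # 3 1 2
```

How the real script reacts to the leftover arguments is not shown here; the point is only that an unquoted pattern never reaches it as a pattern.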

clort81 commented 1 year ago

1290508 Feb 5 03:07 chafa8x8.npz

Worked! Sorry for the oversight. Thanks!

cdluminate commented 1 year ago

I'm currently running my code to see whether it can be updated. I'm also updating the README. You will be able to see the updates... maybe within the next hour.

clort81 commented 1 year ago

$ ./chafa8x8.py GenA
 -> number of centers: 4633
=> Result saved to chafa8x8.json
Traceback (most recent call last):
  File "svg2ttf.py", line 4, in <module>
    import fontforge as ff
ImportError: No module named fontforge

Alright logical, you use some fontforge lib to generate the ttf...

$ pip3 install fontforge
Defaulting to user installation because normal site-packages is not writeable
ERROR: Could not find a version that satisfies the requirement fontforge (from versions: none)
ERROR: No matching distribution found for fontforge

Well now we're in hell again aren't we...

$ curl https://bootstrap.pypa.io/get-pip.py | python
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2514k  100 2514k    0     0  1249k      0  0:00:02  0:00:02 --:--:-- 1250k
Defaulting to user installation because normal site-packages is not writeable
Collecting pip
  Downloading pip-23.0-py3-none-any.whl (2.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 1.3 MB/s eta 0:00:00
Installing collected packages: pip
Successfully installed pip-23.0
$ which pip
/usr/local/bin/pip
$ pip install fontforge
Defaulting to user installation because normal site-packages is not writeable
ERROR: Could not find a version that satisfies the requirement fontforge (from versions: none)
ERROR: No matching distribution found for fontforge

Searching web for some solution i see a version can be specified

$ python3 -m pip install --pre --upgrade PACKAGE==VERSION.VERSION.VERSION
Defaulting to user installation because normal site-packages is not writeable
ERROR: Could not find a version that satisfies the requirement PACKAGE==VERSION.VERSION.VERSION (from versions: 0.1.1)

Oh is version 0.1.1 the right one?

$ python3 -m pip install --pre --upgrade PACKAGE==0.1.1
Defaulting to user installation because normal site-packages is not writeable
Collecting PACKAGE==0.1.1
  Downloading package-0.1.1.tar.gz (13 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [7 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-0tfmb0j0/package_bbae3602879f4652831549991f005883/setup.py", line 4
          print """
          ^^^^^^^^^
      SyntaxError: Missing parentheses in call to 'print'. Did you mean print(...)?
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

That smells like it's using python 2 for some reason yes?

What fontforge are you using and how do i install it?

What python version does it need? I'm at 3.11

Cheers

cdluminate commented 1 year ago

Please take a look at my latest pull request https://github.com/hpjansson/chafa/pull/128. Specifically:

Note, in order to generate a usable font, the python-fontforge (for older Debian systems) or the python3-fontforge (for Debian bullseye and newer) package has to be installed as well. It will be used in the ./chafa8x8.py GenA step. It will automatically invoke chafa8x8.py GenFont subcommand for creating the font.

I knew that you would run into issues with Python 2 :-)

cdluminate commented 1 year ago

GPU clustering is supported now: https://github.com/hpjansson/chafa/pull/128/commits/03dcc8e8de6c9f6ee199d46341d85db1a754330f

We are able to have fun with large datasets as long as an Nvidia GPU is available.

cdluminate commented 1 year ago

I generated a sample font from a small dataset: chafa8x8.zip. The file includes the JSON file and the TTF file.

It contains 4791 glyphs. The Unicode range is 0x100000..0x1012b6. But when I try the following command, the output is fully black. What did I miss? (I installed the font)

chafa xxx.png --glyph-file /tmp/chafa8x8.ttf --symbols 0x100000..0x1012b6
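For reference, the --symbols range follows from the glyph count. This is a minimal sketch, assuming glyphs are numbered consecutively from the first code point, which is consistent with the 4791 glyphs and the range quoted above; the helper name is made up.

```python
def symbols_range(first_code_point, glyph_count):
    """Build a 'first..last' range string covering glyph_count code points."""
    last = first_code_point + glyph_count - 1
    return f"{first_code_point:#x}..{last:#x}"

print(symbols_range(0x100000, 4791))  # 0x100000..0x1012b6
```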

cdluminate commented 1 year ago

I generated another sample font from a large dataset (COCO 2017 validation set): http://images.cocodataset.org/zips/val2017.zip

Steps to reproduce (Debian bullseye):

  1. ./chafa8x8.py CreateDataset --glob 'val2017/*.jpg' -Mc 500 (the resulting dataset size is 2.5 million)
  2. ./chafa8x8.py Clustering -B faiss (this takes 120.21 seconds on an Nvidia RTX 2060 (mobile); it could take more than 12 hours with the sklearn backend on a Xeon CPU, IIRC)
  3. ./chafa8x8.py GenA

Result: chafa8x8-coco2017val.zip

cdluminate commented 1 year ago

@hpjansson Are there any detailed instructions on how to use a custom font? (Maybe the manpage description for --glyph-file should be expanded a little bit.) I realized that I'm unable to make it work. I was able to use the font with the custom glyph header, but not with the --glyph-file argument. I must have missed something?

hpjansson commented 1 year ago

With current master, the easiest way is to use --glyph-file chafa8x8.ttf --symbols imported. But --symbols 0x100000..0x1012b6 should also work. Sometimes it takes a while for the display server to find the font, and some terminals have to be restarted (VTE will find the new font and update itself after a while).

hpjansson commented 1 year ago

There's also a hidden option you can use: chafa --dump-glyph-file chafa8x8.ttf will tell you what Chafa thinks the font looks like after internal postprocessing.

cdluminate commented 1 year ago

I'm using some VTE-based terminals (tilix, gnome-terminal) and Qt-based terminals (konsole, yakuake). It seems that VTE-based terminals require the font to be installed into the system directory /usr/share/fonts/. Currently, in my VTE terminals, I can correctly see the glyphs in the output of chafa --dump-glyph-file. But when printing an image, the result is still fully black. I've patched the Python code to remove the width=0 and vwidth=0 lines.

image

The 0x101079 is correctly shown... but the image is still not working correctly.

hpjansson commented 1 year ago

Strange. I was able to get the font picked up when copied into $HOME/.fonts/.

cdluminate commented 1 year ago

Meanwhile, the results of chafa --dump-glyph-file somehow differ from chafa8x8.h, which is accurate.

In chafa8x8.h, the first several glyphs are:

{
    /* Chafa8x8 Font, ID: 1, Unicode: 0x100001 */
    CHAFA_SYMBOL_TAG_CUSTOM,
    0x100001,
    "        "
    "        "
    "        "
    "        "
    "        "
    "        "
    "        "
    "      X "
},
{
    /* Chafa8x8 Font, ID: 2, Unicode: 0x100002 */
    CHAFA_SYMBOL_TAG_CUSTOM,
    0x100002,
    "        "
    "        "
    "        "
    "        "
    "        "
    "        "
    "        "
    "     X  "
},
{
    /* Chafa8x8 Font, ID: 3, Unicode: 0x100003 */
    CHAFA_SYMBOL_TAG_CUSTOM,
    0x100003,
    "        "
    "        "
    "        "
    "        "
    "        "
    "        "
    "        "
    "    X   "
},

The positions of the three X are (8, 7), (8, 6) and (8, 5). Let's see the dump:

    {
        /* [􀀁] */
        CHAFA_SYMBOL_TAG_,
        0x100001,
        CHAFA_SYMBOL_OUTLINE_8X8 (
            "        "
            "        "
            "        "
            "        "
            "        "
            "        "
            "        "
            "   X    ")
    },
    {
        /* [􀀂] */
        CHAFA_SYMBOL_TAG_,
        0x100002,
        CHAFA_SYMBOL_OUTLINE_8X8 (
            "        "
            "        "
            "        "
            "        "
            "        "
            "        "
            "        "
            "   X    ")
    },
    {
        /* [􀀃] */
        CHAFA_SYMBOL_TAG_,
        0x100003,
        CHAFA_SYMBOL_OUTLINE_8X8 (
            "        "
            "        "
            "        "
            "        "
            "        "
            "        "
            "        "
            "   X    ")
    },

The positions are (8, 4), (8, 4), (8, 4).
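The comparison can be reproduced mechanically. This sketch reads the 1-based (row, column) of each X and counts differing cells; the bit order used in to_bits (top-left cell as the highest bit) is an assumption for illustration and not necessarily Chafa's internal layout.

```python
def x_positions(rows):
    """Return 1-based (row, column) pairs of every 'X' in an 8x8 glyph."""
    return [(r + 1, c + 1)
            for r, row in enumerate(rows)
            for c, ch in enumerate(row)
            if ch == "X"]

def to_bits(rows):
    """Pack the glyph into a 64-bit integer (assumed order: top-left first)."""
    bits = 0
    for row in rows:
        for ch in row:
            bits = (bits << 1) | (ch == "X")
    return bits

EMPTY = "        "
header_glyph = [EMPTY] * 7 + ["      X "]  # 0x100001 as listed in chafa8x8.h
dumped_glyph = [EMPTY] * 7 + ["   X    "]  # 0x100001 as printed by the dump

print(x_positions(header_glyph))  # [(8, 7)]
print(x_positions(dumped_glyph))  # [(8, 4)]
# Number of cells in which the two bitmaps disagree.
print(bin(to_bits(header_glyph) ^ to_bits(dumped_glyph)).count("1"))  # 2
```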

cdluminate commented 1 year ago

The dump for the last several glyphs matches with chafa8x8.h.

hpjansson commented 1 year ago

The difference could be either due to the way the font is loaded from TTF, or the sharpening we do afterwards. Try applying this:

diff --git a/chafa/chafa-symbol-map.c b/chafa/chafa-symbol-map.c
index a0d6a97..01af859 100644
--- a/chafa/chafa-symbol-map.c
+++ b/chafa/chafa-symbol-map.c
@@ -289,7 +289,7 @@ glyph_to_bitmap (gint width, gint height,

     pixels_to_coverage (scaled_pixels, pixel_format, cov, CHAFA_SYMBOL_N_PIXELS);
     sharpen_coverage (cov, sharpened_cov, CHAFA_SYMBOL_WIDTH_PIXELS, CHAFA_SYMBOL_HEIGHT_PIXELS);
-    bitmap = coverage_to_bitmap (sharpened_cov, CHAFA_SYMBOL_WIDTH_PIXELS);
+    bitmap = coverage_to_bitmap (cov, CHAFA_SYMBOL_WIDTH_PIXELS);

     return bitmap;
 }

and see if it improves.

cdluminate commented 1 year ago

Thanks. I'll try the patch. I'm pasting the font file (without font.width=0 lines) here for convenience of testing on another computer. chafa8x8-coco2017val-nowidth0.zip

cdluminate commented 1 year ago

The difference could be either due to the way the font is loaded from TTF, or the sharpening we do afterwards. Try applying this:

diff --git a/chafa/chafa-symbol-map.c b/chafa/chafa-symbol-map.c
index a0d6a97..01af859 100644
--- a/chafa/chafa-symbol-map.c
+++ b/chafa/chafa-symbol-map.c
@@ -289,7 +289,7 @@ glyph_to_bitmap (gint width, gint height,

     pixels_to_coverage (scaled_pixels, pixel_format, cov, CHAFA_SYMBOL_N_PIXELS);
     sharpen_coverage (cov, sharpened_cov, CHAFA_SYMBOL_WIDTH_PIXELS, CHAFA_SYMBOL_HEIGHT_PIXELS);
-    bitmap = coverage_to_bitmap (sharpened_cov, CHAFA_SYMBOL_WIDTH_PIXELS);
+    bitmap = coverage_to_bitmap (cov, CHAFA_SYMBOL_WIDTH_PIXELS);

     return bitmap;
 }

and see if it improves.

This does not change the result.

hpjansson commented 1 year ago

Then it's probably due to Freetype rasterizing the TTF and not quite hitting the pixel grid as intended. It would probably work better with a bitmap font, or embedded bitmap strikes in the TTF.

cdluminate commented 1 year ago

image

VTE adds horizontal and vertical spaces between the chars ... which ruins the result.

cdluminate commented 1 year ago

PCF is more or less a binary version of BDF (http://fileformats.archiveteam.org/wiki/PCF), while BDF is plain text (https://en.wikipedia.org/wiki/Glyph_Bitmap_Distribution_Format). I can easily write a Python exporter for BDF, much like the one for the SVG glyphs. Is the BDF format supported? If not, I think I should be able to find some bdf->pcf converter tool.
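A BDF exporter along those lines could look like the sketch below. It is not the project's exporter and has not been validated against FreeType or any terminal. The font name string, the 8 pt / 75 dpi size, and the zero-descent metrics are placeholder assumptions; the glyph input uses the same 8 rows of 'X' and ' ' as chafa8x8.h.

```python
def glyph_to_bdf(code_point, rows):
    """Format one 8x8 glyph (8 strings of 'X' / ' ') as a BDF character."""
    hex_rows = []
    for row in rows:
        value = 0
        for ch in row:
            value = (value << 1) | (ch == "X")  # leftmost pixel = highest bit
        hex_rows.append(f"{value:02X}")
    return "\n".join([
        f"STARTCHAR U+{code_point:06X}",
        f"ENCODING {code_point}",
        "SWIDTH 960 0",   # 8 px * 72000 / (8 pt * 75 dpi)
        "DWIDTH 8 0",
        "BBX 8 8 0 0",
        "BITMAP",
        *hex_rows,
        "ENDCHAR",
    ])

def font_to_bdf(glyphs):
    """glyphs: dict mapping code point -> list of 8 row strings."""
    lines = [
        "STARTFONT 2.1",
        # Placeholder font name; any real exporter would choose its own.
        "FONT -misc-chafa8x8-medium-r-normal--8-80-75-75-c-80-iso10646-1",
        "SIZE 8 75 75",
        "FONTBOUNDINGBOX 8 8 0 0",
        "STARTPROPERTIES 2",
        "FONT_ASCENT 8",
        "FONT_DESCENT 0",
        "ENDPROPERTIES",
        f"CHARS {len(glyphs)}",
    ]
    for code_point in sorted(glyphs):
        lines.append(glyph_to_bdf(code_point, glyphs[code_point]))
    lines.append("ENDFONT")
    return "\n".join(lines) + "\n"

EMPTY_ROW = "        "
sample = {0x100001: [EMPTY_ROW] * 7 + ["      X "]}
bdf_text = font_to_bdf(sample)
print(bdf_text)
```

Because each row maps to exactly one byte, every pixel is stated explicitly, so no rasterizer has to decide where an outline meets the pixel grid.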

hpjansson commented 1 year ago

Looks like Freetype still supports BDF. But I think PCF/BDF font support was dropped in the Linux desktop stack recently, so you'd probably have to make BDF for chafa and still output TTF for the terminal.

VTE has cell spacing settings: In xfce4-terminal, they're on the Appearance preferences page. I think the default is 1.0 (no spacing), though. It's likely it's misunderstanding the font. I get only vertical gaps, not horizontal. But on mlterm it's the other way around :-)

hpjansson commented 1 year ago

I'd look for an existing font with connecting glyphs that display ok and see how it's doing it.

cdluminate commented 1 year ago

That's exactly the most difficult issue where I was stuck several years ago. I'm confused about how the terminals can correctly understand the standard block characters and let them fill the whole cell without any gap.

hpjansson commented 1 year ago

It's possible to run gucharmap to figure out which system font a particular glyph is coming from, e.g. a block symbol, by right-clicking on it. Then one can load the font in fontforge and look at the glyph metrics and such.

cdluminate commented 1 year ago

It's possible to run gucharmap to figure out which system font a particular glyph is coming from, e.g. a block symbol, by right-clicking on it. Then one can load the font in fontforge and look at the glyph metrics and such.

https://unix.stackexchange.com/questions/240521/how-do-i-find-which-font-provides-a-particular-unicode-glyph I found this

cdluminate commented 1 year ago

image

I tried to mitigate the vertical/horizontal gaps by tuning the parameters. The code is pushed to the chafa8x8-font branch.

The font for the above showcase is here: chafa8x8-hack.zip

The picture comes from Wikipedia: https://en.wikipedia.org/wiki/African_grey_hornbill

hpjansson commented 1 year ago

Whoa, not bad.

hpjansson commented 1 year ago

Feel free to merge to master when you're happy with it.

cdluminate commented 1 year ago

Merged to master. It's a temporary hack. We can improve it in the future.

hpjansson commented 1 year ago

By the way, if you want to go the bitmap font route at some point, I just remembered you can take PCF/BDF and embed it in an OpenType font quite easily: https://github.com/hpjansson/chafa/discussions/68#discussioncomment-1402327

So it would be possible to support in modern terminals.