openSUSE / fonts-config


feat: add Noto HK to fonts-config #24

Open Pi-Cla opened 2 years ago

Pi-Cla commented 2 years ago

@qiangzhao @marguerite @guoyunhe I trimmed down my PR so that it only prioritizes Noto HK in zh-HK. So zh-MO is left alone. Let me know what you all think.

marguerite commented 1 year ago

Hi, @Pi-Cla,

As I said on Bugzilla, you can go ahead and play with zh-MO. I even gave the fontconfig preference order for zh-MO there.

marguerite commented 1 year ago

Please:

  1. Restore zh-MO
  2. Choose one of two approaches:
     2.1 Always keep HK before TC, except for TC itself (where you can skip HK). HK has fewer code points than TC, and TC fully covers the code points used by HK, so if TC comes before HK, the HK variant may never show.
     2.2 Use HK for SC, HK and MO only. Users of other locales, e.g. KR, just want Traditional Chinese to display; they don't care about variants. We can save a lot of fontconfig matching work by giving them only the one font with the largest code point coverage.
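The ordering argument in 2.1 can be illustrated with a toy model. This is only a sketch: the `font` type, `pickFont`, and the coverage data are made up for illustration, and it models a simple "first font that covers the character wins" rule, not fontconfig's actual matching algorithm.

```go
package main

import "fmt"

// font is a hypothetical, simplified stand-in for a real font:
// a family name plus the set of code points it covers.
type font struct {
	name     string
	coverage map[rune]bool
}

// pickFont returns the name of the first font in order that covers r,
// or "" if none does. Toy model of per-character fallback only.
func pickFont(order []font, r rune) string {
	for _, f := range order {
		if f.coverage[r] {
			return f.name
		}
	}
	return ""
}

func main() {
	// Made-up coverage: TC covers everything HK covers, plus one more.
	hk := font{"HK", map[rune]bool{'一': true, '二': true}}
	tc := font{"TC", map[rune]bool{'一': true, '二': true, '三': true}}

	for _, r := range []rune{'一', '三'} {
		fmt.Printf("%c TC first: %s, HK first: %s\n", r,
			pickFont([]font{tc, hk}, r), pickFont([]font{hk, tc}, r))
	}
}
```

Under this assumed coverage, putting TC first means HK is never picked, while putting HK first still falls back to TC for characters HK lacks.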

I took the code point coverage figures from Wikipedia. It says TC has 100551 glyphs while HK has only 5033 in version HKSCS-2016.

You can check the code points in a font with "fc-query 'font-file'" and do the comparison yourself.

Pi-Cla commented 1 year ago

I have implemented your suggestions, @marguerite. Thanks for the feedback.

marguerite commented 1 year ago

I have to correct one thing: I used Wikipedia data instead of the font information when I said that Noto Sans TC covers more Unicode code points than Noto Sans HK. That is wrong.

I wrote a program based on the code I implemented for fonts-config-ng:

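package main

import (
  "fmt"
  "log"
  "os"

  charset "github.com/marguerite/fonts-config-ng/fc-charset"
)

func main() {
  // hk.txt and tc.txt hold the charset data for the HK and TC fonts.
  hk, err := os.ReadFile("./hk.txt")
  if err != nil {
    log.Fatal(err)
  }
  tc, err := os.ReadFile("./tc.txt")
  if err != nil {
    log.Fatal(err)
  }
  c := charset.NewCharset(string(hk))
  c1 := charset.NewCharset(string(tc))
  fmt.Println(c.Substract(c1).String()) // code points only in HK
  fmt.Println(c1.Substract(c).String()) // code points only in TC
  fmt.Println(c.Count())                // total code points in HK
  fmt.Println(c1.Count())               // total code points in TC
}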

And the result is:

5c83 f904 f908 f90c f934 f96d f97b f99d f9be f9d0 f9d9 f9e2-f9e3 30ede
3ada f931 f939 30edd
20755
20745

The output lists the code points found only in HK, then those found only in TC, then the totals: HK covers 20755 code points and TC covers 20745 code points.

The only differences between the two are as follows:
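As a cross-check on those numbers, here is a minimal standard-library sketch of the same set arithmetic. It does not use the fc-charset package; `parseCharset` and `subtract` are hypothetical helpers written for this illustration. Splitting the printed code points into an HK-only list ending at 30ede and a TC-only list starting at 3ada is an assumption, though it is consistent with the two totals.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCharset turns a string such as "5c83 f9e2-f9e3" (space-separated
// hex code points and inclusive ranges) into a set of code points.
func parseCharset(s string) (map[rune]bool, error) {
	set := make(map[rune]bool)
	for _, tok := range strings.Fields(s) {
		lo, hi, isRange := strings.Cut(tok, "-")
		if !isRange {
			hi = lo // a single code point is a range of length one
		}
		a, err := strconv.ParseUint(lo, 16, 32)
		if err != nil {
			return nil, err
		}
		b, err := strconv.ParseUint(hi, 16, 32)
		if err != nil {
			return nil, err
		}
		for cp := a; cp <= b; cp++ {
			set[rune(cp)] = true
		}
	}
	return set, nil
}

// subtract returns the code points that are in a but not in b.
func subtract(a, b map[rune]bool) map[rune]bool {
	out := make(map[rune]bool)
	for cp := range a {
		if !b[cp] {
			out[cp] = true
		}
	}
	return out
}

func main() {
	// Assumed split of the printed result into its two difference lists.
	hkOnly, err := parseCharset("5c83 f904 f908 f90c f934 f96d f97b f99d f9be f9d0 f9d9 f9e2-f9e3 30ede")
	if err != nil {
		fmt.Println(err)
		return
	}
	tcOnly, err := parseCharset("3ada f931 f939 30edd")
	if err != nil {
		fmt.Println(err)
		return
	}
	// Prints: 14 4 10
	fmt.Println(len(hkOnly), len(tcOnly), len(hkOnly)-len(tcOnly))
}
```

The gap of 10 matches 20755 - 20745, so under this reading HK has 14 code points that TC lacks and TC has 4 that HK lacks.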