missing code point - Githubissues

whh1009 commented 4 years ago

When I use sfntly to extract a subset of a font, some Unicode code points are found correctly, but others are not. I am a little confused; could you please take a look?


```java
public static void main(String[] args) throws Exception {
    String codes = "\\u5e7e\\u8EAB\\ue85d\\ue85e\\u21deb\\u21df8\\u347e\\u347F";
    File srcFontFile = new File("D:\\wanghonghui\\Desktop\\mytest.ttf");
    File disFontFile = new File("D:\\wanghonghui\\Desktop\\test.ttf");
    getSubFont(codes, srcFontFile, disFontFile);
}

public static void getSubFont(String ucodes, File srcFontFile, File disFontFile) throws Exception {
    long start = System.currentTimeMillis();
    Font font = FontUtils.getFonts(new FileInputStream(srcFontFile))[0];
    Set<Integer> glyphs = new LinkedHashSet<Integer>();
    CMapTable cMapTable = font.getTable(Tag.cmap);
    CMap cmap = cMapTable.cmap(Font.PlatformId.Windows.value(), Font.WindowsEncodingId.UnicodeUCS4.value());
    System.err.println(cmap);
    int glyphId = 0;
    for (String ucode : ucodes.split("\\\\u")) {
        if (StringUtils.isEmpty(ucode)) continue;
        glyphId = cmap.glyphId(Integer.parseInt(ucode, 16));
        if (glyphId != 0) {
            glyphs.add(glyphId);
        } else {
            System.err.println("code:"+ucode+"，not found");
        }
    }
    FontFactory fontFactory = FontFactory.getInstance();
    Subsetter subsetter = new RenumberingSubsetter(font, fontFactory);
    List<Integer> glyphList = new ArrayList<Integer>(glyphs);
    subsetter.setGlyphs(glyphList);
    Font newFont = subsetter.subset().build();
    FileOutputStream fos = new FileOutputStream(disFontFile);
    fontFactory.serializeFont(newFont, fos);
    long used = System.currentTimeMillis()-start;
    System.err.println("time: "+used+"ms");
}
```

whh1009 commented 4 years ago

font.zip

This is the font, zipped.

Are there any special requirements for fonts when using sfntly?

rillig commented 2 years ago

Hello whh,

String codes = "\u5e7e\u8EAB\ue85d\ue85e\u21deb\u21df8\u347e\u347F";

Some of the \u sequences look as if they contain 5 hex digits, for example \u21df8. Did you really intend to include the code points U+21DF "DOWNWARDS ARROW WITH DOUBLE STROKE" and U+0038 "DIGIT EIGHT"?

whh1009 commented 2 years ago

Thank you very much for your reply. \u21df8 is a Unicode code point that actually corresponds to a Chinese character; please see https://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=21df8&useutf8=true. It seems there are problems when using 5 hex digits.

rillig commented 2 years ago

In Java, the character sequence \u21df8 is interpreted as U+21DF followed by U+0038. That's how it is: Java doesn't support \u escapes with more than 4 hexadecimal digits. See JLS 17, sections 3.1 to 3.3.
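
A minimal sketch of this behaviour, for illustration only. The class name `UnicodeEscapeDemo` and the helper `describe` are made up for this example and are not part of sfntly. It applies only when the escape appears directly in Java source with a single backslash:

```java
class UnicodeEscapeDemo {
    // Hypothetical helper: formats every code point of s as U+XXXX, separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // The compiler consumes exactly four hex digits after the escape prefix,
        // so the trailing 8 stays an ordinary character.
        String s = "\u21df8";
        System.out.println(s.length());   // 2
        System.out.println(describe(s));  // U+21DF U+0038
    }
}
```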

If you encode your desired code points in UTF-16, this may already solve your problem.

Contrary to Java, Unicode allows 5 or 6 digits when referring to a code point such as U+21DF8. Keep this difference between Unicode and Java in mind.
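
One possible way to follow the UTF-16 suggestion, sketched with standard JDK calls only. The class name `SupplementaryCodePointDemo` and the helper `fromHex` are assumptions made for this example. U+21DF8 lies outside the Basic Multilingual Plane, so in a Java string it takes two `char` values (a surrogate pair):

```java
class SupplementaryCodePointDemo {
    // Hypothetical helper: parses a hex string such as "21df8" and
    // returns the corresponding code point as a UTF-16 encoded String.
    static String fromHex(String hex) {
        int codePoint = Integer.parseInt(hex, 16);
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = fromHex("21df8");
        // Two UTF-16 code units, but only one code point.
        System.out.println(s.length());                             // 2
        System.out.println(s.codePointCount(0, s.length()));        // 1
        // The same string written as a surrogate pair in source code.
        System.out.println(s.equals("\uD847\uDDF8"));               // true
        System.out.println(Integer.toHexString(s.codePointAt(0)));  // 21df8
    }
}
```

Reading the value back with `codePointAt` rather than `charAt` keeps the two halves of the pair together.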

rillig / sfntly

missing code point #14