CLI parses unicode values, but calling merge_ufos directly does not

aaronbell commented 9 months ago

When trying to use merge_ufos directly, I found that the codepoints option doesn't work when provided with a codepoint (like 0x0065). It appears that this is due to the unicode values being stored as integers in ufoLib2 versus hex values.

When using the CLI, there's code to parse the unicode values into integers. But in merge_ufos, there's no such code.

IMO, there should be a check in merge_ufos / subset_ufos that determines if the unicode values are in integer form, or hex (and if so parse them).

simoncozens commented 1 month ago

I don't understand this. 0x0065 is an integer in Python. What code were you using for the merge_ufos call?
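
For readers following along, here is a minimal sketch of the distinction being drawn: a hex literal typed in Python source is already an int, whereas the same characters read from a text file are a str. The variable names are illustrative only and do not come from ufomerge.

```python
# A hex literal in source code is just another way of writing an int.
literal = 0x0065

# The same characters read from a text file arrive as a string.
from_file = "0x0065"

print(literal == 101)                 # True
print(isinstance(from_file, int))     # False
print(int(from_file, 16) == literal)  # True: explicit parsing closes the gap
```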

aaronbell commented 1 month ago

I'm just doing a direct call like:

    merge_ufos(
        currentFont,
        extensionFont,
        codepoints=glyphSet,
        layout_handling="closure",
        existing_handling="skip",
    )

Where glyphSet is defined using a whitelist file that contains hex values.

    glyphSet = []
    with open('sources/'+lang+"/whitelist.txt") as f:
        glyphSet = f.read().splitlines()

and whitelist.txt is a set of hex values:

0x003A
0x003B
0x003C
0x003D
0x003E
0x003F
0x0040
0x0041
0x0042
0x0043
0x0044
0x0045
0x0046
0x0047

And merge_ufos can't deal with that because it already assumes they are converted to integers by the CLI code:

    def parse_cp(cp):
        if (
            cp.startswith("U+")
            or cp.startswith("u+")
            or cp.startswith("0x")
            or cp.startswith("0X")
        ):
            return int(cp[2:], 16)
        return int(cp)

No such code exists downstream in __init__.py so if I bypass CLI.py and call merge_ufos directly, I have to do the conversion myself.

simoncozens commented 1 month ago

You do, and I'm happy with that. The merge_ufos documentation says:

codepoints: A list of Unicode codepoints as integers.

You passed a list of Unicode codepoints as strings. ufomerge is not to blame here. :-)
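
A minimal sketch of doing that conversion on the caller's side, assuming a whitelist with one codepoint per line. The helpers `parse_codepoint` and `load_codepoints` are hypothetical names written for this example; they mirror the CLI's parse_cp logic quoted above but are not part of the ufomerge API.

```python
def parse_codepoint(text: str) -> int:
    """Convert 'U+0041', '0x0041', or a decimal string such as '65' to an int."""
    text = text.strip()
    # Accept the same prefixes the CLI's parse_cp does, in either case.
    if text[:2].lower() in ("u+", "0x"):
        return int(text[2:], 16)
    return int(text)


def load_codepoints(lines):
    """Parse an iterable of text lines into ints, skipping blank lines."""
    return [parse_codepoint(line) for line in lines if line.strip()]


# Stand-in for the lines read from whitelist.txt (hypothetical sample data).
whitelist_lines = ["0x003A", "0x0041", "U+0047", "", "101"]
codepoints = load_codepoints(whitelist_lines)
print(codepoints)  # [58, 65, 71, 101]
```

The resulting list of integers is the form the documentation asks for, so it could then be passed as `codepoints=` in the merge_ufos call shown earlier in the thread.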

aaronbell commented 1 month ago

Fair enough. Though I wonder, then, why the CLI allows cp to be something else? Seems to me that the CLI codepoints should behave the same as merge_ufos codepoints.

simoncozens commented 1 month ago

Because command line utilities are intended for end users, and libraries are intended for Python developers. Different audiences.

googlefonts / ufomerge

CLI parses unicode values, but calling merge_ufos directly does not #19