Closed NewUserHa closed 9 months ago
Without an actual code snippet, it is difficult to tell what you are trying to do. Freetype-py is a fairly simple ctypes wrapper around upstream, so you need to read the upstream documentation. I believe you need to read about FT_Get_Name_Index and see whether it does what you want: https://freetype.org/freetype2/docs/reference/ft2-information_retrieval.html#ft_get_name_index .
Btw, in Python 3 you need to pass bytes objects (b'name') as arguments to FreeType routines that take C strings.
I'll close this for now, as this should answer your question.
load_char takes the character value (these days, for most people, the Unicode value; see addendum below) as a number, mostly, I think.
Addendum: I shall not confuse you with localised encoding like big5, gbk and jis etc. Officially load_char really takes the character value in the current active/default encoding of the font. For most recent usage that's the unicode value, but really it is the current encoded value in the default/current encoding.
I checked that you can use FT_Get_Name_Index in freetype-py: https://github.com/rougier/freetype-py/blob/83bf5d32cd296795bb790f4fa89fc85c78f50630/freetype/raw.py#L110 .
I think you just use it like glyph_id = FT_Get_Name_Index(face._FT_Face, b'myglyname'). (The face._FT_Face construct is just to get at the lower-level handle.) There might be a more Pythonic way of doing this elsewhere in freetype-py that I am not aware of, such as a face.name_index routine, maybe.
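A minimal sketch of the lookup suggested above, for illustration only. The helper name glyph_index_by_name and its lookup parameter are hypothetical (the parameter exists only so the helper can be tried without a font file); the one freetype-py call assumed is FT_Get_Name_Index from freetype.raw, as linked above.

```python
def glyph_index_by_name(face, name, lookup=None):
    """Return the glyph index for a glyph name, or None if FreeType reports 0."""
    if lookup is None:
        # Imported lazily; assumes freetype-py is installed when no lookup is given.
        from freetype.raw import FT_Get_Name_Index as lookup
    if isinstance(name, str):
        # FreeType expects a C string, so pass bytes rather than str in Python 3.
        name = name.encode("ascii")
    index = lookup(face._FT_Face, name)
    # FT_Get_Name_Index returns 0 when the name is not found.
    return index if index else None
```

With a real font this would be called as glyph_index_by_name(freetype.Face(path), "G25"), and a non-None result could then be passed to face.load_glyph.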
Thanks for your reply, I'll try FT_Get_Name_Index. (It didn't show up in the auto-complete list.)
I found that load_char actually takes an integer as input.
I just wondered why FontForge can display the correct integer/position while freetype can't, even though neither of them had a cmap file, and thought it might be a bug.
Should I report to upstream, as a bug, that freetype can't read the correct positions of glyphs the way FontForge does?
Fontforge, as a font editor, will probably try very hard to let you access/manipulate incomplete font structures. Freetype has a slightly higher expectation of the font being valid/complete. Anyway, load_glyph should always work, and you should be able to go through the whole range from 0 to max, if desperate.
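A rough sketch of that "go through the whole range" fallback. The helper name walk_glyphs is hypothetical; it assumes a freetype-py Face exposing num_glyphs, load_glyph and glyph.advance, and that the caller has already set a size (for example with face.set_char_size) before loading glyphs.

```python
def walk_glyphs(face):
    """Yield (glyph_index, horizontal_advance) for every glyph slot in the face."""
    for index in range(face.num_glyphs):
        # load_glyph addresses glyphs by index, so no charmap is involved.
        face.load_glyph(index)
        # advance.x is read from the glyph slot that load_glyph just filled.
        yield index, face.glyph.advance.x
```

This only enumerates glyphs; it says nothing about which character code each glyph stands for, which still needs a cmap or cidmap.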
But if load_glyph is the last resort, then it's hard to use the font to extract text.
However, the glyph positions FontForge reported are all correct.
Luckily this font has glyph names and the PDF has a cidmap, but what if there's a font that doesn't?
I'm not familiar with fonts, so I don't know whether freetype can get the correct positions as well.
If you are thinking of extracting text from pdf, I think it is a generally unsolvable problem - there needs to be some way of mapping glyph id to char code, be it cmap or cidmap.
Very old PDFs sometimes don't have this, so it is not possible. In somewhat more recent ones there is an implied but undocumented Identity cidmap, which just says that the charcode is the same as the glyph id. (I.e. the glyph id is basically the Unicode value, or the localised encoding value, if you are dealing with a localised PDF.)
Yes, it is an unsolvable problem.
I'm trying to use OCR to map those custom char codes to real characters.
The PDF I have has a cidmap that maps char codes to glyph names (like /37/G25). Somehow, FontForge can display that too (like 37 (0x25) "G25") without any cidmap input.
You probably want to look at mupdf / mutool and pymupdf for that sort of thing. It has text extraction and OCR api's for pdf's.
Thanks for the reply.
I checked those. mupdf: it seems it has no info about OCR in the documentation. mutool: it seems to OCR the entire PDF page. pymupdf: I found https://github.com/pymupdf/PyMuPDF-Utilities/blob/master/OCR/tesseract1.py, which does OCR line by line.
But the PDF I have has codes, and OCR on entire lines or pages has issues with brackets (square brackets, angle brackets).
@HinTak Finally, after discussing it at the freetype repo, I found that the issue at the beginning is actually that the auto-loaded face.charmap is garbage data, and the charmap needs to be set manually to charmaps[0] (the font has an Adobe custom charmap and num_charmaps is 1).
So I think this is probably a bug, and I'm replying again.
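The workaround described above could look roughly like this. It is only a sketch: use_first_charmap is a hypothetical helper, and it assumes a freetype-py Face with a charmaps list and a set_charmap method.

```python
def use_first_charmap(face):
    """Select charmaps[0] explicitly; return it, or None if the font has none."""
    charmaps = face.charmaps
    if not charmaps:
        # Nothing to select; character-code lookups cannot work for this face.
        return None
    # Make the choice explicit instead of relying on the auto-selected charmap.
    face.set_charmap(charmaps[0])
    return charmaps[0]
```

After this call, character codes passed to load_char are interpreted in that charmap's encoding (here the Adobe custom one), not as Unicode.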
I would argue it isn't a bug. It is as I commented earlier: fontforge can cope with incomplete or work-in-progress fonts, while freetype expects largely valid fonts. There is only so much inconsistency or brokenness it would try to cope with, so rejecting/refusing to load a broken font, or a broken part of a font, is not a bug.
But "FT_Face->charmap is zero-initialized before any action on Unicode is taken.", and I found face.charmap != face.charmaps[0]
These are not broken fonts. They simply lack Unicode charmap. FreeType presents them with FT_Face->charmap = NULL. The issue is that freetype-py presents them with garbage in face.charmap. That is not exactly how bindings are supposed to behave. Please zero-initialize face.charmap. That is all you need to do to fairly mimic FreeType.
Fontforge falls back on the FT_Face->charmaps[0], i.e. the whatever encoding. A sane program should force a user to make this choice explicitly, if he really means and cares about the encoding.
I doubt that. Anyway, if you want to look at that, the code to modify etc. is probably around: https://github.com/rougier/freetype-py/blob/83bf5d32cd296795bb790f4fa89fc85c78f50630/freetype/__init__.py#L2042
Or this:
Freetype-py is just reading face->charmap or face->charmaps.
Can you check if both of these definitions are actually executed when the Python class is created?
FT_ALLOC here means that everything is zeroed initially. It cannot be non-zero invalid pointer.
Here's the font: QGNGZCFzBookMaker1.patch (the extension is only there to bypass GitHub).
Pull requests welcome, if somebody (else) wants to work on it.
I am not a Python person but I see that FT_Charmap is not the same as FT_CharMap in FreeType. Therefore, instead of
family_name = property(lambda self: self._FT_Face.contents.family_name,
there is more complex processing
charmap = property( _get_charmap,
In other words, it is not a straight copy. Then
return Charmap( self._FT_Face.contents.charmap)
probably chokes on NULL.
self._FT_Face.contents.charmap is a straightforward copy. It is ctypes' way of saying ..._FT_Face->charmap in C.
Yes. But what is Charmap:__init__ doing?
It is a straightforward copy:
I see that but I do not see any NULL handling. Are you saying that NULL is copied and everything else in Charmap is ignored automatically? Python magic?
Note that _get_charmaps does not have to handle NULL because num_charmaps protects it. On the other hand, _get_charmap has to handle NULL, which I do not see.
Shouldn't there be a None assignment when the input is NULL? I just read about it, but I am not an expert.
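To illustrate the NULL question with plain ctypes, here is a small self-contained sketch. The structures CharMapRec and FaceRec are hypothetical stand-ins, not the real freetype-py definitions; the point is only how ctypes treats a NULL pointer field and how a getter could map it to None.

```python
import ctypes

class CharMapRec(ctypes.Structure):
    # Hypothetical stand-in for FT_CharMapRec; only one field is modelled.
    _fields_ = [("encoding", ctypes.c_uint)]

class FaceRec(ctypes.Structure):
    # Hypothetical stand-in for FT_FaceRec with just the fields discussed here.
    _fields_ = [("num_charmaps", ctypes.c_int),
                ("charmap", ctypes.POINTER(CharMapRec))]

def active_charmap(face_rec):
    """Return the pointed-to record, or None when the C pointer is NULL."""
    ptr = face_rec.charmap
    # A NULL ctypes pointer is falsy; dereferencing it raises ValueError.
    return ptr.contents if ptr else None

face = FaceRec()  # ctypes zero-initialises new structures, so charmap is NULL
print(active_charmap(face))
```

So copying the pointer itself is harmless; without a check like this, the failure would surface only later, when some attribute access dereferences it.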
Got a char, say n = 236 (0xec), using FontForge, but it throws FT_Exception: FT_Exception: (invalid argument). However, if n is between 0 and the count of all used slots, it works (just like load_glyph). There are glyph names, but it seems that freetype can't get a glyph by its name.
How can I fix this, or load a glyph by its name?