Open HunterZ opened 2 years ago
I'm also experiencing this issue on Fedora, but this line in your issue is key:
I suspect this is because I am using custom OTF fonts that are installed in each OS.
Unfortunately, exporting with custom fonts seems finicky: whenever I try to export with the Sans font family, it gives the same error: The PDF export failed: ePdfError_UnsupportedFontFormat.
However, when exporting with Arial or any font in the Liberation font family, it works! Hope this helps :)
I have some more information to share:
First, I tried converting all of the OTF fonts I'm using to TTF (via FontForge, then a Python otf2ttf script) and replacing them in my OS. Unfortunately this didn't fix it, but I was able to narrow things down to two font families.
On a hunch, I used sed to change one of the font names in the XML from that of the font family to that of one of the specific weight variants (medium/semibold/bold), and it worked.
The problem with this workaround is that gImageReader only lets you pick a font family from its GUI, and not a weight variant. Both of these font families have 6 variants: medium/semibold/bold weights, each with regular and italic slant variants.
gImageReader was able to work out the italic variant when I picked a specific weight via XML, but this means that I'll probably have to specify the bold weight via XML hacking whenever I want bold, or the regular weight when I want non-bold.
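For anyone who wants to try the same workaround without sed, here is a rough Python sketch of that kind of substitution. The exact sed command isn't shown above, so this is a guess at its effect rather than a copy of it. It assumes the hOCR stores the font as an x_font entry inside a title attribute; the sample line, the function name swap_font_name, and the font names XYZ and XYZ Semibold are all placeholders.

```python
import re

def swap_font_name(hocr_text: str, family: str, variant: str) -> str:
    """Replace 'x_font <family>' with 'x_font <variant>' in hOCR text."""
    # Assumed layout: the font name is followed by ';' or by the quote that
    # closes the title attribute. The lookahead keeps names that merely
    # start with the family (e.g. "XYZ Bold") from being rewritten.
    pattern = re.compile(r"(x_font\s+)" + re.escape(family) + r"(?=\s*[;\"'])")
    return pattern.sub(lambda m: m.group(1) + variant, hocr_text)

# Hypothetical hOCR line, for illustration only.
sample = '<span class="ocrx_word" title="bbox 10 20 90 40; x_font XYZ; x_fsize 11">word</span>'
print(swap_font_name(sample, "XYZ", "XYZ Semibold"))
```

As described above, this only swaps the name per item, so bold and non-bold text would still need different variant names written in by hand.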
...or maybe I can use FontForge to rearrange the font family naming to a taxonomy that is hopefully better supported by gImageReader?
Another update:
I was able to solve it by using FontForge to rename the medium variants' PostScript Names as follows:
XYZ-Medium => XYZ
XYZ-MediumItalic => XYZ-Italic
Once I did this, exported, and reinstalled the fonts, gImageReader was able to use the family name to derive regular, italic, bold, and bold+italic variants via its own flags.
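To spell out the renaming rule, here is a small Python sketch of the name mapping only. The actual renaming was done in FontForge; this does not touch any font file, and the function name rebase_ps_name and its base_weight parameter are made up for illustration.

```python
def rebase_ps_name(ps_name: str, base_weight: str = "Medium") -> str:
    """Drop the base weight from a PostScript name so the family name stands alone."""
    family, sep, style = ps_name.partition("-")
    if not sep or not style.startswith(base_weight):
        # No dashed suffix, or a different weight: leave the name untouched.
        return ps_name
    rest = style[len(base_weight):]  # "" for the upright face, else e.g. "Italic"
    return f"{family}-{rest}" if rest else family

for name in ["XYZ-Medium", "XYZ-MediumItalic", "XYZ-Bold", "XYZ-SemiboldItalic"]:
    print(name, "=>", rebase_ps_name(name))
```

Only the two medium names change; the semibold and bold names are left as they were.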
The takeaway here is that gImageReader apparently only supports fonts that have a variant whose PS Name has no dashed suffix, which it then uses to derive the corresponding -Italic, -Bold, and -BoldItalic variant names. A font whose "base" variant is -Medium and base italic variant is -MediumItalic just doesn't work.
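A minimal Python sketch of that apparent lookup, for checking whether a font family would trip over it. This models the behaviour observed above, not the actual gImageReader or PoDoFo code, and the function names and the XYZ font names are hypothetical.

```python
def derive_ps_name(family: str, bold: bool, italic: bool) -> str:
    """Build the PS Name that appears to be looked up for a family plus flags."""
    suffix = ("Bold" if bold else "") + ("Italic" if italic else "")
    return f"{family}-{suffix}" if suffix else family

def missing_variants(family: str, installed: set[str]) -> list[str]:
    """List the derived names that have no matching installed PS Name."""
    wanted = [derive_ps_name(family, b, i) for b in (False, True) for i in (False, True)]
    return [name for name in wanted if name not in installed]

# The six variants as originally named: the base and italic lookups fail.
before = {"XYZ-Medium", "XYZ-MediumItalic", "XYZ-Semibold",
          "XYZ-SemiboldItalic", "XYZ-Bold", "XYZ-BoldItalic"}
print(missing_variants("XYZ", before))

# After renaming the medium variants, all four derived names resolve.
after = (before - {"XYZ-Medium", "XYZ-MediumItalic"}) | {"XYZ", "XYZ-Italic"}
print(missing_variants("XYZ", after))
```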
I suspect this is a limitation in PoDoFo.
Running into a number of issues trying to export the results of painstakingly fine-tuning the hOCR for a PDF.
First, attempting to export directly to PDF from gImageReader-gtk 3.3.1 under Debian, or from gImageReader-qt latest CI under Windows with the PoDoFo backend, results in the following error:
I suspect this is because I am using custom OTF fonts that are installed in each OS.
Second, attempting to export from gImageReader-qt latest CI under Windows with the QPrinter backend results in the text getting chopped up and duplicated in weird ways. Compare the gImageReader hOCR tree for my first page with the object list from the exported PDF:
Third, exporting to ODT from gImageReader-gtk 3.3.1 under Debian (not tested under Windows) results in a couple of issues:
As things currently stand, I don't see any way to get a viable PDF out of gImageReader, even indirectly via ODT->PDF, because every export method fails outright, produces garbled output, and/or discards aspects of my painstakingly hand-aligned custom font text.