klippa-app / go-pdfium

Easy to use PDF library using Go and PDFium
MIT License

Bitmap generation with alternative fonts #68

Closed nonchan7720 closed 1 year ago

nonchan7720 commented 1 year ago

Nice to meet you and thank you for making a good library.

What should we do if we want to specify an alternative font when a font in the PDF is not found? We are seeing characters go missing when generating a bitmap from a PDF because the font is not found.

jerbob92 commented 1 year ago

pdfium looks in a few different folders for fonts (depending on your OS):
https://pdfium.googlesource.com/pdfium/+/refs/heads/main/core/fxge/linux/fx_linux_impl.cpp#169
https://pdfium.googlesource.com/pdfium/+/refs/heads/main/core/fxge/apple/fx_apple_platform.cpp#148

You can also give extra paths in go-pdfium to look for fonts when initializing the library: https://github.com/klippa-app/go-pdfium/blob/main/pdfium.go#L10

Generally it should be enough to have a default set of fonts, but I did notice it might help to install msttcorefonts to get some Windows fonts because some PDF renderers do not embed fonts and expect the Windows fonts to always be available.
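As a rough illustration of preparing such extra font paths, the sketch below filters a list of candidate folders down to the ones that actually exist, so only valid directories would be handed to the library configuration linked above. The helper name `existingDirs` and the candidate folders are hypothetical and are not part of go-pdfium or pdfium.

```go
package main

import (
	"fmt"
	"os"
)

// existingDirs returns the candidates that exist and are directories,
// preserving their order. It is a hypothetical helper, not a go-pdfium API.
func existingDirs(candidates []string) []string {
	var dirs []string
	for _, c := range candidates {
		info, err := os.Stat(c)
		if err != nil || !info.IsDir() {
			continue // skip missing paths and regular files
		}
		dirs = append(dirs, c)
	}
	return dirs
}

func main() {
	// Illustrative candidates only; adjust to where fonts live on your system.
	candidates := []string{
		"/usr/share/fonts",
		"/usr/local/share/fonts",
		"/opt/app/fonts", // hypothetical application-specific folder
	}
	for _, d := range existingDirs(candidates) {
		fmt.Println("font folder available:", d)
	}
}
```

The resulting slice could then be passed as the extra font paths when initializing the library.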

nonchan7720 commented 1 year ago

@jerbob92 Thank you for your reply. I am using Wasm. Is it the same there?

jerbob92 commented 1 year ago

WASM uses a Linux-like build, so the font folders are the same as for the Linux build.

If you provide your own FSConfig in the initialization, make sure that the font folder is mounted into the available folders for pdfium.

nonchan7720 commented 1 year ago

Thank you very much!

How should alternative fonts be specified? 🙇‍♂️

jerbob92 commented 1 year ago

pdfium doesn't expose an easy way to control that. The only way to do it is by implementing fpdf_sysfontinfo.h, which is one of the few things this library has not implemented (yet) because those methods are complicated. They basically allow you to build your own font mapper.

I might look into adding support for this later, but generally PDFs should not use fonts that aren't widely installed on machines, as that causes rendering issues. That's why PDFs embed fonts that are not installed by default on machines.

If a font isn't embedded, pdfium first searches the specified folders for the requested font. Afaik pdfium falls back to Arial when it can't find the requested font.

Could you try to check which font the PDF is trying to load?

nonchan7720 commented 1 year ago

I have to process PDFs that are given to me by other users. In some cases the PDFs use commercial fonts, and I would like to use an alternative font instead.

jerbob92 commented 1 year ago

@nonchan7720 So you mean that these PDFs have the font embedded, but you want to replace it anyway?

nonchan7720 commented 1 year ago

Yes, that's right.

jerbob92 commented 1 year ago

That is not possible right now. It might become possible once fpdf_sysfontinfo.h is implemented, but I do not know for sure: pdfium may always load the embedded font when it is available.

I would have to look into it, will let you know!

nonchan7720 commented 1 year ago

I look forward to hearing good news!

jerbob92 commented 1 year ago

I did some tests and got a nice starting point (in the CGO implementation) with the implementation of fpdf_sysfontinfo.h. However, I have sad news for you. Pdfium only calls the custom font mapper when it renders a font that is not embedded inside the PDF.

I will probably complete the implementation (for WASM too) while I'm at it, but it won't be useful for you I'm afraid. You could try to ask here for pdfium to add a feature to also use the custom font mapper for embedded fonts.
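To make the behaviour described above concrete, here is a toy model in Go of the lookup order as discussed in this thread: an embedded font is used directly and the custom mapper is never consulted, and the mapper only comes into play for non-embedded fonts. The types `Font` and `Mapper`, the function `resolve`, and the font names are hypothetical and only illustrate the order. This is not pdfium's actual code, and the Arial fallback is the "afaik" assumption from earlier in the thread.

```go
package main

import "fmt"

// Font is a hypothetical stand-in for a font reference inside a PDF.
type Font struct {
	Name     string
	Embedded bool
}

// Mapper is a hypothetical custom font mapper. It returns a substitute
// font name and true, or false when it has no substitute to offer.
type Mapper func(requested string) (string, bool)

// fallbackFont is an assumption taken from the thread, not verified behaviour.
const fallbackFont = "Arial"

// resolve models the order discussed in the thread: embedded fonts win,
// then the custom mapper, then the fallback font.
func resolve(f Font, mapper Mapper) string {
	if f.Embedded {
		// The mapper is not called for embedded fonts.
		return f.Name + " (embedded)"
	}
	if mapper != nil {
		if sub, ok := mapper(f.Name); ok {
			return sub
		}
	}
	return fallbackFont
}

func main() {
	mapper := func(requested string) (string, bool) {
		if requested == "CommercialFont" {
			return "Noto Sans", true
		}
		return "", false
	}
	fmt.Println(resolve(Font{Name: "CommercialFont", Embedded: true}, mapper))
	fmt.Println(resolve(Font{Name: "CommercialFont"}, mapper))
	fmt.Println(resolve(Font{Name: "UnknownFont"}, mapper))
}
```

In this model the first call never reaches the mapper, which is why a custom mapper alone cannot replace an embedded commercial font.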

jerbob92 commented 1 year ago

I will close this issue for now.