t-houssian / fillpdf

A python library to make filling pdfs much easier
MIT License
130 stars 23 forks

Most unicode characters added with write_fillable_pdf() disappear when flattening PDF #48

Open lymanjohnson opened 12 months ago

lymanjohnson commented 12 months ago

Characters that are represented in section D2 of PDF manual version 1.7 (starting on page 1001) work fine.

However, other unicode characters such as ■ (U+25A0) disappear when I flatten the page. These unicode characters do appear if I leave the PDF un-flattened. Flattening the PDF, however, causes them to disappear.

I've tried flattening the PDF in two separate ways:

  1. Using write_fillable_pdf() with kwarg flatten=True
  2. Using write_fillable_pdf() to create an intermediate pdf, and then flattening that using flatten_pdf()

In 1, special characters appear. In 2, special characters do NOT appear.

Although option 2 is generally "better" in terms of rendering text consistently and correctly sized, it fails to render special characters.

Basically if the pdfrw.PdfString.to_bytes() value looks like "(\x81)" it will flatten correctly. However, if the pdfrw.PdfString.to_bytes() looks like "" it will vanish during the flattening process.

_EDIT: My question originally implied that 1 and 2 both failed to show special characters. In fact, it only fails when using flatten_pdf(as_images=True)._
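For context, a PDF text string can be stored either in a single-byte encoding such as PDFDocEncoding or as UTF-16BE bytes prefixed with the byte order mark FE FF. The sketch below uses only the Python standard library to show what the second form looks like for characters such as ■ and √. The helper names `utf16_hex_string` and `describe` are made up for illustration and are not part of pdfrw or fillpdf, and it is only an assumption that the strings that vanish here are of this UTF-16BE kind.

```python
import codecs


def utf16_hex_string(text: str) -> str:
    """Return text as a PDF hex string: UTF-16BE bytes with a leading BOM.

    Hypothetical helper for illustration; not a pdfrw or fillpdf function.
    """
    raw = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    return "<" + raw.hex().upper() + ">"


def describe(text: str) -> list[str]:
    """List each character's code point next to its UTF-16BE hex string."""
    return [f"U+{ord(ch):04X} {utf16_hex_string(ch)}" for ch in text]


if __name__ == "__main__":
    # The two symbols mentioned in this thread.
    for line in describe("■√"):
        print(line)
```

Comparing this output with what pdfrw.PdfString.to_bytes() reports for a field value may help tell which of the two forms a given field ended up in.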

HernandezCM commented 11 months ago

Hey, I was having some printing issues on iOS related to this. Have you tried mypdf.flatten_pdf(file_name, outfile_name, as_images=True)? The optional as_images argument converts the PDF to images and back into a PDF. That fixed the printing issue I was having, and it might help with your issue as well!

lymanjohnson commented 11 months ago

Oddly enough, that's the version that does NOT work for me.

I have three versions of the code set up:

  1. No flattening

fillpdfs.write_fillable_pdf(infile_path, outfile_path, field_dictionary, flatten=False)

  2. "Normal" flattening:

fillpdfs.write_fillable_pdf(infile_path, outfile_path, field_dictionary, flatten=True)

  3. "as_images" flattening:

fillpdfs.write_fillable_pdf(infile_path, mid_file_path, field_dictionary)
with open(mid_file_path) as mid_file:
    mid_file.seek(0)
fillpdfs.flatten_pdf(mid_file_path, outfile_path, as_images=True)

The special symbols appear correctly in versions 1 and 2, but do NOT appear in version 3.

HernandezCM commented 11 months ago

Interesting. Have you tried just reinstalling poppler-utils, which is needed for the 3rd version to work correctly? It could be an issue with the command-line dependency. For my issue, your second version of flattening was causing text to not print correctly when printed from an iOS device.

lymanjohnson commented 11 months ago

I can try, but the poppler flattening seems to be working in general, and even shows special characters so long as they're ones that appear in section D2 of the PDF manual. It only fails for other unicode characters.

Could I trouble you to try these symbols on your system to see if as_images flattens them correctly? √ ■

EthanC-8 commented 10 months ago

You are going to have to use the as_images method to make it work. I had the same issue, but using the poppler method resolved it.

Just make sure poppler is installed properly, or send me your error log and I can check it for you.

# Install Poppler utils on Ubuntu/Debian-based systems
sudo apt-get update
sudo apt-get install -y poppler-utils

# Install Poppler utils on Fedora
sudo dnf install -y poppler-utils

# Install Poppler utils on macOS using Homebrew
brew install poppler

# Install Poppler utils on Windows
# Download the pre-built binaries from https://poppler.freedesktop.org/ and add the bin directory to your PATH
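To confirm the install actually worked, a quick check from Python is to see whether the Poppler command-line tools resolve on PATH. This is a minimal sketch, assuming the as_images path relies on tools such as pdftoppm or pdftocairo from poppler-utils; the function name `find_poppler_tools` is hypothetical and not part of fillpdf.

```python
import shutil

# Command-line tools shipped with poppler-utils (assumed to be the ones needed).
POPPLER_TOOLS = ("pdftoppm", "pdftocairo")


def find_poppler_tools(tools=POPPLER_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in tools}


if __name__ == "__main__":
    for name, path in find_poppler_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

If a tool shows up as not found, the Python process is probably seeing a different PATH than your shell, which is a common cause on Windows.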

darlenepetal commented 1 week ago

figured this might be worth adding for any future people coming here with this problem:

i spent ages trying to fix this exact issue, and @EthanC-8's method didn't work... only to discover that the typeface being used to fill the pdfs was not compatible with certain special characters. when looking at the pdfs before they were flattened, i realized that the special characters were changed to a default typeface that was compatible with those characters. then, when flattening the pdf, those default-font characters would disappear.

so... don't be me. before assuming it's an issue with the library or your code, make sure the typeface actually includes those special characters, lol
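One way to check this up front is to compare the characters you plan to write against the code points the font actually maps to glyphs. The sketch below is a rough illustration, not something fillpdf provides: `missing_codepoints` and `font_codepoints` are hypothetical names, the font loading uses the third-party fontTools package, and it assumes you have the form's typeface available as a font file on disk (the thread does not say how to get it out of the PDF).

```python
import sys


def missing_codepoints(text, supported):
    """Return the distinct characters in text whose code points are not in supported."""
    return [ch for ch in dict.fromkeys(text) if ord(ch) not in supported]


def font_codepoints(font_path):
    """Return the set of code points a font file maps to glyphs.

    Requires fontTools (pip install fonttools); imported here so the
    pure-Python check above works without it.
    """
    from fontTools.ttLib import TTFont

    font = TTFont(font_path)
    try:
        # getBestCmap() returns {code point: glyph name}, or None if the
        # font has no Unicode character map.
        return set(font.getBestCmap() or {})
    finally:
        font.close()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_font.py path/to/font.ttf "text to check"
    missing = missing_codepoints(sys.argv[2], font_codepoints(sys.argv[1]))
    for ch in missing:
        print(f"missing glyph for U+{ord(ch):04X} {ch}")
    if not missing:
        print("font covers every character in the text")
```

If any character is reported missing, it will likely be drawn in a fallback typeface before flattening, which matches the behavior described above.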