tpisto / pdf-fill-form

Fill PDF forms and return either filled PDF or PDF created from rendered page images.
MIT License

How to make fields not editable #74

Open raugaral opened 4 years ago

raugaral commented 4 years ago

First, I want to say that this is the best library for filling PDFs.

My current problem is that when the PDF is completely filled in, the fields are still editable. Is there a parameter to prevent that? (Is there a list of all possible parameters?) I don't want to convert the PDF to an image: that takes a lot of time, and then the text cannot be selected.

tpisto commented 4 years ago

Hi! When I wrote this library, my use case was solved by converting the PDF to an image. This library uses the Poppler library (https://gitlab.freedesktop.org/poppler/poppler/tree/master), and it seems that the 0.64.0 release includes some read-only setters for form fields. I would be happy to merge a pull request for this issue if someone has time to investigate and implement it.

Release 0.64.0
        core:
         * Workaround form field text not being drawn on broken files. Bug #103245
         * Add read only setter for form fields
         * Add support for Link Hide action
         * Add support for Next actions in Links
         * Fix parsing of Annot focus out actions
         * Fix PDFDoc::checkHeader() for PDFs smaller than 1 KiB. Bug #105674
         * Add const to several classes and members
         * gfile: Fix build on some platforms
         * Fix issues with on malformed documents. Bug #105972, #105969, #106059, #106061
         * Several small code improvements

        qt5:
         * Allow setting of Form visibility status
         * Allow setting of Form read only status
         * Add support for Link Hide action
         * Add support for Next actions in Links
         * ArthurOutputDev: Implement axialShadedFill
         * ArthurOutputDev: Implement drawImageMask. Bug #105531
         * ArthurOutputDev: Implement Type3 font support

florianbepunkt commented 4 years ago

@tpisto I had a look at this, since I need this functionality so that the text in the flattened file remains searchable. The issue is that setting a field to read-only is not the same as flattening it. If a field is read-only, it is still a field, but you cannot edit it. If a field is flattened, it gets converted to a normal object (like every other text or image object in a PDF).

That being said, in my opinion the best and easiest way to flatten a PDF is to print the non-flattened PDF to a new PDF. The poppler utils provide a tool called pdftocairo: https://github.com/freedesktop/poppler/blob/master/utils/pdftocairo.cc

If you have it installed and call pdftocairo -pdf non-flattened.pdf flattened.pdf, the file gets flattened as intended. Do you have any idea how to incorporate this into this lib? I have no real knowledge of C++ (although I managed to get the lib to compile again in Node 13, see the other thread). So any hints or ideas on how to work this in would be appreciated.

EDIT: To clarify further, the idea is to pipe the output of QBuffer *writePdfFields(const struct WriteFieldsParams &params, bool isBuffer) to the pdftocairo command before returning the buffer object to Node. Do you have any idea how to do this in C++?
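
A minimal sketch of that idea, assuming a POSIX system with pdftocairo on the PATH. The names runFileFilter and flattenPdf are hypothetical and not part of pdf-fill-form, and copying the bytes of the QBuffer returned by writePdfFields into a std::string is left out. To sidestep the deadlocks that two-way pipes can cause, the sketch goes through temporary files and calls the tool exactly as in the command above:

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not part of pdf-fill-form): writes `input` to a
// temporary file, runs `command... <in-path> <out-path>`, and returns the
// bytes of the output file. Throws std::runtime_error on any failure.
std::string runFileFilter(const std::vector<std::string>& command,
                          const std::string& input) {
    assert(!command.empty());

    // mkstemp creates both files under unique names; the tool is expected
    // to overwrite the second one.
    char inPath[] = "/tmp/pdf-in-XXXXXX";
    char outPath[] = "/tmp/pdf-out-XXXXXX";
    int inFd = mkstemp(inPath);
    if (inFd == -1) throw std::runtime_error("mkstemp failed");
    close(inFd);
    int outFd = mkstemp(outPath);
    if (outFd == -1) {
        unlink(inPath);
        throw std::runtime_error("mkstemp failed");
    }
    close(outFd);

    {
        // Scoped so the stream is flushed and closed before the tool runs.
        std::ofstream in(inPath, std::ios::binary);
        in.write(input.data(), static_cast<std::streamsize>(input.size()));
    }

    // Build argv: command..., inPath, outPath, nullptr.
    std::vector<std::string> args(command);
    args.push_back(inPath);
    args.push_back(outPath);
    std::vector<char*> argv;
    for (std::string& arg : args) argv.push_back(&arg[0]);
    argv.push_back(nullptr);

    pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv.data());
        _exit(127);  // only reached if exec itself failed
    }

    int status = 0;
    bool ok = pid > 0 && waitpid(pid, &status, 0) == pid &&
              WIFEXITED(status) && WEXITSTATUS(status) == 0;

    std::string output;
    if (ok) {
        std::ifstream out(outPath, std::ios::binary);
        output.assign(std::istreambuf_iterator<char>(out),
                      std::istreambuf_iterator<char>());
    }
    unlink(inPath);
    unlink(outPath);
    if (!ok) throw std::runtime_error(command[0] + " failed");
    return output;
}

// Assumed usage: flatten a filled PDF that is already held in memory.
std::string flattenPdf(const std::string& filledPdf) {
    return runFileFilter({"pdftocairo", "-pdf"}, filledPdf);
}
```

The helper takes the command as a parameter, so it can be tried with any tool that accepts an input path and an output path. Inside the addon the returned bytes would still have to be wrapped in a Node buffer, which this sketch does not cover.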

tpisto commented 4 years ago

Yes, indeed, the current way of flattening a PDF by converting it to an image is not optimal. I think your suggestion is good. Let's check the poppler code to find the best solution for this. From a quick look at pdftocairo, it seems that it also provides conversion to an image. Let's find out how it prints to a searchable PDF and make use of that code...

florianbepunkt commented 4 years ago

I'm just ironing out some nan warnings... While doing this I saw that your function to create an image version takes basically the same approach that pdftocairo uses. Instead of a PNG surface, they use a PDF surface: https://www.cairographics.org/manual/cairo-PDF-Surfaces.html#cairo-pdf-surface-create-for-stream

florianbepunkt commented 4 years ago

I looked into this some more over the weekend.

pdftocairo from poppler-utils uses the internal poppler api without the qt interface. Sadly the qt interface lacks some possibilities, especially regarding printing pages to cairo surfaces.

I tried to replicate the functionality seen in pdftocairo, but my C++ knowledge is simply too superficial. I had a lot of problems importing the relevant parts and making the lib compile, which made me give up in the end.

However, the basic structure is: create a PDF doc from the buffer using <poppler/PDFDoc.h> and <poppler/PDFDocFactory.h>. This gives you an object whose pages you can iterate over, printing each page to a cairo surface (or printing the whole thing in one go). The internal PDFDoc object exposes the methods displayPage and displayPages, which can be used to print the PDF to cairo. For usage, see the methods beginDocument, beginPage, renderPage, endPage and endDocument inside the pdftocairo.cc file. Once you strip out all the non-PDF stuff, the logic becomes clear.

As I said, I have to give up on this, as without a basic understanding of C++ and its build tooling I'm flying blind here. In the meantime I created a gist that spawns a child process that uses poppler utils: https://gist.github.com/florianbepunkt/a28acf772b5afa9f0841903d8589cd31

However, this is far from ideal in my opinion. @tpisto Please let me know if this is something you plan to tackle. I can help as much as I can. I'm sure there were some design considerations behind choosing the qt5 interface, but as far as I can see, all features (and additional ones) could be implemented using the underlying poppler api.

tpisto commented 4 years ago

@florianbepunkt Thank you for your good report. Let's investigate this more. The Cairo PDF print is indeed an interesting idea, because we already use Cairo for the image print...

florianbepunkt commented 4 years ago

Yes, but as far as I know it is not possible with the Qt api. So we would have to use the internal poppler api. I did not manage to do it – reasons outlined above. If you are able to put together a branch where a poppler/PDFDoc is constructed from the buffer object passed by node, I think I can pick it up from there.

If this works (it might not), the question is: does it make sense to stick with the qt5 bindings at all in the long run?

Another approach (theoretically, if you want to stick with Qt5) would be to figure out a way to send the Qt5 PDF document to a Qt5 printer that prints out a PDF. As stated above, if you flatten a PDF manually (in Acrobat, Preview, etc.), the easiest way to go is to print the non-flattened PDF to a new PDF.

andreas-gruenbacher commented 4 years ago

Maybe the poppler library can be extended to export the internal functionality currently used by pdftocairo?