py-pdf / pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
https://pypdf.readthedocs.io/en/latest/

ENH: Flatten PDF forms #232

Open OpenNingia opened 8 years ago

OpenNingia commented 8 years ago

pdftk provides the feature to embed the form fields' text in the pdf itself. This is very useful if you want to use an editable pdf as a template to be filled by code.

from the pdftk manual:

[ flatten ]
Use this option to merge an input PDF’s interactive form fields (and their data) with the PDF’s pages. Only one input PDF can be given. Sometimes used with the fill_form operation.

usage example:

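    from PyPDF2 import PdfFileReader, PdfFileWriter

    writer = PdfFileWriter()
    with open(source, 'rb') as source_fp:
        reader = PdfFileReader(source_fp)

        writer.appendPagesFromReader(
            reader, lambda x: writer.updatePageFormFieldValues(x, fields))

        with open(dest, 'wb') as output_fp:
            # flatten_fields is the option proposed in this request
            writer.write(output_fp, flatten_fields=True)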
whitemice commented 8 years ago

+1 A way to flatten a form would be excellent. I would like to avoid having another dependency for my code, which uses PyPDF2. But shipping filled in forms around the interwebz creates problems with a variety of vendors and their [I assume not based on PyPDF2] software.

mertz3hack commented 8 years ago

It would be great if PyPDF2 had the ability to fill in forms and flatten them!

oscardssmith commented 8 years ago

I also would really appreciate this

mstamy2 commented 8 years ago

(In progress) We can accomplish this by setting Bit Position 1 of the field flags.

Ref: Table 8.70 of the PDF 1.7 spec
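
For reference, "bit position 1" in the spec's 1-based numbering is the lowest-order bit of /Ff, i.e. the integer value 1 (the ReadOnly flag). A minimal sketch of the flag arithmetic in plain Python, with no PDF library involved; the constant and function names are illustrative and not part of PyPDF2:

```python
# Bit positions in the PDF spec are 1-based, so position 1 is 1 << 0.
READ_ONLY = 1 << 0


def set_read_only(flags: int) -> int:
    """Return the field flags with the ReadOnly bit set, keeping other bits."""
    return flags | READ_ONLY


def is_read_only(flags: int) -> bool:
    """Report whether the ReadOnly bit is set in the field flags."""
    return bool(flags & READ_ONLY)


# A field whose flags already have bit position 13 set (value 4096)
# keeps that bit when ReadOnly is added.
updated = set_read_only(4096)
```

ORing the new bit into the existing value, rather than overwriting /Ff, preserves whatever other flags the form author already set.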

OpenNingia commented 8 years ago

Setting a field read-only might be a way, however pdftk works differently; afaik it replaces each /Field instance with a simple text object. :confused:

mstamy2 commented 8 years ago

You're right, that's the better option. Should be able to implement that soon

nberrios commented 7 years ago

I agree. This would be totally awesome!

jamoham commented 7 years ago

Is there any update on this? I am looking to use an editable pdf as a template which will be filled by code.

kherrett commented 7 years ago

I'm with @jamoham on this... for the same exact use case.

zhiwehu commented 7 years ago

+1

Rob1080 commented 7 years ago

Any update on this?

BeGrimm commented 6 years ago

Can you flatten a file with PyPDF2 yet? I've not found anything on this being implemented.

DrLou commented 5 years ago

I do see some code to _flatten in the PdfFileReader, but not in the writer. Will someone be taking a swing at this?

Joshua-IRT commented 5 years ago

I have exactly the same scenario as mentioned by @jamoham, @kherrett and @zhiwehu above. Has there been any progress on either being able to flatten a PDF, or set the fields as read-only?

Joshua-IRT commented 5 years ago

Rough bit of code if anyone needs to set fields to read-only prior to an update to the module (assumes you imported the whole module as PyPDF2). Works in a similar fashion to the existing updatePageFormFieldValues() method.

import PyPDF2
from PyPDF2.generic import NameObject, NumberObject


class PDFModifier(PyPDF2.PdfFileWriter):
    '''Extends the PyPDF2.PdfFileWriter class and adds functionality missing
    from the PyPDF2 module.'''

    def updatePageFormFieldFlags(self, page, fields, or_existing=True):
        '''
        Update the form field flags for a given page from a fields dictionary.
        Copy field flag values from fields to page.

        :param page: Page reference from PDF writer where the annotations
            and field data will be updated.
        :param fields: a Python dictionary of field names (/T) and flag
            values (/Ff); the flag value should be an unsigned 32-bit integer
            (i.e. a number between 0 and 4294967295)
        :param or_existing: if there are existing flags, OR them with the
            new values (default True)
        '''
        # Pages without annotations have no form fields to update
        if '/Annots' not in page:
            return

        # Iterate through the page's annotations and update matching fields
        for annot_ref in page['/Annots']:
            writer_annot = annot_ref.getObject()
            name = writer_annot.get('/T')
            if name not in fields:
                continue

            # Work on a local copy so the caller's dictionary is not modified
            flags = fields[name]
            if or_existing:
                flags |= writer_annot.get('/Ff', 0)

            writer_annot.update({NameObject('/Ff'): NumberObject(flags)})
chickendiver commented 4 years ago

+1 for flattening, such as in pdftk!

techNoSavvy-debug commented 4 years ago

+1 for a method for flattening pdfs

paulzuradzki commented 2 years ago

@mstamy2 , @OpenNingia

One thing I noticed with the approach of flattening/making forms read-only by setting the field flag bit to 1: when I try to merge resulting PDFs, only the values from the first document make it to the merged file. I don't think this is expected behavior.

paulzuradzki commented 2 years ago

Cross-posting this useful recipe by @Redjumpman: https://github.com/mstamy2/PyPDF2/issues/506

Remember to update the form field name if you want to merge multiple documents made from the same template form. Else, the merged PDF result will have identical pages due to each document sharing the same field names.
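
A minimal sketch of that renaming step, using plain dicts to stand in for annotation dictionaries so it runs without a PDF library. The helper name `suffix_field_names` and the sample data are hypothetical; with PyPDF2/pypdf the keys and values would need to be wrapped in the library's generic objects, and this assumes flat (non-hierarchical) fields whose widgets carry /T directly:

```python
def suffix_field_names(annotations, suffix):
    """Append `suffix` to the /T (field name) of each annotation dict.

    Returns a mapping of old name -> new name so callers can remap
    the values they intend to fill in.
    """
    renamed = {}
    for annot in annotations:
        old_name = annot.get("/T")
        if old_name is None:
            continue  # annotation without a field name, e.g. a link
        new_name = f"{old_name}_{suffix}"
        annot["/T"] = new_name
        renamed[old_name] = new_name
    return renamed


# Two documents filled from the same template share the field name "name".
copy_a = [{"/T": "name", "/V": "Alice"}]
copy_b = [{"/T": "name", "/V": "Bob"}]

# Give each copy a distinct suffix before merging.
suffix_field_names(copy_a, "1")
suffix_field_names(copy_b, "2")
```

After this step the two copies no longer share a field name, so a merged document has no reason to treat them as the same field.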

pubpub-zz commented 1 year ago

PdfWriter.append() should provide you with the capability to add pages with data fields.

Can you confirm that this issue can get closed?

pubpub-zz commented 1 year ago

Without feedback, I am closing this issue as fixed. Feel free to provide updates if you want to reopen it.

rolisz commented 1 year ago

I don't think the original issue is closed: how do you make fields non-editable easily? The use case being taking a PDF with editable forms, filling out the forms and outputting a PDF with non-editable fields.

pubpub-zz commented 1 year ago

The read-only flag is defined in the PDF 1.7 reference (page 676).

Therefore you have to set the flags. Below is an example that sets all the fields to read-only:

import pypdf
from pypdf.generic import NameObject, NumberObject

r = pypdf.PdfReader("input_form.pdf")
for f, v in (r.get_fields() or {}).items():  # get_fields() is None without a form
    o = v.indirect_reference.get_object()  # this will provide access to the actual PDF dictionary
    o[NameObject("/Ff")] = NumberObject(o.get("/Ff", 0) | 1)  # set the ReadOnly bit
w = pypdf.PdfWriter()
w.clone_document_from_reader(r)
w.write("output_form.pdf")
OpenNingia commented 1 year ago

What you are suggesting is not "flattening", though. The output PDF will still contain data fields (widgets). Flattening, as pdftk does it, replaces the data field with text.

pubpub-zz commented 1 year ago

@OpenNingia Can you provide a non-flat PDF file and its flattened version for review?

OpenNingia commented 1 year ago

Multiple pdf merged and flattened: Ichiro Yasuhigo.pdf

One of the editable source: sheet_all.pdf

pubpub-zz commented 1 year ago

The flattening process is quite tough to implement: you have to create an XObject with the right characteristics and modify the page content to place it. Personally, I see very limited benefit relative to the time needed to implement it, and for me the read-only alternative could be sufficient. I will have no time to propose a PR. Any candidates?

pubpub-zz commented 1 year ago

Since we now have #1864, flattening should be quite simple.

rohit11544 commented 9 months ago

Can someone please provide a simple code snippet here for flattening a pdf?

matsavage commented 3 months ago

I have subclassed the PdfWriter class to be able to flatten forms here, so it can be done.

Would you accept PR for this, and do you have any idea of the interface which would be best for implementation?

I think this would be the easiest option, or there could be something more advanced, where you pass a list to be flattened, but all is the default, but I wouldn’t want to go too far on this. https://gist.github.com/matsavage/a50d9c541957f276088c341cc84a9e7f

pubpub-zz commented 3 months ago

@matsavage your code seems to contain some good ideas; your function should be integrated into PdfWriter. To make this easier, you should fork pypdf and build a branch with your modifications: this will ease merging.

What you should try is to convert the global ["/AP"]["/N"] into an XForm (that way you will not have to worry about merging the resources, drawing and so on into the page), then add a cm operation to the main page content to translate to the proper rectangle and call the new XForm with the Do operator. This should work with all types of widgets.

matsavage commented 3 months ago

I only did things this way to see whether the flattening could be done without the effort of setting up the development environment on my machine; this is more of a template than a PR.

Thanks for the advice, I’ll try and have a look at this some time

pubpub-zz commented 3 months ago

Looking forward 😊

nicholas-alonzo commented 1 month ago

Looking forward to this feature!

matsavage commented 1 month ago

Honestly I haven’t been able to look at this since May, feel free to have your own attempt at implementing it if it’s something you need.

pubpub-zz commented 1 month ago

At your marks.... get set ... go! 😉😄😄😄

matsavage commented 1 month ago

> At your marks.... get set ... go! 😉😄😄😄

I think it’s the one everyone wants, but no one wants to do

nicholas-alonzo commented 1 month ago

> Honestly I haven’t been able to look at this since May, feel free to have your own attempt at implementing it if it’s something you need.

Darn, I wouldn't even know where to start 🥴