pmaupin / pdfrw

pdfrw is a pure Python library that reads and writes PDFs

Pdf Form labels/values #165

Open typhoon71 opened 5 years ago

typhoon71 commented 5 years ago

I'm trying to read the labels and values from a fillable PDF form. I used:

x = pdfrw.PdfReader(path)

This gave me a dict with a ['/Root']['/AcroForm']['/Fields'] structure inside, but I can't find the form values I need.

(pdfminer gives a similar structure and has a resolver that extracts the labels/values from that dict, but I couldn't find an equivalent in pdfrw.)

Using PyPDF2 I could do:

x = PyPDF2.PdfFileReader(path)
d = x.getFields()

and I would get fields/values of the form.

Is this possible with pdfrw? I couldn't find anything in the examples so I'm asking it here. If it's possible it would be nice to have an example for this too. (please, thanks)

gbroiles commented 4 years ago

Try this:

import pdfrw

ANNOT_KEY = "/Annots"
ANNOT_FIELD_KEY = "/T"
ANNOT_VAL_KEY = "/V"
SUBTYPE_KEY = "/Subtype"
WIDGET_SUBTYPE_KEY = "/Widget"

PDF_NAME = "test.pdf"

template_pdf = pdfrw.PdfReader(PDF_NAME)
for page in template_pdf.pages:
    # A page without annotations has no /Annots entry; pdfrw returns None for it,
    # so fall back to an empty list instead of iterating over None.
    annotations = page[ANNOT_KEY] or []
    for annotation in annotations:
        # Only widget annotations that carry a field name (/T) are form fields.
        if annotation[SUBTYPE_KEY] == WIDGET_SUBTYPE_KEY and annotation[ANNOT_FIELD_KEY]:
            name = annotation[ANNOT_FIELD_KEY]
            value = annotation[ANNOT_VAL_KEY]
            # Print "name = value", or just the name when the field is empty.
            print("{} = {}".format(name, value) if value else name)
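
If you want a dict back, closer to what PyPDF2's getFields() gives you, you can also walk the ['/Root']['/AcroForm']['/Fields'] structure from the original question instead of the page annotations. The following is only a rough sketch, not something that ships with pdfrw: collect_fields and read_form_fields are made-up helper names, and it assumes child fields hang off "/Kids", that kids without a "/T" entry are plain widgets, and that values are not inherited from parent fields.

```python
def collect_fields(fields, prefix=""):
    """Flatten a /Fields array into {"parent.child": value}.

    Hypothetical helper: `fields` is any iterable of dict-like objects,
    such as the array pdfrw returns for /Fields or a list of plain dicts.
    """
    found = {}
    for field in fields or []:
        name = field.get("/T")
        if name is None:
            # Assumption: a kid without /T is a widget, not a separate field.
            continue
        full_name = prefix + "." + str(name) if prefix else str(name)
        # Recurse into child fields; a field with named kids is a container.
        children = collect_fields(field.get("/Kids"), full_name)
        if children:
            found.update(children)
        else:
            # Terminal field: store the raw /V object (None when unset).
            found[full_name] = field.get("/V")
    return found


def read_form_fields(path):
    """Hypothetical wrapper: read a PDF with pdfrw and flatten its form fields."""
    import pdfrw  # imported here so collect_fields also works without pdfrw

    pdf = pdfrw.PdfReader(path)
    acroform = pdf["/Root"]["/AcroForm"]
    if acroform is None:
        # The document has no interactive form at all.
        return {}
    return collect_fields(acroform["/Fields"])
```

Calling read_form_fields("test.pdf") should then return a plain dict keyed by field name. The names and values are the raw objects pdfrw read from the file, so they may still need decoding before you display them.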