Closed by ibecav 2 months ago
Looks like calling resolve() on fields fixes the problem.
Replace
fields = resolve(pdf.doc.catalog["AcroForm"])["Fields"]
with
fields = resolve(resolve(pdf.doc.catalog["AcroForm"])["Fields"])
and it looks like it works. I think we could modify the example code to do this.
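For anyone wondering why the extra call can matter, here is a minimal sketch that does not use pdfplumber at all. It assumes that in the failing PDF the "Fields" entry is itself stored as an indirect reference, which the thread does not confirm. The `Ref` class, the toy `resolve()` function, and the `objects` table are hypothetical stand-ins written for illustration; they are not pdfplumber's or pdfminer's actual classes.

```python
class Ref:
    """Hypothetical stand-in for an indirect PDF object reference."""

    def __init__(self, objects, objid):
        self.objects = objects  # shared object table
        self.objid = objid      # key of the referenced object

    def resolve(self):
        return self.objects[self.objid]


def resolve(x):
    """Toy helper: follow x if it is a Ref, otherwise return x unchanged."""
    return x.resolve() if isinstance(x, Ref) else x


# Toy object table (an assumption for illustration): both the AcroForm
# dictionary and its "Fields" array are stored indirectly.
objects = {}
objects[1] = {"Fields": Ref(objects, 2)}
objects[2] = [{"T": "name", "V": "Ada"}, {"T": "city", "V": "Paris"}]
catalog = {"AcroForm": Ref(objects, 1)}

# Single resolve: AcroForm is followed, but "Fields" is still a Ref,
# and a Ref cannot be looped over.
fields_once = resolve(catalog["AcroForm"])["Fields"]
try:
    iter(fields_once)
    iterable_once = True
except TypeError:
    iterable_once = False

# Nested resolve, as in the suggested fix: "Fields" is followed too.
fields = resolve(resolve(catalog["AcroForm"])["Fields"])
names = [f["T"] for f in fields]

print(iterable_once, names)
```

In this toy model the outer call is harmless when "Fields" is stored inline, because `resolve()` returns non-reference values unchanged. That would be consistent with the example working as-is on some PDFs.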
Thank you. I'll try this fix in a little bit. As to changing the example, I'll leave that to your discretion. I'm by no means an expert, but my understanding is that PDFs can be fickle and, as I noted, your example does work on some PDFs as is.
Thank you, that does indeed seem to resolve the error.
Thanks @jeremybmerrill for the solution, and @ibecav for flagging. I've now updated the example code in the README.
Great! I'm by no means an expert either (all standards-compliant PDFs are alike, but all weird PDFs are weird in their own unique way), but I do know that calling resolve() at every opportunity seems to make problems disappear.
Describe the bug
As with several others, I have encountered this error when using the module (see, for example, #935). I encountered it using an exact copy of your example script for extracting form values (https://github.com/jsvine/pdfplumber?tab=readme-ov-file#extracting-form-values), but with the example PDF I am enclosing.
Have you tried repairing the PDF?
Yes, the results were (I had to laugh because yes, it really is a pdf file and it certainly renders correctly on screen):
Code to reproduce the problem
As stated above, a simple copy of one of your examples, run against the example PDF.
PDF file
FWIW it's a fillable form pdf created by the CDC and saved locally after filling.
example.pdf
Expected behavior
I expected it to work the same way your example code does. The code does work on other pdf files that aren't of this type.
Actual behavior
Screenshots
I can't think of any that would be helpful but please inform if otherwise
Environment
Additional context
My apologies in advance if I forgot any details in this issue. I'm new to Python and your excellent module, but have experience in other languages. My current hypothesis, based on reading other issues, is that there is something non-standard about the PDF itself, but I am hopeful there is a workaround.