modesty / pdf2json

converts binary PDF to JSON and text, for server-side PDF processing and command-line use.
https://github.com/modesty/pdf2json

Parsing USPTO forms? #15

Closed: chadkirby closed this issue 10 years ago

chadkirby commented 10 years ago

The USPTO uses some kind of form, created by Adobe LiveCycle Designer, that can't be read in any PDF viewer except for Acrobat Reader, Acrobat Professional, and maybe other Adobe products. For example, see the ADS form.

I'm not even sure what format those forms are in, but pdf2json (like all other non-Adobe PDF viewers) doesn't see any data except for the standard message, "If this message is not eventually replaced by the proper contents of the document, your PDF viewer may not be able to display this type of document."

Is there any chance that pdf2json might be able to parse form data from such forms at some point in the future?

Thanks for any input you may have. And thanks for a very useful utility!

modesty commented 10 years ago

The form created in Adobe LiveCycle Designer is in XFA format; even Acrobat cannot edit it. Since Acrobat is fully capable of handling AcroForm, and all the PDF forms in my project are AcroForms, I'll leave XFA form support to the community.
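For anyone who lands here with the same problem: before passing a PDF to pdf2json, you can roughly tell which kind of form it contains. In the PDF specification, XFA data is referenced by an `/XFA` entry in the document's `/AcroForm` dictionary. A catalog entry `/NeedsRendering true` marks a dynamic XFA form, which is the kind that shows the "please wait" placeholder in non-Adobe viewers. The sketch below is a heuristic byte scan and is not part of pdf2json. The function name `classify_pdf_form` and its return labels are hypothetical. It can miss keys stored in compressed object streams (PDF 1.5+), so a negative result is not conclusive.

```python
import re
import sys

# Heuristic patterns for PDF name keys. These are plain byte scans, so keys
# hidden inside compressed object streams will not be found.
NEEDS_RENDERING = re.compile(rb"/NeedsRendering\s+true\b")  # dynamic XFA flag (catalog)
XFA_KEY = re.compile(rb"/XFA\b")                           # XFA entry in /AcroForm dict
ACROFORM_KEY = re.compile(rb"/AcroForm\b")                 # interactive form dictionary


def classify_pdf_form(data: bytes) -> str:
    """Return a rough guess at the form type in raw PDF bytes (hypothetical helper)."""
    if NEEDS_RENDERING.search(data):
        return "dynamic-xfa"    # viewer must render XFA; non-Adobe tools see the placeholder
    if XFA_KEY.search(data):
        return "xfa"            # XFA present (possibly a static XFA form)
    if ACROFORM_KEY.search(data):
        return "acroform"       # classic AcroForm fields
    return "none-detected"      # no uncompressed form keys found


if __name__ == "__main__":
    # Usage: python classify_form.py form1.pdf form2.pdf ...
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            print(path, classify_pdf_form(fh.read()))
```

If a file reports `dynamic-xfa` or `xfa`, the field data lives in XFA XML packets rather than in AcroForm field objects. That matches the behavior described above, where only the placeholder message comes through.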