modesty / pdf2json

converts binary PDF to JSON and text, for server-side PDF processing and command-line use.
https://github.com/modesty/pdf2json

Complete PDF content into JSON format #140

Open SOftEngrAtta opened 7 years ago

SOftEngrAtta commented 7 years ago

I am using this module to get the complete content of a PDF in JSON format, but only the fields are saved in the JSON output, without any content. Can anyone help with this issue?

{
  "formImage": {
    "Transcoder": "pdf2json@1.1.7 [https://github.com/modesty/pdf2json]",
    "Agency": "",
    "Id": { "AgencyId": "", "Name": "", "MC": false, "Max": 1, "Parent": "" },
    "Pages": [
      {
        "Height": 49.5,
        "HLines": [],
        "VLines": [],
        "Fills": [ { "x": 0, "y": 0, "w": 0, "h": 0, "clr": 1 } ],
        "Texts": [],
        "Fields": [],
        "Boxsets": []
      },
      {
        "Height": 63,
        "HLines": [],
        "VLines": [],
        "Fills": [ { "x": 0, "y": 0, "w": 0, "h": 0, "clr": 1 } ],
        "Texts": [],
        "Fields": [],
        "Boxsets": []
      }
    ],
    "Width": 38.25
  }
}

wanghaisheng commented 7 years ago

You could try running pdftotext (provided by poppler-utils) on your input PDF. Perhaps the PDF file is broken in some way that prevents pdf.js from extracting its content.
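
As a rough way to confirm the symptom before switching tools, the sketch below checks whether pdf2json output contains any text runs at all. It assumes the output shape shown in the report above (formImage.Pages[].Texts, as produced by pdf2json@1.1.7). The helper names countTextRuns, hasExtractableText, and extractWithPdftotext are hypothetical and not part of pdf2json, and the fallback assumes the pdftotext binary from poppler-utils is installed and on the PATH.

```javascript
// Sketch only: helper names are hypothetical, not part of pdf2json.
const { execFile } = require("child_process");

// Count the text runs on each page of a pdf2json result.
// Assumes the pdf2json@1.1.7 shape: { formImage: { Pages: [ { Texts: [...] } ] } }.
function countTextRuns(output) {
  const pages = (output && output.formImage && output.formImage.Pages) || [];
  return pages.map((page) => (Array.isArray(page.Texts) ? page.Texts.length : 0));
}

// True if at least one page has at least one text run.
function hasExtractableText(output) {
  return countTextRuns(output).some((n) => n > 0);
}

// Fallback: run poppler's pdftotext and pass the text to the callback.
// "-" as the output file tells pdftotext to write to stdout.
function extractWithPdftotext(pdfPath, callback) {
  execFile("pdftotext", [pdfPath, "-"], (err, stdout) => callback(err, stdout));
}

// Trimmed copy of the output from the report above.
const sample = {
  formImage: {
    Pages: [
      { Height: 49.5, Texts: [], Fields: [] },
      { Height: 63, Texts: [], Fields: [] },
    ],
    Width: 38.25,
  },
};

console.log(countTextRuns(sample)); // [ 0, 0 ]
console.log(hasExtractableText(sample)); // false
```

If every page reports zero text runs, as here, the PDF may contain only scanned images or text that pdf.js cannot decode, and calling extractWithPdftotext on the same file is one way to see whether another extractor does any better.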