modesty / pdf2json

converts binary PDF to JSON and text, for server-side PDF processing and command-line use.
https://github.com/modesty/pdf2json

Crash in 0.4.5 - xmldom tagName error #11

Closed simoncheeseman closed 11 years ago

simoncheeseman commented 11 years ago

In the current version (0.4.5), the xmldom module throws a fatal error when parsing a PDF (a grayscale table containing only text). Previous versions did not have this issue.

7 Sep 20:50:25 - PDFParser1 -  is about to load PDF file uploads\035361c35f424d6885574aae35eae88b
7 Sep 20:50:25 - PDFJSClass1 - About to load fieldInfo XML : uploads\035361c35f424d6885574aae35eae88b
element parse error: Error: invalid tagName:<
@#[line:4,col:1]
element parse error: Error: invalid tagName:
@#[line:4,col:2]
element parse error: Error: invalid tagName:<
@#[line:4,col:376]
element parse error: Error: invalid tagName:
@#[line:4,col:377]
element parse error: Error: invalid tagName:<
@#[line:7,col:1]
end tag name: Filter /FlateDecode /Length 1613 is not match the current start tagName:undefined
@#[line:7,col:1]

C:\app\node_modules\pdf2json\node_modules\xmldom\dom-parser.js:185
            throw error;
                  ^
end tag name: Filter /FlateDecode /Length 1613 is not match the current start tagName:undefined

modesty commented 11 years ago

fixed in v0.4.6.
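
For readers hitting a similar trace: the end-tag message in the log quotes `Filter /FlateDecode /Length 1613`, which looks like part of a PDF stream dictionary, so the XML parser appears to have been handed content that is not XML. The following is only a sketch of a pre-check a caller could run before passing text to an XML parser. `looksLikeXml` is a hypothetical helper, not part of pdf2json or xmldom, and it is not a description of what changed in v0.4.6.

```javascript
// Hypothetical helper (an assumption for illustration, not a pdf2json or xmldom API).
// Returns true only if the text plausibly starts an XML document.
function looksLikeXml(text) {
  // Drop a leading byte order mark and any leading whitespace.
  const trimmed = text.replace(/^\uFEFF/, '').trimStart();
  // PDF files begin with a "%PDF-" header, so reject those outright.
  if (trimmed.startsWith('%PDF-')) return false;
  // Rough check: an XML declaration, a comment, a doctype,
  // or an element whose name starts with an ASCII letter or underscore.
  return /^<(\?xml|!--|!DOCTYPE|[A-Za-z_])/.test(trimmed);
}

console.log(looksLikeXml('<?xml version="1.0"?><fields/>'));
console.log(looksLikeXml('%PDF-1.4\n<< /Filter /FlateDecode /Length 1613 >>'));
```

A check like this only avoids feeding obviously non-XML input to the parser; it does not validate the XML itself.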