modesty / pdf2json

converts binary PDF to JSON and text, for server-side PDF processing and command-line use.
https://github.com/modesty/pdf2json

[Fix] TypeError: Cannot read properties of undefined (reading 'slice') at Function.Util_normalizeRect [as normalizeRect] #298

Closed zzeleznick closed 1 year ago

zzeleznick commented 1 year ago

Description

This PR ports two upstream fixes from pdf.js, https://github.com/mozilla/pdf.js/pull/14784 and https://github.com/mozilla/pdf.js/pull/6239, that fix an issue I saw when trying to parse a PDF with malformed annotations. The key change is checking that the annotation rectangle is valid; the improved intersection code can be removed if desired.
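For illustration, here is a minimal sketch of that kind of rectangle check. The helper names are hypothetical and the `[0, 0, 0, 0]` fallback is an assumption; this is not the code from either upstream patch, only a self-contained reproduction of the failure mode in the trace (calling `slice` on an undefined rectangle) and one way to guard against it.

```javascript
// Hypothetical helpers for illustration only; the names are not taken
// from pdf2json or pdf.js.

// Reorder coordinates so the result is [xMin, yMin, xMax, yMax].
function normalizeRect(rect) {
  const r = rect.slice(0); // throws a TypeError when rect is undefined
  if (rect[0] > rect[2]) {
    r[0] = rect[2];
    r[2] = rect[0];
  }
  if (rect[1] > rect[3]) {
    r[1] = rect[3];
    r[3] = rect[1];
  }
  return r;
}

// Accept only an array of exactly four finite numbers.
function isValidRect(rect) {
  return (
    Array.isArray(rect) &&
    rect.length === 4 &&
    rect.every((n) => typeof n === "number" && Number.isFinite(n))
  );
}

// Fall back to an empty rectangle (an assumed default) when the input
// is missing or malformed, instead of letting normalizeRect throw.
function safeRect(rect) {
  return isValidRect(rect) ? normalizeRect(rect) : [0, 0, 0, 0];
}

console.log(safeRect(undefined));      // fallback: [0, 0, 0, 0]
console.log(safeRect([10, 20, 5, 2])); // normalized: [5, 2, 10, 20]
```

With a guard like this, an annotation that has a missing or malformed rectangle yields an empty rectangle rather than aborting the whole page.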

Example File:

Land-Sector-and-Removals-Guidance-Pilot-Testing-and-Review-Draft.pdf

Trace

TypeError: Cannot read properties of undefined (reading 'slice')
    at Function.Util_normalizeRect [as normalizeRect] (eval at <anonymous> (file:///Users/zeleznick/Dev/pdf2json/lib/pdf.js:66:1), <anonymous>:509:18)
    at new Annotation (eval at <anonymous> (file:///Users/zeleznick/Dev/pdf2json/lib/pdf.js:66:1), <anonymous>:3531:22)
    at Function.Annotation_fromRef [as fromRef] (eval at <anonymous> (file:///Users/zeleznick/Dev/pdf2json/lib/pdf.js:66:1), <anonymous>:3728:22)
    at Page.get annotations [as annotations] (eval at <anonymous> (file:///Users/zeleznick/Dev/pdf2json/lib/pdf.js:66:1), <anonymous>:4437:37)
    at LocalPdfManager_ensure [as ensure] (eval at <anonymous> (file:///Users/zeleznick/Dev/pdf2json/lib/pdf.js:66:1), <anonymous>:32520:22)
    at Page_getOperatorList [as getOperatorList] (eval at <anonymous> (file:///Users/zeleznick/Dev/pdf2json/lib/pdf.js:66:1), <anonymous>:4365:43)
    at Object.eval [as onResolve] (eval at <anonymous> (file:///Users/zeleznick/Dev/pdf2json/lib/pdf.js:66:1), <anonymous>:27414:14)
    at Object.runHandlers (eval at <anonymous> (file:///Users/zeleznick/Dev/pdf2json/lib/pdf.js:66:1), <anonymous>:855:35)
    at listOnTimeout (node:internal/timers:559:17)
    at processTimers (node:internal/timers:502:7)

{
  message: "Cannot read properties of undefined (reading 'slice')",
  stack: "TypeError: Cannot read properties of undefined (reading 'slice')\n" +
    '    at Function.Util_normalizeRect [as normalizeRect] (eval at <anonymous> (file:///Users/zeleznick/Dev/pdf2json/lib/pdf.js:66:1), <anonymous>:509:18)\n' +
    '    at new Annotation (eval at <anonymous> (file:///Users/zeleznick/Dev/pdf2json/lib/pdf.js:66:1), <anonymous>:3531:22)\n' +
    '    at Function.Annotation_fromRef [as fromRef] (eval at <anonymous> (file:///Users/zeleznick/Dev/pdf2json/lib/pdf.js:66:1), <anonymous>:3728:22)\n' +
    '    at Page.get annotations [as annotations] (eval at <anonymous> (file:///Users/zeleznick/Dev/pdf2json/lib/pdf.js:66:1), <anonymous>:4437:37)\n' +
    '    at LocalPdfManager_ensure [as ensure] (eval at <anonymous> (file:///Users/zeleznick/Dev/pdf2json/lib/pdf.js:66:1), <anonymous>:32520:22)\n' +
    '    at Page_getOperatorList [as getOperatorList] (eval at <anonymous> (file:///Users/zeleznick/Dev/pdf2json/lib/pdf.js:66:1), <anonymous>:4365:43)\n' +
    '    at Object.eval [as onResolve] (eval at <anonymous> (file:///Users/zeleznick/Dev/pdf2json/lib/pdf.js:66:1), <anonymous>:27414:14)\n' +
    '    at Object.runHandlers (eval at <anonymous> (file:///Users/zeleznick/Dev/pdf2json/lib/pdf.js:66:1), <anonymous>:855:35)\n' +
    '    at listOnTimeout (node:internal/timers:559:17)\n' +
    '    at processTimers (node:internal/timers:502:7)'
}
Error: Page 13: Cannot read properties of undefined (reading 'slice')

modesty commented 1 year ago

nice work, thanks