py-pdf / pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
https://pypdf.readthedocs.io/en/latest/

ROB: Fixing infinite loop in ArrayObject read_from_stream #2928

Closed by jakep-allenai 3 weeks ago

jakep-allenai commented 3 weeks ago

Found a PDF in the wild whose stream ends abruptly while an ArrayObject is being read. This caused an infinite loop and a big memory leak, because pypdf kept reading the same record over and over again.
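To illustrate the kind of guard involved, here is a minimal sketch using a toy array reader. It is not pypdf's actual `read_from_stream` code: `read_int_array` is a hypothetical helper that only handles unsigned integers. The point is that `stream.read(1)` returns `b""` at end of stream, so a loop that waits only for `]` needs an explicit end-of-stream check to terminate on truncated input.

```python
import io


def read_int_array(stream):
    """Parse a PDF-style array of unsigned integers, e.g. b"[1 2 3]".

    Hypothetical toy parser for illustration only, not pypdf code.
    """
    if stream.read(1) != b"[":
        raise ValueError("expected '[' at start of array")
    items = []
    while True:
        # Skip leading whitespace before the next token.
        tok = stream.read(1)
        while tok.isspace():
            tok = stream.read(1)
        # End-of-stream guard: read(1) yields b"" once the data runs out.
        # Without a check like this, a truncated array never reaches "]".
        if tok == b"":
            break
        # Regular end of the array.
        if tok == b"]":
            break
        # Accumulate the digits of one integer.
        digits = tok
        while True:
            nxt = stream.read(1)
            if not nxt.isdigit():
                break
            digits += nxt
        if nxt:
            # Un-read the delimiter so the outer loop sees it again.
            stream.seek(-1, 1)
        items.append(int(digits))
    return items


complete = read_int_array(io.BytesIO(b"[1 2 3]"))
truncated = read_int_array(io.BytesIO(b"[1 2 "))
```

In this sketch a truncated array simply yields the items read so far. Whether to return a partial result, log a warning, or raise is a design choice the sketch does not settle.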

codecov[bot] commented 3 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 96.39%. Comparing base (9f647e6) to head (8ef7a58). Report is 1 commit behind head on main.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main    #2928   +/-   ##
=======================================
  Coverage   96.39%   96.39%
=======================================
  Files          52       52
  Lines        8728     8730    +2
  Branches     1589     1590    +1
=======================================
+ Hits         8413     8415    +2
  Misses        186      186
  Partials      129      129
```


stefan6419846 commented 3 weeks ago

Thanks for the PR. Do you own all the necessary copyrights for shipping this file? Otherwise, please upload it to a comment and download it on the fly instead (see existing tests).

jakep-allenai commented 3 weeks ago

arrayabruptending.pdf

Uploading it as a comment, just to be on the safe side.