maxpmaxp / pdfreader

Python API for PDF documents
MIT License

ISSUE-109: work around mismatching predictors and broken image streams #116

Closed by maxpmaxp 7 months ago

maxpmaxp commented 7 months ago

work around mismatching predictors and broken image streams

A document in #109 has embedded images that are not displayed in Acrobat, although objects exist for them. The image data streams are too short compared to the declared image size, and the PNG predictors on some data rows differ from the `Predictor` declared on the image.

  1. The image parser no longer fails on mismatching predictors.
  2. It makes a naive attempt to recover broken image streams: it truncates excess data or pads short data with zeros.
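
The naive recovery in item 2 can be sketched roughly as follows. This is only an illustration of the truncate-or-pad idea, not the code from this PR: the helper names `expected_stream_length` and `fit_stream` are hypothetical, and the sketch assumes an uncompressed, already-unfiltered stream whose rows are padded to whole bytes.

```python
def expected_stream_length(width, height, colors=1, bits_per_component=8):
    """Bytes needed for a decoded image; each row is rounded up to a whole byte.

    Hypothetical helper: the parameters stand in for the image's declared
    width, height, number of color components and bits per component.
    """
    row_bytes = (width * colors * bits_per_component + 7) // 8
    return row_bytes * height


def fit_stream(data, expected_len):
    """Force data to exactly expected_len bytes (hypothetical helper).

    Too long: drop the trailing bytes. Too short: pad with zero bytes.
    """
    if len(data) >= expected_len:
        return data[:expected_len]
    return data + b"\x00" * (expected_len - len(data))


# A 4x2 8-bit grayscale image needs 8 bytes; this stream has only 5.
broken = bytes([10, 20, 30, 40, 50])
fixed = fit_stream(broken, expected_stream_length(4, 2))
```

Padding with zeros means missing pixels typically render as black (or as the first palette entry for indexed images), which is crude but keeps the image decodable.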