Closed GreyWyvern closed 1 month ago
Converting this to a draft for now. @iGrog supplied another PDF that still had the issue, and in fixing it, I'm sure there is an edge case: if a (string)
contains the BI
keyword, and then ID
and EI
can be found further on in the document, the potential is there for a large chunk of the document to be ignored. Very small chance this happens, but it's there.
The internal content of the captured BI ... ID ... EI
needs to be checked to verify that it is indeed inline image content before allowing the replace. I'll work on this and update this PR when ready.
I really appreciate you taking the time!
So, the last thing left here that the code wouldn't cover is a proper inline image, that doesn't have a proper image-properties dictionary with a width and height. The code in this PR then skips over it, but the potential is there for such an inline image (probably very rare if it happens at all) to contain binary content that can potentially cause errors in the way PdfParser interprets the document stream. (Like unbalanced Q/q etc.)
We can:
BI ... ID ... EI
sequences outside of strings as "valid" inline images. This allows the possibility (miniscule?) of finding false-positive inline image sequences.I've no data to back it up, but I believe the second case, where formatContent()
finds a BI ... ID ... EI
sequence outside of a string, but in error, is a probably rarer than an inline image dictionary not containing a height and width. But then again I could be wrong!
Regardless, I would recommend keeping the dictionary check just in case. If it gets released and users find the array-access error again, then we can always remove it. In this case, this PR is ready to be taken out of draft status as-is.
Thank you very much @GreyWyvern
Type of pull request
About
formatContent()
now accounts for inline imageBI ... ID ... EI
commands in document streams. Resolves #691.Checklist for code / configuration changes
In case you changed the code/configuration, please read each of the following checkboxes as they contain valuable information:
fixes #1234
to outline that you are providing a fix for the issue#1234
.