smalot / pdfparser

PdfParser, a standalone PHP library, provides various tools to extract data from a PDF file.
GNU Lesser General Public License v3.0

Prevent zero from being passed to array_chunk() #686

Closed GreyWyvern closed 3 months ago

GreyWyvern commented 3 months ago

Type of pull request

About

Passing a zero (0) value to the array_chunk() function causes an error. In rare cases, a PDF XRef object may be added to a document with an "empty" /W [0 0 0] command. In RawDataParser.php, this sets the $rowlen variable to zero, which is then passed to array_chunk() and triggers the error.

Add a simple check to return an empty array in this case. Resolves #679.
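The check can be sketched roughly as follows. This is a minimal illustration, not the actual diff: `chunkRows()` is a hypothetical helper name and the sample byte values are made up, while `$rowlen` stands in for the variable of the same name in RawDataParser.php. On PHP 8, array_chunk() throws a ValueError when its length argument is less than 1, so the guard returns early instead.

```php
<?php

/**
 * Hypothetical helper (not part of PdfParser): split a flat list of byte
 * values into rows of $rowlen entries each.
 *
 * If $rowlen is not a positive integer, as would happen with an "empty"
 * /W [0 0 0], return an empty array rather than calling array_chunk().
 */
function chunkRows(array $bytes, int $rowlen): array
{
    if ($rowlen < 1) {
        // Nothing can be split into zero-length rows.
        return [];
    }

    return array_chunk($bytes, $rowlen);
}

// Made-up sample data: six byte values split into rows of three.
$rows = chunkRows([1, 0, 16, 0, 1, 2], 3); // [[1, 0, 16], [0, 1, 2]]

// A zero row length yields an empty array instead of an error.
$empty = chunkRows([1, 0, 16, 0, 1, 2], 0); // []
```

Returning an empty array means any later loop over the rows simply does nothing for such a section.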

It is very difficult to create a unit test for this error, as it requires a PDF to be generated with "empty" sections with the specific /W [0 0 0] command. As the error occurs at a point when the entire document is being considered, we can't just feed PdfParser some test PDF code that's a subsection of a full document. The sample PDF given by the reporter of issue #679 contained personal info that we cannot include in PdfParser, and they did not know how to generate a similar file. It is hoped we can merge this PR without a unit test added, since it is, at its core, just a check for a zero value which should have been in the code already. :)

Thanks to @KeanuTang for the initial analysis and provided code solution, with which I built this PR.

Checklist for code / configuration changes

In case you changed the code/configuration, please read each of the following checkboxes as they contain valuable information: