smalot / pdfparser

PdfParser, a standalone PHP library, provides various tools to extract data from a PDF file.
GNU Lesser General Public License v3.0

Absorb spaces after 'stream' declarations #642

Closed GreyWyvern closed 1 year ago

GreyWyvern commented 1 year ago

Type of pull request

About

When detecting the start of a stream, PdfParser currently expects the character immediately after the stream keyword to be either a carriage return (\r) or a newline (\n). If one or more spaces sit between the stream keyword and the \r or \n, the match fails, so the object is not recognized as a data stream and its contents are discarded.

This change adjusts the regexp in RawDataParser.php so that it absorbs any spaces that follow the stream keyword.
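To illustrate the difference, here is a minimal sketch of the idea. The function name `findStreamStart` and both patterns are hypothetical and written only for this example; they are not copied from RawDataParser.php, whose actual expression may differ.

```php
<?php
// Hypothetical helper for illustration only; not part of PdfParser's API.
// Returns the byte offset where the stream data begins, or null when no
// 'stream' keyword followed by a line ending is found.
function findStreamStart(string $data, bool $absorbSpaces): ?int
{
    // Strict: 'stream' must be followed directly by a line ending.
    // Lenient: any number of spaces may sit between 'stream' and the line ending.
    $pattern = $absorbSpaces
        ? '/stream[ ]*(?:\r\n|\r|\n)/'
        : '/stream(?:\r\n|\r|\n)/';

    if (preg_match($pattern, $data, $m, PREG_OFFSET_CAPTURE) === 1) {
        // Offset of the match plus its length is the first byte of the data.
        return $m[0][1] + strlen($m[0][0]);
    }

    return null;
}

// Sample object body with a space between 'stream' and the line ending.
$data = "<< /Length 5 >>\nstream \r\nhello\r\nendstream";

$strict  = findStreamStart($data, false); // strict pattern finds no stream start
$lenient = findStreamStart($data, true);  // lenient pattern finds the start of the data
```

With the sample above, the strict variant finds nothing, while the lenient variant returns an offset from which `substr($data, $lenient, 5)` yields the stream contents.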

Resolves #641. Note that the sample files provided by the original reporter of #641 still show font decoding issues in the output; those are outside the scope of this fix.

Checklist for code / configuration changes