smalot / pdfparser

PdfParser, a standalone PHP library, provides various tools to extract data from a PDF file.
GNU Lesser General Public License v3.0

False positive on "Secured pdf file" detection #743

Open CedCannes opened 2 weeks ago

CedCannes commented 2 weeks ago

Description:

The parser is incorrectly identifying a non-secured PDF as secured, preventing it from being read. The PDF in question can be opened and read without any password or security measures using various PDF readers (Adobe Acrobat, Chrome PDF viewer, etc.). There are no apparent security features enabled on this PDF, suggesting this is a false positive in the library's security detection mechanism.

PDF input

The PDF file can be found at: https://www.lacameraembarquee.fr/img/cms/fiches-produit/DJINeo-manuel.pdf

Expected output & actual output

Expected output: The PDF content should be successfully extracted.

Actual output: An exception is thrown with the message: "Secured pdf file are currently not supported."

Code

use Smalot\PdfParser\Parser;

$parser = new Parser();
try {
    $pdf = $parser->parseFile($pdfPath);
    $pdfText = $pdf->getText();
    ...

k00ni commented 2 weeks ago

Please try again with a custom config: https://github.com/smalot/pdfparser/blob/master/doc/CustomConfig.md#option-setignoreencryption
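
For reference, a minimal sketch of what that suggestion might look like when applied to the snippet from the report. It assumes the library is installed through Composer and that the PDF has been downloaded locally; the autoload path and the file path are placeholders, not details taken from the report. Whether the text of this particular PDF is then extracted correctly has not been confirmed here.

```php
<?php
// Assumption: installed via Composer; adjust the autoload path to your project.
require __DIR__ . '/vendor/autoload.php';

use Smalot\PdfParser\Config;
use Smalot\PdfParser\Parser;

// Placeholder path standing in for $pdfPath from the report.
$pdfPath = __DIR__ . '/DJINeo-manuel.pdf';

// Ask the parser to continue even if the file is flagged as encrypted.
$config = new Config();
$config->setIgnoreEncryption(true);

// The first constructor argument is left as an empty array;
// the custom config is passed as the second argument.
$parser = new Parser([], $config);

try {
    $pdf = $parser->parseFile($pdfPath);
    $pdfText = $pdf->getText();
    echo $pdfText;
} catch (\Exception $e) {
    // Any remaining parse error is reported here instead of aborting the script.
    echo 'Parsing failed: ' . $e->getMessage() . PHP_EOL;
}
```

If the file is flagged as encrypted but readable without a password, this option should let the parser skip the "Secured pdf file" check. If the content streams are in fact encrypted, the extracted text may still be empty or garbled.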