smalot / pdfparser

PdfParser, a standalone PHP library, provides various tools to extract data from a PDF file.
GNU Lesser General Public License v3.0

False positive on "Secured pdf file" detection #743

Closed CedCannes closed 4 days ago

CedCannes commented 1 month ago

Description:

The parser is incorrectly identifying a non-secured PDF as secured, preventing it from being read. The PDF in question can be opened and read without any password or security measures using various PDF readers (Adobe Acrobat, Chrome PDF viewer, etc.). There are no apparent security features enabled on this PDF, suggesting this is a false positive in the library's security detection mechanism.

PDF input

The PDF file can be found at: https://www.lacameraembarquee.fr/img/cms/fiches-produit/DJINeo-manuel.pdf

Expected output & actual output

Expected output: The PDF content should be successfully extracted.

Actual output: An exception is thrown with the message: "Secured pdf file are currently not supported."

Code

use Smalot\PdfParser\Parser;

$parser = new Parser();
try {
    $pdf = $parser->parseFile($pdfPath);
    $pdfText = $pdf->getText();
    ...

k00ni commented 1 month ago

Please try again with a custom config: https://github.com/smalot/pdfparser/blob/master/doc/CustomConfig.md#option-setignoreencryption
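For readers following along, a minimal sketch of what that custom config could look like, based on the setIgnoreEncryption option described at the linked page. The Composer autoload path and the local file name are assumptions for illustration, not part of the original report.

```php
<?php
// Assumes smalot/pdfparser was installed with Composer.
require 'vendor/autoload.php';

use Smalot\PdfParser\Config;
use Smalot\PdfParser\Parser;

// Assumption: a local copy of the PDF from the report.
$pdfPath = 'DJINeo-manuel.pdf';

// Ask the parser to skip its "secured file" check.
$config = new Config();
$config->setIgnoreEncryption(true);

// The config is passed as the second constructor argument.
$parser = new Parser([], $config);

try {
    $pdf = $parser->parseFile($pdfPath);
    echo $pdf->getText();
} catch (\Exception $e) {
    echo 'Parsing failed: ' . $e->getMessage() . PHP_EOL;
}
```

This only bypasses the check. If the content streams really are encrypted, the extracted text may still be unreadable.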

unixnut commented 1 week ago

It's very likely that this file is encrypted, but with an empty passphrase. Try the pdfinfo tool from 'poppler-utils' (available as a Debian package or from https://pypi.org/project/poppler-utils/ ).
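If pdfinfo is not at hand, a rough check is possible from PHP itself. An encrypted PDF carries an /Encrypt entry in its trailer dictionary, even when the user passphrase is empty, so scanning the raw bytes for that key gives a quick hint. This is a heuristic sketch only: hasEncryptEntry() is a hypothetical helper, not part of PdfParser, and it does not parse the file structure.

```php
<?php
/**
 * Heuristic: report whether raw PDF bytes contain an /Encrypt entry.
 * Hypothetical helper for illustration, not part of PdfParser.
 */
function hasEncryptEntry(string $rawPdf): bool
{
    // Matches "/Encrypt 12 0 R" (indirect reference) or "/Encrypt <<"
    // (inline dictionary), but not other keys such as /EncryptMetadata.
    return 1 === preg_match('~/Encrypt(?:\s+\d+\s+\d+\s+R\b|\s*<<)~', $rawPdf);
}

// Usage from the command line: php check.php path/to/file.pdf
if (isset($argv[1])) {
    $rawPdf = file_get_contents($argv[1]);
    if (false === $rawPdf) {
        fwrite(STDERR, "Could not read {$argv[1]}" . PHP_EOL);
        exit(1);
    }
    echo hasEncryptEntry($rawPdf) ? 'Encrypt entry found' : 'No Encrypt entry found';
    echo PHP_EOL;
}
```

A match would be consistent with the file being flagged as secured despite opening without a password prompt in ordinary viewers.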

See #320 .

CedCannes commented 4 days ago

Yes, I have saved the file again and it now works correctly.