Open soupmagnet opened 7 months ago
Please try again with our latest version 2.8.0-RC2
I've run into an issue with the latest version (2.8.0-RC2). I was using this code:
$config = new \Smalot\PdfParser\Config();
$config->setFontSpaceLimit(-60);
$config->setRetainImageContent(false);
$config->setIgnoreEncryption(true);
// Memory limit to use when de-compressing files, in bytes
$config->setDecodeMemoryLimit(10240);
$parser = new \Smalot\PdfParser\Parser([], $config);
$PDF = $parser->parseFile($PDFfile);
$metaData = $PDF->getDetails();
die(json_encode($metaData, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES));
The expected result would be similar to this:
Code: 200 - {
"CreationDate": "2019-10-31T08:27:44+01:00",
"ModDate": "2019-12-10T07:07:05+01:00",
"Producer": "iText® 5.5.10 ©2000-2015 iText Group NV (****)",
"Pages": 3364, <--- notice this works
"xmp:createdate": "2019-10-31T08:27:44+01:00",
"xmp:modifydate": "2019-12-10T07:07:05+01:00",
"xmp:metadatadate": "2019-12-10T07:07:05+01:00",
"pdf:producer": "iText® 5.5.10 ©2000-2015 iText Group NV (***)",
"xmpmm:documentid": "uuid:5c870642-b206-4312-8c05-2646e3c946a0",
"xmpmm:instanceid": "uuid:729bb9a6-a048-4bcc-996d-d44ca9a5555c",
"dc:format": "application/pdf"
}
With a bigger PDF (4546 pages), the same PHP code above gives this result instead:
Code: 200 - {
"Pages": 189
}
The PDF is 387 MB (406 340 557 bytes).
Thank you for confirming.
I've got a sample PDF file with a similar issue. It might be an "index" issue, since this code only works up to the 9th page. Example code:
for ($x = 0; $x <= 16; $x++) {
$pgcontent = $PDF->getPages()[$x]->getText();
echo("PageNr:".$x."\r\n".$pgcontent);
}
die("Done");
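One possible explanation for the loop stopping around page 9, offered as an assumption rather than a confirmed diagnosis: if the parser finds fewer pages than the PDF really contains, $PDF->getPages()[$x] is null for the missing indexes, and calling getText() on null throws an \Error. A minimal sketch that only walks the pages the parser actually returned follows. The names collectPageTexts and $maxPages are hypothetical helpers for illustration, not part of the library.

```php
<?php
/**
 * Collect text page by page, stopping at the real page count
 * instead of a hard-coded upper bound.
 *
 * Assumption: $pages is what $PDF->getPages() returned, i.e. an
 * array of objects that expose getText().
 */
function collectPageTexts(array $pages, int $maxPages = 17): array
{
    $texts = [];
    // array_slice() never reads past the end of the array, so a short
    // page list cannot produce a null page here.
    foreach (array_slice($pages, 0, $maxPages, true) as $index => $page) {
        $texts[$index] = $page->getText();
    }
    return $texts;
}

// Hypothetical usage with the $PDF object from the post:
// $pages = $PDF->getPages();
// echo "Parser found " . count($pages) . " pages\r\n";
// foreach (collectPageTexts($pages) as $x => $pgcontent) {
//     echo "PageNr:" . $x . "\r\n" . $pgcontent;
// }
```

Printing count($pages) first shows whether the parser sees all of the expected pages or only a subset, which would separate a page-discovery problem from a text-extraction problem.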
The following gives a 500 server error even with try/catch:
try
{
$PDFContent = $PDF->getText(16);
}
catch (\Exception $e)
{
die( "PDF Problem: " . $e->getMessage());
}
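A note on why the try/catch above may not help: in PHP 7 and later, engine-level failures such as TypeError or "Call to a member function ... on null" are thrown as \Error, which is not a subclass of \Exception, so catch (\Exception $e) lets them through. Catching \Throwable covers both. The sketch below is an illustration under that assumption, since the actual error behind the 500 response is not shown in the thread. runGuarded is a hypothetical helper name.

```php
<?php
/**
 * Run a callable and return either its result or a readable message.
 * \Throwable is the common parent of \Exception and \Error.
 */
function runGuarded(callable $task): string
{
    try {
        return (string) $task();
    } catch (\Throwable $e) {
        // Include the class name so \Error and \Exception can be told apart
        return 'PDF Problem: ' . get_class($e) . ': ' . $e->getMessage();
    }
}

// Hypothetical usage with the $PDF object from the post:
// echo runGuarded(function () use ($PDF) {
//     return $PDF->getText(16);
// });
```

This still cannot intercept fatal errors that PHP does not throw as objects, such as exhausting memory_limit or exceeding the maximum execution time. With a very large PDF those remain plausible causes of a 500 response, and the server error log is the place to confirm which one occurred.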
When I open the PDF file in Foxit Reader, it behaves as if there is an index issue around pages 8-9. Is it possible to send the PDF file and keep it private? :) (Feel free to PM me and ask for the file.)
Description:
I very recently started getting the following Fatal Error when trying to parse some PDF files...
PDF input
I would be willing to provide a copy of the PDF if I can do so privately.
Expected output & actual output
The expected output of my code is the contents of the PDF parsed into a string of text and ultimately saved to a variable. Instead, there is a fatal error on certain PDF files, and I really can't tell why.
Code