smalot / pdfparser

PdfParser, a standalone PHP library, provides various tools to extract data from a PDF file.
GNU Lesser General Public License v3.0

How to extract images from PDF? #218

Open philipnjuguna66 opened 5 years ago

philipnjuguna66 commented 5 years ago

How do I extract images from a PDF? Also, if the PDF contains a table, how do I extract it along with its styles?

BackendDevops commented 5 years ago

$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseFile('/your/pdf/file');

// Collect every XObject whose subtype is Image.
$images = $pdf->getObjectsByType('XObject', 'Image');

foreach ($images as $image) {
    // Embeds the raw stream as a data URI; this assumes the stream is JPEG data.
    echo '<img src="data:image/jpeg;base64,' . base64_encode($image->getContent()) . '" />';
}

rubenvanerk commented 4 years ago

@philipnjuguna66 is your question answered?

BochinDiaz28 commented 2 years ago

image is FlateDecode?? no save

k00ni commented 2 years ago

> image is FlateDecode?? no save

What do you mean?

> is your question answered?

I agree. If there is no feedback soon, I will close for now.

BochinDiaz28 commented 2 years ago

require '../composer/vendor/autoload.php';

use Smalot\PdfParser\Parser;
use Smalot\PdfParser\XObject\Image;

$parser = new Parser();
$pdf = $parser->parseFile('mipdf.pdf');

$i = 0;
$xobjects = $pdf->getObjectsByType('XObject');
foreach ($xobjects as $xobject) {
    if ($xobject instanceof Image) {
        $content = $xobject->getContent();
        if ('FlateDecode' === $xobject->getHeader()->getElements()['Filter']->getContent()) {
            $content = zlib_decode($content); //no save here //ERROR CODE: HERE!! PLEASE NO DECODE IMAGEN!
        }
        file_put_contents("extraidas/" . ++$i . ".png", $content);
    }
}

k00ni commented 2 years ago

> $content = zlib_decode($content,); //no save here //ERROR CODE: HERE!! PLEASE NO DECODE IMAGEN!

I still don't understand what exactly the problem is. What do you mean by save and decode? Please describe it a bit and don't just paste unformatted code; that is not helpful.

BochinDiaz28 commented 2 years ago

Okay! When I extract the images from a PDF, several of them are not detected. From what I have read, this is because they are compressed (deflated) or encrypted, so I must pass them through a function like zlib_decode. But neither this function nor the similar ones in PHP returns the image; I always get an error code. I uploaded the same PDF to some extraction pages and they return the image I need without problems. Sorry if I am not explaining it well. I leave a reference link: https://stackoverflow.com/questions/59374914/how-to-extract-images-from-pdf-using-php

skverma618 commented 1 year ago

> Okay! When I extract the images from a PDF, several of them are not detected. From what I have read, this is because they are compressed (deflated) or encrypted, so I must pass them through a function like zlib_decode. But neither this function nor the similar ones in PHP returns the image; I always get an error code. I uploaded the same PDF to some extraction pages and they return the image I need without problems. Sorry if I am not explaining it well. I leave a reference link: https://stackoverflow.com/questions/59374914/how-to-extract-images-from-pdf-using-php

The reference link is not working. Please provide a related resource so this is easier to understand.

skverma618 commented 1 year ago

Undefined type 'Image'

The above error is shown for your code.

sbhshoaib commented 11 months ago

> $content = zlib_decode($content,); //no save here //ERROR CODE: HERE!! PLEASE NO DECODE IMAGEN!
>
> I still don't understand what exactly the problem is. What do you mean by save and decode? Please describe it a bit and don't just paste unformatted code; that is not helpful.

The problem is that PNG images cannot be extracted this way: images stored with the FlateDecode filter come out corrupted and unreadable.
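
A note on why the saved files may be unreadable: in a PDF, a FlateDecode image stream holds zlib-compressed raw pixel samples, not a complete PNG file, so writing the inflated bytes to a .png file does not produce a valid image. The sketch below is one possible way to wrap such samples in a minimal PNG container. It is not part of smalot/pdfparser; `pngChunk` and `rawSamplesToPng` are hypothetical helpers, and it assumes 8 bits per component, DeviceGray or DeviceRGB data, and no predictor in the stream's decode parameters. The width, height, and channel count would have to come from the image dictionary (Width, Height, ColorSpace); how to read them through the library is left out here.

```php
<?php
// Hypothetical helpers, not part of smalot/pdfparser.

// Builds one PNG chunk: length, type, data, then CRC-32 over type + data.
function pngChunk(string $type, string $data): string
{
    return pack('N', strlen($data)) . $type . $data . pack('N', crc32($type . $data));
}

// Wraps raw, already inflated 8-bit samples in a minimal PNG file.
// $channels: 1 for grayscale, 3 for RGB (other layouts are not handled).
function rawSamplesToPng(string $samples, int $width, int $height, int $channels): string
{
    $colorTypes = [1 => 0, 3 => 2]; // PNG colour type 0 = grayscale, 2 = RGB
    if (!isset($colorTypes[$channels])) {
        throw new InvalidArgumentException('Only 1 (gray) or 3 (RGB) channels are handled.');
    }

    $rowBytes = $width * $channels;
    if (strlen($samples) !== $rowBytes * $height) {
        throw new InvalidArgumentException('Sample data does not match the given dimensions.');
    }

    // Every PNG scanline starts with a filter byte; 0 means "no filter".
    $scanlines = '';
    for ($y = 0; $y < $height; $y++) {
        $scanlines .= "\x00" . substr($samples, $y * $rowBytes, $rowBytes);
    }

    // IHDR: width, height, bit depth 8, colour type, then compression,
    // filter, and interlace methods (all 0).
    $ihdr = pack('NNCCCCC', $width, $height, 8, $colorTypes[$channels], 0, 0, 0);

    return "\x89PNG\r\n\x1a\n"
        . pngChunk('IHDR', $ihdr)
        . pngChunk('IDAT', gzcompress($scanlines)) // zlib format, as PNG requires
        . pngChunk('IEND', '');
}

// Usage: a 2x1 RGB image with one red and one blue pixel.
$png = rawSamplesToPng("\xFF\x00\x00\x00\x00\xFF", 2, 1, 3);
echo strlen($png) . " bytes\n";
```

In the loop from the earlier comment, the result of zlib_decode would be passed as `$samples`. Images that use a predictor, an indexed or CMYK colour space, or a bit depth other than 8 would need extra handling that this sketch does not attempt.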