smalot / pdfparser

PdfParser, a standalone PHP library, provides various tools to extract data from a PDF file.
GNU Lesser General Public License v3.0

get images #221

Open hesamsajadi opened 5 years ago

hesamsajadi commented 5 years ago

I use this on a PDF that contains 2 images:

foreach ($pdf->getPages() as $page) {
    foreach ($page->getXObjects() as $key => $value) {
        $value->getContent();
    }
}

Is the result of $value->getContent() a base64 image? How can I load this image in a browser?

BackendDevops commented 5 years ago
$parser = new \Smalot\PdfParser\Parser(); 
$pdf = $parser->parseFile('/your/pdf/file');

$images = $pdf->getObjectsByType('XObject', 'Image');

foreach( $images as $image ) {
    echo '<img src="data:image/jpeg;base64,'. base64_encode($image->getContent()) .'" />';
}

This is how I did it :)
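
The snippet above labels every image as a JPEG. That only holds when the bytes returned by getContent() really are a JPEG file, which is probably the case for images stored with the DCTDecode filter but not for raw pixel data. A minimal sketch of a guard follows. It assumes getContent() returns the raw stream bytes, and isJpeg() and jpegDataUri() are hypothetical helpers, not part of pdfparser:

```php
<?php
// Hypothetical helpers for illustration; not part of smalot/pdfparser.

/**
 * Return true when the bytes start with the JPEG start-of-image marker
 * (FF D8 FF).
 */
function isJpeg(string $bytes): bool
{
    return strncmp($bytes, "\xFF\xD8\xFF", 3) === 0;
}

/**
 * Build a data URI for an <img> tag, or return null when the bytes
 * do not look like a JPEG.
 */
function jpegDataUri(string $bytes): ?string
{
    if (!isJpeg($bytes)) {
        return null;
    }
    return 'data:image/jpeg;base64,' . base64_encode($bytes);
}

// Demo with stand-in bytes instead of a real PDF image.
$uri = jpegDataUri("\xFF\xD8\xFF\xE0");
if ($uri !== null) {
    echo '<img src="' . $uri . '" />', "\n";
}
```

In the loop above, the same check could be applied to $image->getContent(), skipping any image for which jpegDataUri() returns null.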

Mmoscz commented 5 years ago

$re = '/(<<\/Subtype\/Image.*?>>stream\n)(.*?(?= endstream))/sxJ';

$arquivo = 'file.pdf';
$path_parts = pathinfo($arquivo);

$str = file_get_contents($arquivo);

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

$i = 1;
foreach ($matches as $imagem) {
    echo $path_parts['filename'] . $i . ".jpg - " . strlen($imagem[2]) . "\n";
    file_put_contents($path_parts['filename'] . "-" . $i . ".jpg", $imagem[2]);
    $i++;
}
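
The regex approach above can be tried on an in-memory string before pointing it at a real file. This is only a sketch under several assumptions: extractJpegStreams() is a hypothetical helper, the pattern is a loosened variation rather than the exact one posted above, the image dictionaries are stored uncompressed in the file, and the image data never contains the text "endstream". Streams that do not begin with the JPEG marker are dropped, so non-JPEG images are not saved with a .jpg extension.

```php
<?php
// Hypothetical helper for illustration; a naive scan, not a real PDF parser.

/**
 * Find image XObject streams in raw PDF bytes and keep those that
 * start with the JPEG start-of-image marker (FF D8).
 *
 * @return string[] raw JPEG byte strings, in file order
 */
function extractJpegStreams(string $pdfBytes): array
{
    // "s" lets "." match newlines; group 1 captures the stream body.
    $re = '~/Subtype\s*/Image.*?>>\s*stream\r?\n(.*?)\r?\nendstream~s';
    if (!preg_match_all($re, $pdfBytes, $matches)) {
        return [];
    }
    $jpegs = array_filter($matches[1], function (string $s): bool {
        return strncmp($s, "\xFF\xD8", 2) === 0;
    });
    return array_values($jpegs);
}

// Demo: two fake image objects, only the first of which holds JPEG bytes.
$sample = "1 0 obj\n<</Type/XObject/Subtype/Image/Filter/DCTDecode/Length 4>>stream\n"
    . "\xFF\xD8\xFF\xE0\nendstream\nendobj\n"
    . "2 0 obj\n<</Type/XObject/Subtype/Image/Filter/FlateDecode/Length 3>>stream\n"
    . "abc\nendstream\nendobj\n";

foreach (extractJpegStreams($sample) as $i => $jpeg) {
    echo 'image ', $i + 1, ': ', strlen($jpeg), " bytes\n";
}
```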