Open codingWWW opened 5 years ago
Hey, I was going to submit a PR and jump through the hoops regarding testing, but since PRs don't seem to be actively monitored and I'm not familiar with testing with Atoum, I'm not going to go to the trouble.
Anyways, this little function I wrote seems to do the trick. I needed this functionality for my own testing purposes.
// Usage (from inside the class that defines extractBookmarks()):
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseFile($path);
$bookmarks = $this->extractBookmarks($pdf);

/**
 * Collect the titles of all objects that carry a 'Title' entry.
 *
 * @param \Smalot\PdfParser\Document $pdf
 * @return string[]
 */
private function extractBookmarks($pdf)
{
    $bookmarks = [];
    foreach ($pdf->getObjects() as $obj) {
        $details = $obj->getHeader()->getDetails();
        if (isset($details['Title'])) {
            $bookmarks[] = $details['Title'];
        }
    }
    return $bookmarks;
}
This will just return an array of the bookmark names. Hope this helps get you started.
Hi, but I don't see the target pages for bookmarks in the Details array.
True. How can we get the page numbers of bookmarks and page links? Can someone reply?
Please help getting the page numbers of bookmarks and page links.
@rw152: if you are still willing to contribute, we introduced PHPUnit a while ago. Your function seems useful so as one of the maintainers I can assist you here.
@isavepak: "Please help getting the page numbers of bookmarks and page links."
If I remember correctly, it is currently not possible.
I too am in need of this, but it looks like the data is just not there. One question, though. I used the above code successfully to grab the bookmark titles, but the returned array is not in the order in which the bookmarks appear in the document. It isn't even alphabetical; the order seems random. Does anyone know of a way to get the bookmarks in the same order they appear in the document?
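A possible explanation for the ordering, offered as a sketch rather than a confirmed diagnosis: the function above visits objects in whatever order the parser stores them, while the PDF format records bookmark order separately. In the PDF specification, the outline root points to its first item through a First entry, each item points to its next sibling through Next, and an item with children has its own First. The example below walks that structure on plain arrays. The object IDs, the sample data and the helper name walkOutline are all hypothetical, and nothing here calls the library; mapping it onto parsed objects is left open.

```php
<?php
// Hypothetical stand-in for parsed outline objects, keyed by object ID.
// The storage order is shuffled on purpose to mimic the "random" order.
$objects = [
    '7_0' => ['Title' => 'Chapter 2', 'Next' => '9_0'],
    '5_0' => ['Title' => 'Chapter 1', 'First' => '6_0', 'Next' => '7_0'],
    '9_0' => ['Title' => 'Appendix'],
    '6_0' => ['Title' => 'Section 1.1'],
    '4_0' => ['Type' => 'Outlines', 'First' => '5_0'],
];

/**
 * Walk siblings through Next and children through First, depth-first.
 * Hypothetical helper, not part of any library.
 */
function walkOutline(array $objects, ?string $id, int $depth = 0, array &$seen = []): array
{
    $result = [];
    while ($id !== null && isset($objects[$id]) && !isset($seen[$id])) {
        $seen[$id] = true; // guard against cyclic Next/First links
        $item = $objects[$id];
        if (isset($item['Title'])) {
            $result[] = ['label' => $item['Title'], 'depth' => $depth];
        }
        if (isset($item['First'])) {
            // Children come right after their parent, one level deeper.
            $result = array_merge(
                $result,
                walkOutline($objects, $item['First'], $depth + 1, $seen)
            );
        }
        $id = $item['Next'] ?? null;
    }
    return $result;
}

// Start at the first top-level item referenced by the outline root.
$ordered = walkOutline($objects, $objects['4_0']['First']);
```

Here $ordered lists the labels in document order (Chapter 1, Section 1.1, Chapter 2, Appendix) even though the objects were stored shuffled, and the depth value records the nesting level.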
I know this is an old issue, but I've just run into this problem and found that I could get the page number for bookmarks by modifying the earlier function as below:
private function extractBookmarks($pdf)
{
    $bookmarks = [];
    foreach ($pdf->getObjects() as $obj) {
        $header = $obj->getHeader();
        $details = $header->getDetails();
        $elements = $header->getElements();
        // Only keep objects that have both a title and a destination.
        if (isset($details['Title']) && isset($elements['Dest'])) {
            $page_no = $elements['Dest']->getContent()[0]->getPageNumber();
            $bookmarks[] = ['label' => $details['Title'], 'page' => $page_no];
        }
    }
    return $bookmarks;
}
It may be better to change
$bookmarks[] = ['label' => $details['Title'], 'page' => $page_no];
to
$bookmarks[$details['Title']] = $page_no;
so that the output is indexed by bookmark title and the value is the page number.
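One thing to weigh before switching: a PHP array holds only one value per key, so bookmarks that share a title would collapse into a single entry. The sketch below compares the two shapes on made-up data (the titles and page numbers are hypothetical, and the library is not involved).

```php
<?php
// Hypothetical sample data: two bookmarks share the title "Summary".
$found = [
    ['Introduction', 1],
    ['Summary', 4],
    ['Summary', 9],
];

$asList = []; // one entry per bookmark, duplicates preserved
$asMap = [];  // indexed by title, later duplicates overwrite earlier ones

foreach ($found as [$title, $page]) {
    $asList[] = ['label' => $title, 'page' => $page];
    $asMap[$title] = $page;
}

// count($asList) is 3, count($asMap) is 2, and $asMap['Summary'] is 9.
```

The title-indexed form is convenient for lookups when titles are known to be unique; the list form is the safer default otherwise.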
I am able to parse PDF metadata and pages, but I need the PDF bookmark outlines with page numbers. Can you please explain how to get them? I searched everywhere and read the docs as well, but did not find anything useful.