Open kepp14 opened 3 years ago
I'm not super familiar with the annotation options. However, my guess is that the 8 Line annotations won't have any text associated with them. I also suspect that the text for the hyperlink is just part of the standard content stream of the page, and the Link annotations define an invisible annotation that sits on top of the text to handle clicks.
In theory it'd be possible to grab the Rect attribute from the 8 Link annotations, and then fetch only the text from the page that sits within those boundaries. Unfortunately pdf-reader doesn't offer a nice API to do that. You'd have to create a customised version of PDF::Reader::PageTextReceiver.
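To make the idea concrete, here is a minimal sketch of only the geometric step: given each link's Rect and a list of positioned text fragments, pick out the fragments that fall inside each rectangle. It is plain Ruby and does not call pdf-reader at all. TextRun, LinkRect and text_for_links are hypothetical names made up for this illustration, and it assumes you have already collected the Rect arrays from the Link annotations and the text positions from a customised receiver.

```ruby
# Hypothetical helper types for illustration; none of these are part of pdf-reader.

# A piece of text with the page coordinates of its origin (assumed to come
# from a customised text receiver).
TextRun = Struct.new(:x, :y, :text)

# A link rectangle stored as lower-left and upper-right corners.
LinkRect = Struct.new(:llx, :lly, :urx, :ury) do
  # A PDF Rect is a 4-number array. The two corners may be given in either
  # order, so normalise them here.
  def self.from_pdf_array(arr)
    x1, y1, x2, y2 = arr.map(&:to_f)
    new([x1, x2].min, [y1, y2].min, [x1, x2].max, [y1, y2].max)
  end

  # True when the run's origin lies inside the rectangle. The tolerance is an
  # assumed fudge factor for text sitting right on the rectangle's edge.
  def contains?(run, tolerance = 1.0)
    run.x >= llx - tolerance && run.x <= urx + tolerance &&
      run.y >= lly - tolerance && run.y <= ury + tolerance
  end
end

# Returns one string per link rectangle, in the same order as link_rects.
# Runs are read top to bottom, then left to right (PDF y grows upwards).
def text_for_links(link_rects, runs)
  link_rects.map do |rect|
    runs.select { |run| rect.contains?(run) }
        .sort_by { |run| [-run.y, run.x] }
        .map(&:text)
        .join
  end
end
```

Because the result keeps the order of the rectangles you pass in, it can be paired up with the attachments you already extract from the same Link annotations. Matching on the text origin alone is a simplification: a long run that starts just outside the rectangle would be missed, so the tolerance may need tuning for a real document.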
Hi! I'm having trouble getting the text related to a hyperlink in my PDF. By that, I mean that I have some text in my PDF, say SampleHyperlinkHere, that opens another PDF when clicked. I'm able to get the PDF attached to the hyperlink using this script https://gist.github.com/danlucraft/5277732#gistcomment-2675302, but I want to be able to tell which attachment belongs to which piece of link text.
For example, I have a page with 16 Annots, and I notice that 8 of those are links (as expected). I'm able to grab the attachment from the link for those 8 just fine.
Is there a way to use the other 8 annotations to get the text associated with the hyperlinks, or another way that I'm missing? Appreciate the help!