metachris / pdfx

Extract text, metadata and references (pdf, url, doi, arxiv) from PDF. Optionally download all referenced PDFs.
http://www.metachris.com/pdfx
Apache License 2.0

How to get HyperText (not HyperLink)? #38

Open hoelan opened 4 years ago

hoelan commented 4 years ago

I have PDFs with many hyperlinks. I want to get the text label for the hyperlinks, not the hyperlink URLs.

    import pdfx

    pdf = pdfx.PDFx("filename-or-url.pdf")
    references_list = pdf.get_references()
    for LinkObj in references_list:
        Link = LinkObj.ref        # get the URL
        HyperText = LinkObj.text  # cannot get the label shown over the link!

how to get HYPERTEXT.pdf

TIA
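
A possible direction, offered as a sketch rather than as anything pdfx provides: in a PDF, a link annotation stores only a rectangle on the page and a target URL, while the visible label is ordinary page text that happens to sit under that rectangle. One way to recover the label is therefore to collect the words whose bounding boxes fall inside the link rectangle. The function name `label_for_link`, the tuple layouts, and the sample coordinates below are all hypothetical. The rectangles and word boxes would have to come from a library that exposes them (PyMuPDF's `page.get_links()` and `page.get_text("words")` return data in roughly this shape, but that pairing is an assumption here, not something tested against pdfx).

```python
from typing import Iterable, Tuple

Rect = Tuple[float, float, float, float]       # (x0, y0, x1, y1)
Word = Tuple[float, float, float, float, str]  # (x0, y0, x1, y1, text)


def label_for_link(link_rect: Rect, words: Iterable[Word]) -> str:
    """Join the words whose centre point lies inside the link rectangle.

    Words are assumed to arrive in reading order.
    """
    lx0, ly0, lx1, ly1 = link_rect
    inside = []
    for x0, y0, x1, y1, text in words:
        # Use the centre of the word box so small overlaps at the
        # rectangle edges do not pull in neighbouring words.
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        if lx0 <= cx <= lx1 and ly0 <= cy <= ly1:
            inside.append(text)
    return " ".join(inside)


# Made-up sample data for one page: a link rectangle and four word boxes.
link_rect = (100.0, 200.0, 180.0, 212.0)
words = [
    (72.0, 200.0, 96.0, 212.0, "See"),
    (102.0, 200.0, 130.0, 212.0, "project"),
    (134.0, 200.0, 176.0, 212.0, "homepage"),
    (184.0, 200.0, 200.0, 212.0, "for"),
]
print(label_for_link(link_rect, words))  # project homepage
```

Testing the centre point instead of requiring full containment is a judgment call: link rectangles are often drawn slightly tighter or looser than the text they cover, so strict containment tends to drop words at the edges.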