metachris / pdfx

Extract text, metadata and references (pdf, url, doi, arxiv) from PDF. Optionally download all referenced PDFs.
http://www.metachris.com/pdfx
Apache License 2.0
1.03k stars, 113 forks

Fixed handling of multi-line link extraction #58

Open maximiliancw opened 1 year ago

maximiliancw commented 1 year ago

Improved the extract_links function so that it also captures hyperlinks that span two or more lines, by replacing line breaks in the text before matching (see issue #40).
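For readers who have not looked at the diff, the idea can be sketched roughly as follows. This is only an illustration of the technique described above, not the code in this PR: the function name extract_links_multiline and the simplified URL_RE pattern are hypothetical, and it assumes line breaks are replaced with an empty string before the regex runs.

```python
import re

# Hypothetical, deliberately simple URL pattern for illustration only;
# it is NOT the regex pdfx itself uses.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_links_multiline(text):
    """Return URLs found in text, treating line breaks as soft wraps.

    Assumption: every line break is replaced with an empty string, so a
    URL that was wrapped across lines is rejoined before matching.
    """
    joined = text.replace("\r\n", "").replace("\n", "")
    return URL_RE.findall(joined)


sample = "See https://example.com/a/very/\nlong/path.pdf for details."
links = extract_links_multiline(sample)
```

One trade-off of this approach is that a URL ending exactly at a line break gets glued to the first word of the next line (for example "https://example.com" followed by "Next" on a new line becomes "https://example.comNext"), so a real implementation may need a stricter rule for when to join lines.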