Fixed handling of multi-line link extraction

metachris / pdfx

Extract text, metadata and references (pdf, url, doi, arxiv) from PDF. Optionally download all referenced PDFs.

http://www.metachris.com/pdfx

Apache License 2.0

1.03k stars, 113 forks

Open maximiliancw opened 1 year ago

maximiliancw commented 1 year ago

Improved the `extract_links` function so that it also captures hyperlinks spanning two or more lines, by replacing line breaks in the text before matching (issue #40).
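To illustrate the idea, here is a minimal sketch of joining lines before matching URLs. It is not the code from this pull request: the function name `extract_links_joined` and the regular expression `URL_PATTERN` are hypothetical, and the sketch assumes line breaks are replaced with an empty string.

```python
import re

# Hypothetical pattern (not the one pdfx uses): an http(s) URL runs
# until the next whitespace, angle bracket, or quote character.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_links_joined(text):
    """Return the URLs found in `text` after removing line breaks."""
    # Assumption: a line break inside a URL is only a wrapping artifact,
    # so dropping it restores the original link.
    joined = text.replace("\r\n", "").replace("\n", "")
    return URL_PATTERN.findall(joined)


sample = "See https://example.com/a/very/long/\npath/to/file.pdf for details."
print(extract_links_joined(sample))
```

One trade-off of this approach: if a URL ends exactly at a line break, the first word of the next line is appended to it, so results may need validation afterwards.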