Extract text, metadata and references (pdf, url, doi, arxiv) from PDF. Optionally download all referenced PDFs.
PDFx is storing prior parsed PDFs causing incorrect references / annotations to be found #14
Closed
scottwernervt closed 8 years ago
Doc1.pdf Doc2.pdf
Parsing annotations with `get_references()` on multiple files causes annotations from all previously parsed PDFs to appear in the current one.
PDF 1: Correct
PDF 2: Correct
PDF 1 and PDF 2 together: Bug (PDF 2 has annotations from PDF 1)
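
For illustration, here is a minimal, self-contained sketch of one common way this kind of symptom arises in Python: a mutable list stored on the class rather than on the instance. This is an assumption about the general failure pattern, not a confirmed diagnosis of PDFx. `LeakyParser`, `IsolatedParser`, and the example URLs are hypothetical names invented for this sketch and are not part of the PDFx API.

```python
class LeakyParser:
    """Hypothetical parser (NOT PDFx code): references live on the class."""

    references = []  # one list shared by every instance

    def __init__(self, filename, found):
        self.filename = filename
        # extend() mutates the shared class-level list in place
        self.references.extend(found)

    def get_references(self):
        return list(self.references)


class IsolatedParser:
    """Hypothetical fix: each instance owns its own list."""

    def __init__(self, filename, found):
        self.filename = filename
        self.references = list(found)  # fresh list per instance

    def get_references(self):
        return list(self.references)


# Parse two documents in the same process, as in the report.
LeakyParser("Doc1.pdf", ["https://example.com/a"])
leaky_doc2 = LeakyParser("Doc2.pdf", ["https://example.com/b"]).get_references()

IsolatedParser("Doc1.pdf", ["https://example.com/a"])
isolated_doc2 = IsolatedParser("Doc2.pdf", ["https://example.com/b"]).get_references()

print(leaky_doc2)     # contains Doc1's reference as well as Doc2's
print(isolated_doc2)  # contains only Doc2's reference
```

If the real cause is similar, parsing each file on its own would look correct, and only parsing several files in one process would show references carried over from earlier documents, which matches the three cases listed above.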