PDFx retains references from previously parsed PDFs, causing incorrect references / annotations to be found

Attached files: Doc1.pdf, Doc2.pdf

Calling get_references() on multiple files in the same Python session causes annotations from all previously parsed PDFs to appear in the results for the current one.

PDF 1: Correct

from pdfx import PDFx
pdf_1 = PDFx('Doc1.pdf')
print([url.ref for url in pdf_1.get_references()])
# >> ['http://www.google.com/', 'google.com']

PDF 2: Correct

from pdfx import PDFx
pdf_2 = PDFx('Doc2.pdf')
print([url.ref for url in pdf_2.get_references()])
# >> ['bing.com', 'http://www.bing.com/']

PDF 1 and PDF 2 together: Bug (PDF 2 has annotations from PDF 1)

# -*- coding: utf-8 -*-
from pdfx import PDFx
pdf_1 = PDFx('Doc1.pdf')
print([url.ref for url in pdf_1.get_references()])
# >> ['google.com', 'http://www.google.com/']
pdf_2 = PDFx('Doc2.pdf')
print([url.ref for url in pdf_2.get_references()])
# >> ['http://www.google.com/', 'bing.com', 'google.com', 'http://www.bing.com/']
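
I have not traced this through the pdfx source, so the following is only a guess at the kind of thing that could produce this behaviour. A common cause of state leaking between instances in Python is a mutable container declared at class level, which every instance then shares. The classes below (LeakyReader, IsolatedReader) are hypothetical stand-ins written for this illustration and are not pdfx code:

```python
class LeakyReader:
    # Hypothetical stand-in, not pdfx code: this set is created once,
    # when the class is defined, and is shared by every instance.
    references = set()

    def __init__(self, refs):
        for ref in refs:
            # Mutates the shared class-level set instead of a per-instance one.
            self.references.add(ref)


class IsolatedReader:
    def __init__(self, refs):
        # A fresh set per instance, so nothing carries over between objects.
        self.references = set(refs)


leaky_1 = LeakyReader(["google.com"])
leaky_2 = LeakyReader(["bing.com"])
print(sorted(leaky_2.references))
# >> ['bing.com', 'google.com']

isolated_1 = IsolatedReader(["google.com"])
isolated_2 = IsolatedReader(["bing.com"])
print(sorted(isolated_2.references))
# >> ['bing.com']
```

If pdfx does something similar internally, creating the container inside __init__ (as IsolatedReader does) would be one way to keep each PDF's references separate.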

metachris/pdfx, issue #14