This PR integrates @kamurphy11's PDF highlight detection code into a PaperMage parser that adds a layer of highlights. The parser is incorporated into the `MaterialsRecipe`.
NOTE: this code currently causes errors when parsing most PDF files. PaperMage layers expect entities to be non-overlapping, and our PDFs violate this in two ways: symbols like "®" cause a highlight to overlap with the previous line, producing overly wide spans for some annotations; and some annotations straightforwardly overlap one another.
Currently, at least PDF 6 in the `os.listdir` list parses correctly: "Effects of build direction and heat treatment on creep properties of Ni-base superalloy built up by additive manufacturing.pdf".
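
To make the overlap problem concrete, here is a minimal sketch of how overlapping highlight spans could be detected and, as one possible workaround, merged before they are added to a layer. It does not use the PaperMage API: spans are modelled as plain `(start, end)` character offsets, and `find_overlaps` and `merge_overlaps` are hypothetical helper names, not functions from this PR.

```python
from typing import List, Tuple

# Assumption: a highlight span is a (start, end) pair of character
# offsets with an exclusive end. This is not the PaperMage Span class.
Span = Tuple[int, int]


def find_overlaps(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Return every pair of spans that share at least one character."""
    ordered = sorted(spans)
    overlaps = []
    for i, current in enumerate(ordered):
        for other in ordered[i + 1:]:
            if other[0] >= current[1]:
                # Sorted by start, so no later span can overlap `current`.
                break
            overlaps.append((current, other))
    return overlaps


def merge_overlaps(spans: List[Span]) -> List[Span]:
    """Collapse overlapping spans into single, non-overlapping spans."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][1]:
            # Extend the previous span instead of adding an overlapping one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


highlights = [(0, 10), (5, 15), (20, 30)]
conflicts = find_overlaps(highlights)
non_overlapping = merge_overlaps(highlights)
```

Merging loses the boundary between the original annotations, so it may not be the right fix where overlapping highlights carry different meanings; the detection step alone is enough to report which PDFs are affected.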
This closes out #2 and #14.