usnationalarchives / OPAProd

Tracking enhancements to OPAProd

Extract PDF text layer in API output #57

Open DominicBM opened 9 years ago

DominicBM commented 9 years ago

Currently, the text layers of PDFs are not being extracted. This is a type of technical metadata that did not make it into the design (an oversight), but it is very important, especially since the search engine already searches on it. As a result, a query can hit on a term that appears only in the text layer, yet the term won't come back in the API results because the text layer is not exposed.
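
To make the gap concrete, here is a rough sketch of the behavior described above. It is only an illustration: the record layout, the field names (`pdf_text_layer`, `extractedText`), and the helper functions are all hypothetical and are not taken from the actual OPA schema or API.

```python
# Hypothetical record; field names are illustrative, not the real OPA schema.
record = {
    "id": "example-1",
    "title": "Scanned letter",
    "pdf_text_layer": "Correspondence regarding the harbor survey",
}


def search(records, term):
    """Match on every indexed field, including the PDF text layer."""
    term = term.lower()
    return [
        rec for rec in records
        if any(term in str(value).lower() for value in rec.values())
    ]


def to_api_result(rec, include_text_layer=False):
    """Serialize a record; the text layer is left out unless requested."""
    out = {"id": rec["id"], "title": rec["title"]}
    if include_text_layer:
        # Assumed output field name, for illustration only.
        out["extractedText"] = rec["pdf_text_layer"]
    return out


def term_visible(result, term):
    """Check whether the search term appears anywhere in an API result."""
    return any(term.lower() in str(value).lower() for value in result.values())


hits = search([record], "harbor")
current = [to_api_result(rec) for rec in hits]
proposed = [to_api_result(rec, include_text_layer=True) for rec in hits]
```

In this sketch, `hits` contains the record because the search looks at the text layer, but `term_visible(current[0], "harbor")` is false, since the serialized result has nowhere to show the match. With the text layer included, `term_visible(proposed[0], "harbor")` is true.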