Open peetucket opened 3 years ago
we have an instance up on the data analysis VM. a first pass at indexing the manuals has been completed: 8734 documents were ingested. this pass only did basic text extraction on PDFs. a couple more things TODO:
enable the "Do you want to extract text also from images and PDFs?" option (which is not on by default)
See https://datashare.icij.org/
not sure it would be an alternative to a custom Blacklight instance - Nicole sees this as a tool that would work really well for us as we QA and later for advanced users (researchers) exploring a clearly defined collection such as the manuals