sul-dlss-labs / ksr

SRT Website Test
MIT License

Explore Datashare as a way to explore indexed documents #36

Open peetucket opened 3 years ago

peetucket commented 3 years ago

See https://datashare.icij.org/

  1. Spin up locally or on a new/existing VM
  2. Add police manuals

not sure it would be an alternative to a custom Blacklight instance. Nicole sees this as a tool that would work really well for us as we QA and, later, for advanced users (researchers) exploring a clearly defined collection such as the manuals

jmartin-sul commented 2 years ago

we have an instance up on the data analysis VM. a first pass indexing the manuals has been completed: 8734 documents were ingested. this pass only did basic text extraction on PDFs. a couple more things TODO:

  1. re-run the indexing with OCR enabled (Tasks -> Analyze Your Documents -> Extract Text, and enable "Do you want to extract text also from images and PDFs?", which is not on by default)
  2. do an analysis run that does named entity extraction (Tasks -> Analyze Your Documents -> Find people, organizations, locations and email addresses). TODO: does this require first installing an NER plugin? there are options in Settings.
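
A minimal sketch, not part of the setup described above, for cross-checking the 8734 ingested figure against the PDFs on disk before re-running with OCR. The function name `count_pdfs` and the directory passed on the command line are hypothetical; the actual location of the manuals on the VM is not stated in this thread. The two numbers may not match exactly if Datashare also indexes non-PDF or embedded files.

```python
from pathlib import Path
import sys


def count_pdfs(root):
    """Count regular files under root with a .pdf extension (case-insensitive)."""
    return sum(
        1
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    # Hypothetical usage: pass the directory holding the manuals as the
    # first argument; defaults to the current directory.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    print(count_pdfs(root))
```

Running it against the source directory and comparing the printed number with the document count Datashare reports gives a quick signal of whether files were skipped during the first pass.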