sul-dlss-labs / ksr

SRT Website Test
MIT License

QA the crawl of manuals #19

Open cncoleman opened 2 years ago

cncoleman commented 2 years ago

Where did we over-collect and under-collect with the crawl?

jmartin-sul commented 2 years ago

note from meeting discussions: since this will likely require a fair bit of tedious manual work, we should split this up systematically into manageable chunks. a google spreadsheet strikes me as the easiest for shared editing? checkboxes in github comments don't handle concurrent editing by multiple people well.
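a rough sketch of how the chunking could work, assuming we can get a flat list of the crawled file paths: sort the list, number the files into fixed-size chunks, and write a CSV that can be imported into a shared spreadsheet. the function names, column names, and example paths below are all hypothetical, not anything that exists in this repo.

```python
import csv
import io


def assign_chunks(paths, chunk_size):
    """Sort paths and yield (chunk_number, path) pairs, chunk_size paths per chunk."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for index, path in enumerate(sorted(paths)):
        # Chunk numbers start at 1 so they read naturally in a spreadsheet.
        yield index // chunk_size + 1, path


def write_review_sheet(paths, chunk_size, out):
    """Write one CSV row per file, with blank columns for reviewers to fill in."""
    writer = csv.writer(out)
    writer.writerow(["chunk", "path", "reviewer", "keep?", "notes"])
    for chunk, path in assign_chunks(paths, chunk_size):
        writer.writerow([chunk, path, "", "", ""])


# Hypothetical crawl output; real paths would come from the crawl itself.
crawled = ["manuals/b.pdf", "manuals/a.pdf", "manuals/c.pdf"]
buffer = io.StringIO()
write_review_sheet(crawled, 2, buffer)
print(buffer.getvalue())
```

each reviewer could then claim a chunk number in the sheet, which avoids two people working through the same files.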

cncoleman commented 2 years ago

Do we want to wait on this until we have more detail from Dave Maas?

cncoleman commented 2 years ago

To keep track of the versions of the documents, we should first accession them with the relevant metadata. Since we will want to provide search results and access at the level of the individual file, I think each document needs to have its own druid.

cncoleman commented 2 years ago

It looks like Datashare will help us tremendously in evaluating the scraped documents. @jmartin-sul @gbasel @cncoleman will meet to review via Datashare and come up with a strategy.