web-archive-group / WALK

Web Archives for Longitudinal Knowledge

Generate Sample Index #59

Closed ianmilligan1 closed 7 years ago

ianmilligan1 commented 7 years ago

On snotra in /mnt/warcs we have six sample WARCs, each from one of the WALK member institutions.

ARCHIVEIT-227-QUARTERLY-JOB167811-20150805221623032-00001.warc.gz*
ARCHIVEIT-3490-DAILY-24916-20130220204903589-00008-wbgrp-crawl066.us.archive.org-6441.warc.gz*
ARCHIVEIT-4527-CRAWL_SELECTED_SEEDS-JOB169013-20150814223032039-00000.warc.gz*
ARCHIVEIT-5467-NONE-24393-20150320234653068-00001-wbgrp-crawl056.us.archive.org-6445.warc.gz*
ARCHIVEIT-7414-TEST-JOB222875-20160630155827876-00000.warc.gz*
ARCHIVEIT-7485-CRAWL_SELECTED_SEEDS-JOB246020-20161101003506923-00041.warc.gz*

Would we be able to create a sample Solr index that we could use when playing with Blacklight locally? It could have the same institution, collection, and collection_number facets as the production index, but with arbitrary data in those fields.

Then we'd point our local version at it, so we can play around with it without having to use the 1TB main index.
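
For illustration only, here is a rough sketch of what placeholder documents for such a sample index might look like. It derives the Archive-It collection number from each sample filename (with the trailing `*` from the directory listing dropped) and fills the three facet fields with arbitrary values. The `id` field, the placeholder strings, and the `sample_doc` helper are assumptions made for this sketch, not part of the production schema.

```python
import json
import re

# The six sample WARC filenames listed above (trailing "*" removed).
SAMPLE_WARCS = [
    "ARCHIVEIT-227-QUARTERLY-JOB167811-20150805221623032-00001.warc.gz",
    "ARCHIVEIT-3490-DAILY-24916-20130220204903589-00008-wbgrp-crawl066.us.archive.org-6441.warc.gz",
    "ARCHIVEIT-4527-CRAWL_SELECTED_SEEDS-JOB169013-20150814223032039-00000.warc.gz",
    "ARCHIVEIT-5467-NONE-24393-20150320234653068-00001-wbgrp-crawl056.us.archive.org-6445.warc.gz",
    "ARCHIVEIT-7414-TEST-JOB222875-20160630155827876-00000.warc.gz",
    "ARCHIVEIT-7485-CRAWL_SELECTED_SEEDS-JOB246020-20161101003506923-00041.warc.gz",
]


def sample_doc(filename):
    """Build one placeholder document (hypothetical helper for this sketch)."""
    # Archive-It filenames start with "ARCHIVEIT-<collection number>-".
    match = re.match(r"ARCHIVEIT-(\d+)-", filename)
    if match is None:
        raise ValueError(f"Unexpected WARC filename: {filename}")
    number = match.group(1)
    return {
        "id": filename,  # assumed unique key; the real schema may differ
        "institution": f"Sample Institution {number}",  # arbitrary data
        "collection": f"Sample Collection {number}",  # arbitrary data
        "collection_number": number,
    }


docs = [sample_doc(name) for name in SAMPLE_WARCS]
print(json.dumps(docs, indent=2))
```

The resulting JSON array is in the general shape that Solr's JSON update handler accepts, but the real field names and types would need to match whatever the production schema defines.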

ruebot commented 7 years ago

Done.

We'll send info via Slack.

ianmilligan1 commented 7 years ago

🎉 Works like a charm!