web-archive-group / WALK

Web Archives for Longitudinal Knowledge

Full-Text Generation - Warcbase or Leave for Solr? #41

ianmilligan1 closed this issue 7 years ago

ianmilligan1 commented 8 years ago

Right now, as part of our warcbase processing script, we generate extracted full text. I think it's useful for warcbase testing, but since we eventually plan to expose text primarily through Solr (and will index it there accordingly), maybe it's a superfluous step?

Just opening this up to thoughts.

ianmilligan1 commented 7 years ago

We're finding the text is useful for now for other research products, so let's leave it in. But if we begin to run into storage issues, we will have to reassess.