Right now, as part of our warcbase processing script, we generate extracted full text. I think it's useful for warcbase testing, but given that we eventually plan to expose text primarily through Solr (and will index it there accordingly), maybe it's a superfluous step?
We're finding the text is useful for other research products for now, so let's leave it in. But if we begin to run into storage issues, we will have to reassess.
Just opening this up to thoughts.