Closed: ndushay closed this issue 7 years ago.
Heritrix writes its crawls to web-archiving-stage; other crawls are placed there by other means.
The ingest workflows read the crawl data from web-archiving-stage. For now, @nullhandle deletes the crawls by hand. We may be able to do some of this cleanup programmatically once crawls have been accessioned into SDR and indexed in OpenWayback.
From JIRA DEVQUEUE-96:
"@lmcglohon and @ndushay will also take this opportunity to validate that accessioned crawl objects that are still queued for indexing can safely be deleted from the web-archiving-stage volume."
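A programmatic cleanup could take roughly the following shape. This is a minimal sketch, not the actual workflow: it assumes each crawl sits in its own directory directly under the web-archiving-stage root and that the caller already has two sets of crawl names, those accessioned into SDR and those indexed in OpenWayback. How those sets would be obtained is not covered in this issue, and the function names `deletable_crawls` and `clean_stage` are hypothetical. Because it is still being validated whether accessioned crawls that are queued for indexing can safely be deleted, the sketch keeps them by default (`require_indexed=True`) and performs a dry run unless told otherwise.

```python
import shutil
import tempfile
from pathlib import Path


def deletable_crawls(stage_root, accessioned, indexed, require_indexed=True):
    """Return crawl directories under stage_root that look safe to remove.

    Hypothetical inputs: `accessioned` and `indexed` are sets of crawl
    directory names already known to be in SDR / indexed in OpenWayback.
    """
    candidates = []
    for crawl_dir in sorted(Path(stage_root).iterdir()):
        if not crawl_dir.is_dir():
            continue  # ignore stray files on the staging volume
        name = crawl_dir.name
        if name not in accessioned:
            continue  # not yet in SDR: the ingest workflows still need it
        if require_indexed and name not in indexed:
            continue  # still queued for indexing: keep until validated safe
        candidates.append(crawl_dir)
    return candidates


def clean_stage(stage_root, accessioned, indexed,
                require_indexed=True, dry_run=True):
    """Delete (or, in a dry run, only list) crawls that pass the checks."""
    removed = []
    for crawl_dir in deletable_crawls(stage_root, accessioned, indexed,
                                      require_indexed):
        if not dry_run:
            shutil.rmtree(crawl_dir)
        removed.append(crawl_dir.name)
    return removed


if __name__ == "__main__":
    # Demo on a throwaway directory standing in for web-archiving-stage.
    with tempfile.TemporaryDirectory() as root:
        for name in ("crawl_a", "crawl_b", "crawl_c"):
            (Path(root) / name).mkdir()
        # crawl_a: in SDR and indexed; crawl_b: in SDR, still queued;
        # crawl_c: not yet accessioned.
        print(clean_stage(root, {"crawl_a", "crawl_b"}, {"crawl_a"}))
        # prints ['crawl_a'] (dry run, nothing deleted)
```

If the DEVQUEUE-96 validation confirms that queued-for-indexing crawls can be removed, calling with `require_indexed=False` would widen the candidate set to everything already in SDR; keeping the dry run as the default leaves a human check in place before anything is actually deleted.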