sul-dlss / web-archiving

placeholder for web archiving work

web-archiving-stage volume: can we delete objects still queued for indexing? #12

Closed: ndushay closed this issue 7 years ago

ndushay commented 7 years ago

from JIRA DEVQUEUE-96:

"@lmcglohon and @ndushay will also take this opportunity to validate that accessioned crawl objects that are still queued for indexing can safely be deleted from the web-archiving-stage volume."

ndushay commented 7 years ago

Heritrix writes its crawls to web-archiving-stage; crawls from other sources are placed there by other means.

The ingest workflows read crawl data from web-archiving-stage. For now, @nullhandle deletes crawls by hand. We may be able to do some of this cleanup programmatically once crawls have been accessioned into SDR and indexed in OpenWayback.
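The programmatic cleanup floated here might look something like the minimal Python sketch below. It assumes, hypothetically, that each crawl lives in its own top-level directory on web-archiving-stage and that the caller already has two sets of crawl directory names: those confirmed as accessioned into SDR and those confirmed as indexed in OpenWayback. How those sets are obtained is not covered. Only crawls present in both sets are selected, so anything still queued for indexing stays on the volume. The function names and the dry-run default are illustrative, not part of any existing workflow.

```python
"""Hypothetical cleanup sketch for the web-archiving-stage volume."""
import shutil
from pathlib import Path


def deletable_crawls(stage_dir, in_sdr, in_wayback):
    """Return crawl directories under stage_dir that are safe to delete.

    Assumption: in_sdr and in_wayback are sets of crawl directory names
    already confirmed as accessioned in SDR and indexed in OpenWayback.
    A crawl must appear in BOTH sets to be selected.
    """
    safe = set(in_sdr) & set(in_wayback)
    return sorted(
        p for p in Path(stage_dir).iterdir() if p.is_dir() and p.name in safe
    )


def clean_stage(stage_dir, in_sdr, in_wayback, dry_run=True):
    """Delete (or, in dry-run mode, only list) crawls that are safe to remove.

    Returns the names of the selected crawl directories either way.
    """
    targets = deletable_crawls(stage_dir, in_sdr, in_wayback)
    for path in targets:
        if dry_run:
            print(f"would delete {path}")
        else:
            shutil.rmtree(path)  # removes the whole crawl directory tree
    return [p.name for p in targets]
```

Defaulting to `dry_run=True` means a first run only prints what would be removed, so a person (today, @nullhandle) can review the list before anything is actually deleted.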