ndushay closed 7 years ago
current dirs in production:
Archive-IT exemplars:
/was_unaccessioned_data/jobs/AIT_1023/2015_06/
/was_unaccessioned_data/jobs/AIT_1208/2008_10/
/was_unaccessioned_data/jobs/AIT_5425/201504/warcs/
Other:
/was_unaccessioned_data/jobs/(coll name)/(date)/warcs/
/was_unaccessioned_data/jobs/cesta/20170103080004/warcs/
/was_unaccessioned_data/jobs/edsource/20170101080007/warcs/
/was_unaccessioned_data/jobs/SUL_Maryam_Mirzakhani/20140826234758/warcs/
/was_unaccessioned_data/jobs/carter/20160323164036/warcs
Another:
/was_unaccessioned_data/jobs/(coll name)/latest/warcs
/was_unaccessioned_data/jobs/suwebsites/latest/warcs/
/was_unaccessioned_data/jobs/chinese_ngo/latest/warcs/
(dated Dec 3 2015, with "heritrix" in the warc file names)
Other other:
/was_unaccessioned_data/jobs/digital_michelangelo/20170209/atlas/
has tar.gz files
possibly covered by #31
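
To make the layouts listed above easier to tell apart when walking the jobs directory, here is a minimal Python sketch that sorts a path into one of the four groups. The function name `classify_job_dir` and the category labels are hypothetical, and the rules are assumptions read off the examples only: Archive-IT directories start with `AIT_` plus digits, dated directories use a 14-digit timestamp, and anything else under the base directory counts as "other".

```python
import re
from pathlib import PurePosixPath

# Base directory as given in this issue.
BASEDIR = "/was_unaccessioned_data/jobs"


def classify_job_dir(path: str) -> str:
    """Return a rough label for a job directory (labels are illustrative only)."""
    # relative_to raises ValueError if the path is not under BASEDIR.
    parts = PurePosixPath(path).relative_to(BASEDIR).parts
    if not parts:
        return "other"
    # Assumption: Archive-IT collections are named AIT_<digits>.
    if re.fullmatch(r"AIT_\d+", parts[0]):
        return "archive-it"
    if len(parts) == 3 and parts[2] == "warcs":
        if parts[1] == "latest":
            return "latest"
        # Assumption: dated crawls use a 14-digit YYYYMMDDhhmmss directory name.
        if re.fullmatch(r"\d{14}", parts[1]):
            return "dated"
    return "other"


if __name__ == "__main__":
    for p in [
        "/was_unaccessioned_data/jobs/AIT_1023/2015_06/",
        "/was_unaccessioned_data/jobs/cesta/20170103080004/warcs/",
        "/was_unaccessioned_data/jobs/suwebsites/latest/warcs/",
        "/was_unaccessioned_data/jobs/digital_michelangelo/20170209/atlas/",
    ]:
        print(classify_job_dir(p), p)
```

Trailing slashes do not matter here because `PurePosixPath` normalizes them, so `carter/20160323164036/warcs` and `cesta/20170103080004/warcs/` fall into the same group.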
desired (?):
(basedir)/(AIT coll id)/(AIT crawl id)/(AIT crawl-start timestamp)/(actual warcs)
basedir from settings:
/was_unaccessioned_data/jobs
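
As a rough illustration of the desired (still tentative) layout, the sketch below assembles a target directory from the base directory and the three Archive-IT values. The function name `desired_warc_dir`, the example crawl id, and the example timestamp are all hypothetical, and the 14-digit `YYYYMMDDhhmmss` timestamp format is an assumption borrowed from the dated directories already in production, not something this issue specifies.

```python
import re
from pathlib import PurePosixPath

# Base directory from settings, as given in this issue.
BASEDIR = "/was_unaccessioned_data/jobs"


def desired_warc_dir(coll_id, crawl_id, crawl_start, basedir=BASEDIR):
    """Build (basedir)/(AIT coll id)/(AIT crawl id)/(AIT crawl-start timestamp)."""
    # Assumption: the crawl-start timestamp is a 14-digit YYYYMMDDhhmmss string.
    if not re.fullmatch(r"\d{14}", crawl_start):
        raise ValueError(f"unexpected crawl-start timestamp: {crawl_start!r}")
    return PurePosixPath(basedir) / str(coll_id) / str(crawl_id) / crawl_start


if __name__ == "__main__":
    # The crawl id and timestamp below are made-up values for illustration.
    print(desired_warc_dir(1023, 123456, "20150601000000"))
```

The warc files themselves would then sit directly inside the returned directory, which removes the need for the separate `warcs/` and `latest/` conventions.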