sul-dlss / wasapi-downloader

Java application to download WARCs from WASAPI

download WARCs into correct location #74

Closed ndushay closed 7 years ago

ndushay commented 7 years ago

possibly covered by #31

desired (?): (basedir)/(AIT coll id)/(AIT crawl id)/(AIT crawl-start timestamp)/(actual warcs)

basedir from settings: /was_unaccessioned_data/jobs
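A minimal sketch of how the proposed layout could be assembled, assuming the collection id, crawl id, and crawl-start timestamp are already available as strings. The class `WarcPathSketch`, the method `warcDir`, and the sample values are hypothetical placeholders for illustration, not part of wasapi-downloader; the exact form of each path segment (for example, whether the collection directory keeps an `AIT_` prefix) is still an open question in this issue.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class WarcPathSketch {

    // Hypothetical helper: joins the segments of the proposed layout
    // (basedir)/(AIT coll id)/(AIT crawl id)/(AIT crawl-start timestamp).
    // The WARC files themselves would be written inside the returned directory.
    static Path warcDir(String baseDir, String collectionId,
                        String crawlId, String crawlStartTimestamp) {
        return Paths.get(baseDir, collectionId, crawlId, crawlStartTimestamp);
    }

    public static void main(String[] args) {
        // baseDir is the value from settings; the other values are made-up samples.
        Path dir = warcDir("/was_unaccessioned_data/jobs", "1023", "12345", "20170103080004");
        System.out.println(dir);
    }
}
```

Building the path with `Paths.get` rather than string concatenation avoids doubled or missing separators when a segment comes from settings with or without a trailing slash.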

ndushay commented 7 years ago

current dirs in production:

Archive-IT exemplars:

/was_unaccessioned_data/jobs/AIT_1023/2015_06/
/was_unaccessioned_data/jobs/AIT_1208/2008_10/
/was_unaccessioned_data/jobs/AIT_5425/201504/warcs/

Other: /was_unaccessioned_data/jobs/(coll name)/(date)/warcs/

/was_unaccessioned_data/jobs/cesta/20170103080004/warcs/
/was_unaccessioned_data/jobs/edsource/20170101080007/warcs/
/was_unaccessioned_data/jobs/SUL_Maryam_Mirzakhani/20140826234758/warcs/
/was_unaccessioned_data/jobs/carter/20160323164036/warcs

Another: /was_unaccessioned_data/jobs/(coll name)/latest/warcs

/was_unaccessioned_data/jobs/suwebsites/latest/warcs/
/was_unaccessioned_data/jobs/chinese_ngo/latest/warcs/ (dated Dec 3 2015, with "heritrix" in the WARC file names)

Other other:

/was_unaccessioned_data/jobs/digital_michelangelo/20170209/atlas/ has tar.gz files
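The layouts listed in this comment could be told apart mechanically, for example when auditing what is already on disk before settling on a target structure. The sketch below is only an illustration: the class `JobDirLayoutSketch`, the method `classify`, the category labels, and the regular expressions are assumptions derived from the example paths above, not existing wasapi-downloader code.

```java
import java.util.regex.Pattern;

public class JobDirLayoutSketch {

    static final String BASE = "/was_unaccessioned_data/jobs/";

    // Archive-It exemplars: AIT_<digits>/<YYYY_MM or YYYYMM>/ with an optional warcs/ level.
    static final Pattern AIT =
        Pattern.compile(Pattern.quote(BASE) + "AIT_\\d+/\\d{4}_?\\d{2}/(warcs/?)?");

    // (coll name)/(14-digit timestamp)/warcs, trailing slash optional.
    static final Pattern DATED =
        Pattern.compile(Pattern.quote(BASE) + "[^/]+/\\d{14}/warcs/?");

    // (coll name)/latest/warcs, trailing slash optional.
    static final Pattern LATEST =
        Pattern.compile(Pattern.quote(BASE) + "[^/]+/latest/warcs/?");

    // Returns a hypothetical label for the layout a path follows.
    static String classify(String path) {
        if (AIT.matcher(path).matches()) return "archive-it";
        if (DATED.matcher(path).matches()) return "dated";
        if (LATEST.matcher(path).matches()) return "latest";
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(classify("/was_unaccessioned_data/jobs/AIT_5425/201504/warcs/"));
        System.out.println(classify("/was_unaccessioned_data/jobs/carter/20160323164036/warcs"));
        System.out.println(classify("/was_unaccessioned_data/jobs/suwebsites/latest/warcs/"));
        System.out.println(classify("/was_unaccessioned_data/jobs/digital_michelangelo/20170209/atlas/"));
    }
}
```

Anything that falls into "other", such as the `digital_michelangelo` directory, would need a manual look.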