Closed: saraaubry closed this issue 9 years ago
In the current implementation of the WAT extractor, the WARC-Filename field in the WAT warcinfo record contains the filename of the original (W)ARC file. According to the WARC ISO standard, it should contain the filename of the WAT file itself.
Current:
WARC/1.0
WARC-Type: warcinfo
WARC-Date: 2015-02-18T10:24:54Z
WARC-Filename: BnF-6224-50-20150218094547-00001-ciblee_2015_menelas2.bnf.fr.warc.gz
WARC-Record-ID: urn:uuid:97a37ea9-1af4-4c47-8ae0-5515428347aa
Content-Type: application/warc-fields
Content-Length: 73

Target:
WARC/1.0
WARC-Type: warcinfo
WARC-Date: 2015-02-18T10:24:54Z
WARC-Filename: BnF-6224-50-20150218094547-00001-ciblee_2015_menelas2.bnf.fr.warc.wat.gz
WARC-Record-ID: urn:uuid:97a37ea9-1af4-4c47-8ae0-5515428347aa
Content-Type: application/warc-fields
Content-Length: 73
Implementation:

java extractor.jar -wat fichierA.warc.gz
--> output goes to standard output; WARC-Filename is derived from the input filename:
fichierA.warc.gz => fichierA.warc.wat.gz
fichierA.arc.gz => fichierA.arc.wat.gz
fichierA.warc => fichierA.warc.wat
fichierA.arc => fichierA.arc.wat

java extractor.jar -wat fichierA.warc.gz fichierB.wat.warc.gz
--> output goes to the file fichierB.wat.warc.gz; WARC-Filename: fichierB.wat.warc.gz
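The naming rule proposed above could be sketched roughly as follows. This is only an illustration, not the extractor's actual code: the class `WatFilename` and its methods `derive` and `resolve` are hypothetical names, and the sketch assumes the input is passed as a bare filename (no directory part) and that a missing output argument is represented as `null`.

```java
// Hypothetical helper illustrating the proposed WARC-Filename rule.
// None of these names come from the extractor itself.
final class WatFilename {
    private WatFilename() {}

    // Derive the WAT filename from the input (W)ARC filename:
    //   x.warc.gz -> x.warc.wat.gz, x.arc.gz -> x.arc.wat.gz,
    //   x.warc    -> x.warc.wat,    x.arc    -> x.arc.wat
    static String derive(String inputName) {
        if (inputName.endsWith(".gz")) {
            // Insert ".wat" before the trailing ".gz".
            String base = inputName.substring(0, inputName.length() - ".gz".length());
            return base + ".wat.gz";
        }
        // Uncompressed input: simply append ".wat".
        return inputName + ".wat";
    }

    // If an output file was given on the command line, its name is used as-is;
    // otherwise (output to standard output) the name is derived from the input.
    static String resolve(String inputName, String outputName) {
        return outputName != null ? outputName : derive(inputName);
    }

    public static void main(String[] args) {
        System.out.println(resolve("fichierA.warc.gz", null));
        System.out.println(resolve("fichierA.warc.gz", "fichierB.wat.warc.gz"));
    }
}
```

With an explicit output file, the name is taken verbatim, so `fichierB.wat.warc.gz` is kept even though it does not follow the derived `.warc.wat.gz` pattern.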