iipc / webarchive-commons

Common web archive utility code.
Apache License 2.0

WAT extractor: adding information in WAT's warcinfo #47

Closed: scheylord closed this issue 8 years ago

scheylord commented 9 years ago

The warcinfo metadata fields Software-Info and Extracted-Date should be consistent with the syntax used in WARC data files:

Software-Info => software
Extracted-Date => extractedDate

We could also add the following:

ip: 172.20.20.79
hostname: menelas2.bnf.fr
format: WARC File Format 1.0
conformsTo: http://bibnum.bnf.fr/WARC/WARC_ISO_28500_version1_latestdraft.pdf
operator: BnF - DLWeb
publisher: Bibliotheque nationale de France
description: xxxxxxxxx
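
To make the proposal concrete, here is a minimal sketch of the renaming and of how the resulting warcinfo payload could be serialized as `name: value` lines. It is not the webarchive-commons implementation: the function names `normalize_fields` and `to_warc_fields`, the `RENAMES` table, and the software and date values are all assumptions chosen for illustration.

```python
# Hypothetical mapping from the current WAT field names to the
# WARC-style names proposed in this issue.
RENAMES = {
    "Software-Info": "software",
    "Extracted-Date": "extractedDate",
}


def normalize_fields(fields):
    """Return a new dict with the legacy WAT names replaced by WARC-style names."""
    return {RENAMES.get(name, name): value for name, value in fields.items()}


def to_warc_fields(fields):
    """Serialize fields as one 'name: value' line each, terminated by CRLF."""
    return "".join(f"{name}: {value}\r\n" for name, value in fields.items())


# Placeholder values for the two renamed fields; the remaining values
# are taken from the examples listed in this issue.
legacy = {
    "Software-Info": "example-extractor 1.0",
    "Extracted-Date": "2015-01-01T00:00:00Z",
    "format": "WARC File Format 1.0",
    "operator": "BnF - DLWeb",
}

payload = to_warc_fields(normalize_fields(legacy))
```

Fields that are not in the rename table pass through unchanged, so the additional fields suggested above (ip, hostname, conformsTo, publisher, description) could be added to the dict without touching the serializer.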