jacobtakema opened this issue 4 years ago
Well, those 110 errors in jwat-tools occur because the -l (relaxed URI) option is not used by default. Relaxed URI validation is presumably the default in the JHOVE module.
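To illustrate why strict and relaxed URI validation can disagree on the same records, here is a minimal sketch. It is not JWAT's actual validator: `strict_uri_ok` is a hypothetical check that only accepts characters RFC 3986 allows literally (plus well-formed percent-escapes), and `relaxed_uri_ok` is a hypothetical looser rule that only requires a scheme and no whitespace. Target URIs harvested from the web often contain characters such as `|` or `{` that fail the strict check but pass the relaxed one.

```python
import re

# Characters that RFC 3986 permits to appear literally in a URI
# (unreserved, gen-delims, sub-delims, plus '%' for percent-encoding).
_ALLOWED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "-._~:/?#[]@!$&'()*+,;=%"
)
# A '%' that is not followed by exactly two hex digits is a broken escape.
_BAD_PCT = re.compile(r"%(?![0-9A-Fa-f]{2})")
# A URI must start with a scheme such as "http:".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def strict_uri_ok(uri: str) -> bool:
    """Hypothetical strict check: scheme present, only RFC 3986 characters,
    and every '%' starts a valid percent-escape."""
    return (
        bool(_SCHEME.match(uri))
        and all(c in _ALLOWED for c in uri)
        and not _BAD_PCT.search(uri)
    )


def relaxed_uri_ok(uri: str) -> bool:
    """Hypothetical relaxed check: scheme present and no whitespace."""
    return bool(_SCHEME.match(uri)) and not any(c.isspace() for c in uri)


if __name__ == "__main__":
    for uri in ["http://example.com/a%20b", "http://example.com/a|b"]:
        print(uri, strict_uri_ok(uri), relaxed_uri_ok(uri))
```

Under these assumed rules, `http://example.com/a|b` is rejected by the strict check but accepted by the relaxed one, which is the kind of difference that would show up as 'WARC-Target-URI' errors in one tool and not the other.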
As for the digest: it is not computed correctly, since one of the digest values is the digest of an empty string/byte array. http://craiccomputing.blogspot.com/2009/09/sha1-digest-of-empty-string.html
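The SHA-1 of empty input is a fixed, well-known value, so a digest computed over an empty byte array is easy to recognize. The sketch below assumes the common WARC convention of writing SHA-1 digests as `sha1:` followed by the Base32-encoded hash; the helper names `warc_sha1_digest` and `is_empty_payload_digest` are hypothetical and not part of JWAT or JHOVE.

```python
import base64
import hashlib


def warc_sha1_digest(payload: bytes) -> str:
    """Return a WARC-style digest label: 'sha1:' + Base32 of the SHA-1 hash."""
    raw = hashlib.sha1(payload).digest()
    return "sha1:" + base64.b32encode(raw).decode("ascii")


# SHA-1 of zero bytes, in hex and in the assumed WARC Base32 form.
EMPTY_SHA1_HEX = hashlib.sha1(b"").hexdigest()
EMPTY_SHA1_WARC = warc_sha1_digest(b"")


def is_empty_payload_digest(value: str) -> bool:
    """True if a digest label is the SHA-1 of an empty payload (case-insensitive)."""
    return value.strip().upper() == EMPTY_SHA1_WARC.upper()


if __name__ == "__main__":
    print(EMPTY_SHA1_HEX)   # da39a3ee5e6b4b0d3255bfef95601890afd80709
    print(EMPTY_SHA1_WARC)  # sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ
```

If a validator reports this value as the computed payload digest for a record whose payload is not empty, that points to the digest being calculated over no data rather than to a corrupt record.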
Issue: JHOVE WARC-KB module gives different results compared to JWAT
Because the JHOVE WARC-KB module uses the JWAT-WARC library, I expected both tools to produce similar results.
For example, I ran the WARC file LUX-004-TEST-2017-12-18-20171220042523987-00100-16828_wbgrp-crawl007.us.archive.org_8443.warc through both tools.
When I ran JHOVE 1.22 with the WARC-KB module, the output contained 84 errors with the message 'Incorrect payload digest'.
When I ran JWAT-Tools 0.6.6, the output was:
INVALID_EXPECTED: 66
REQUIRED_INVALID: 44
'WARC-Target-URI' value: 110
So JWAT reports only the 110 'WARC-Target-URI' messages, and JHOVE reports only the 84 'Incorrect payload digest' errors.
The two tools give completely different results.
Note that JWAT-Tools 0.6.6 includes JWAT-warc v1.11, while JHOVE 1.22 includes JWAT-warc 1.0.3.
So what is causing these (totally) different results?