iipc / warc-specifications

Centralised repository for WARC usage specifications.
http://iipc.github.io/warc-specifications/

WARC-Block-Digest and WARC-Payload-Digest examples are invalid #64

Open ato opened 4 years ago

ato commented 4 years ago

Sections 5.8 and 5.9 of the WARC 1.1 spec include the following two examples:

WARC-Block-Digest: sha1:AB2CD3EF4GH5IJ6KL7MN8OPQ
WARC-Payload-Digest: sha1:3EF4GH5IJ6KL7MN8OPQAB2CD

These are invalid because:

  1. The character "8" is not part of the Base32 alphabet, which consists of the letters A-Z and the digits 2-7.

  2. The strings are 24 characters long. The SHA-1 function produces a 160-bit output, which at 5 bits per Base32 character should encode to 160 / 5 = 32 characters.
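
Both points can be checked mechanically. The following is a minimal sketch, assuming the RFC 4648 Base32 alphabet; the helper name `check_sha1_base32` and the constant names are made up for illustration and are not taken from the spec or any WARC library.

```python
import string

# RFC 4648 Base32 alphabet: uppercase A-Z plus the digits 2-7.
BASE32_ALPHABET = set(string.ascii_uppercase + "234567")
# 160 bits of SHA-1 output at 5 bits per Base32 character.
SHA1_BASE32_LENGTH = 160 // 5


def check_sha1_base32(value):
    """Return a list of problems found in a Base32-encoded SHA-1 value.

    Hypothetical helper for illustration; `value` is the part after "sha1:".
    """
    problems = []
    bad = sorted(set(value) - BASE32_ALPHABET)
    if bad:
        problems.append("characters outside the Base32 alphabet: " + "".join(bad))
    if len(value) != SHA1_BASE32_LENGTH:
        problems.append("length %d, expected %d" % (len(value), SHA1_BASE32_LENGTH))
    return problems


# The two example values from sections 5.8 and 5.9.
for digest in ("AB2CD3EF4GH5IJ6KL7MN8OPQ", "3EF4GH5IJ6KL7MN8OPQAB2CD"):
    print(digest, check_sha1_base32(digest))
```

An empty list means the value passed both checks; each of the two values above should report both problems.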

Note that the additional examples in Annex B do not suffer from these problems:

WARC-Block-Digest: sha1:2ASS7ZUZY6ND6CCHXETFVJDENAWF7KQ2
WARC-Payload-Digest: sha1:CCHXETFVJD2MUZY6ND6SS7ZENMWF7KQ2
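
For comparison, a well-formed value of the same shape can be produced with Python's standard library. This is only a sketch: the function name `warc_sha1_digest` is hypothetical, and it assumes the `sha1:` label followed by RFC 4648 Base32, as in the Annex B examples.

```python
import base64
import hashlib


def warc_sha1_digest(data: bytes) -> str:
    """Format the SHA-1 of `data` as 'sha1:' plus its Base32 encoding.

    Hypothetical helper for illustration, not part of any WARC library.
    """
    raw = hashlib.sha1(data).digest()  # 20 bytes = 160 bits
    # 20 bytes is a multiple of 5, so the Base32 output needs no "=" padding.
    return "sha1:" + base64.b32encode(raw).decode("ascii")


value = warc_sha1_digest(b"example payload")
label, encoded = value.split(":", 1)
print(value)
print(len(encoded))  # 160 / 5 characters

# The Annex B values decode back to 20 bytes, as a SHA-1 digest should.
annex_b = "2ASS7ZUZY6ND6CCHXETFVJDENAWF7KQ2"
print(len(base64.b32decode(annex_b)))
```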