harvard-lil / perma

Indelible links

WARC dump not deterministic #3240

Open villesundell opened 1 year ago

villesundell commented 1 year ago

Hello, and first of all thank you for Perma, it's a great service! :pray:

I am trying to record a WARC hash on the Bitcoin blockchain for temporal provenance. My original plan was to publish the GUID together with the SHA3-512 hash of the WARC. However, I noticed that the downloaded WARC is not deterministic: the two additional warcinfo records, which carry the download date and time, make each download unique.

As I understand it, though, WARC dumps should be deterministic (https://flowvella.com/s/3e9w/0B1C19C0-D882-41D3-910D-0A77D47F4C58).

Is there a way to get a deterministic WARC (in this particular case: a file that is always the same in each download)?