Open ato opened 6 years ago
The implementation guidelines have this to say on the WARC-Date matter:
Note that a different behavior should be adopted for payload migration: according to the standard, the WARC-date of a conversion record is the date of the creation of the new record, that is when the migration occurred. There is indeed a great difference between converting a file from a container format to another, and migrating the format of this file.
Problem 1: dates
What should WARC-Date on a 'conversion' record be? Section 5.4 says:
Does 'data capture' in the context of a conversion refer to the capture of the original record? Or does it refer to the moment you started writing the transformed content? If the former, how do you record the date of transformation? If the latter, how do you know the date the resource was originally archived? Presumably by following the WARC-Refers-To header, right?
However section 6.8 'conversion' includes this statement:
Which implies you should not rely on the original record for anything... but how do you actually do that?
One solution to this problem would be to allow and recommend WARC-Refers-To-Date on 'conversion' records. The case of a conversion of a conversion needs specifying too.
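To make the proposal concrete, here is a rough sketch of what the header block of such a 'conversion' record could look like. It assumes WARC-Date holds the moment of conversion and WARC-Refers-To-Date carries the capture date copied from the original record. As far as I know, WARC 1.1 only defines WARC-Refers-To-Date for 'revisit' records, so its use here is the suggestion above, not current standard behaviour. The function names and parameters are made up for illustration.

```python
import uuid
from datetime import datetime, timezone


def format_warc_date(dt):
    """Format a timezone-aware datetime as a UTC WARC-Date (YYYY-MM-DDThh:mm:ssZ)."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def conversion_headers(original_id, original_date, target_uri,
                       payload, content_type, converted_at=None):
    """Build the header block of a hypothetical 'conversion' record.

    original_id   -- WARC-Record-ID of the record being converted
    original_date -- WARC-Date string copied from that record
    converted_at  -- timezone-aware datetime of the conversion (default: now)
    """
    if converted_at is None:
        converted_at = datetime.now(timezone.utc)
    fields = [
        ("WARC-Type", "conversion"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        # Assumption: WARC-Date is when the conversion happened.
        ("WARC-Date", format_warc_date(converted_at)),
        ("WARC-Target-URI", target_uri),
        ("WARC-Refers-To", original_id),
        # Proposed, not standard for 'conversion': the original capture date,
        # so the record stays usable after the original is gone.
        ("WARC-Refers-To-Date", original_date),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    lines = "".join(f"{name}: {value}\r\n" for name, value in fields)
    return "WARC/1.1\r\n" + lines + "\r\n"
```

With this layout a reader gets both dates from the conversion record alone. For a conversion of a conversion, one option would be to copy WARC-Refers-To-Date forward unchanged so it always names the original capture, but that is exactly the part that would need specifying.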
Problem 2: protocol headers
If you convert a request or response record, do you convert the HTTP headers too? If you don't, we run into the 'freestanding, complete record' problem again. Some HTTP headers are necessary for replay.
The examples and this statement sort of imply you don't include protocol headers:
Can you use a conversion record to transform from one protocol to another?
Problem 3: determining the type of the original record
Again we trip over 'freestanding, complete'. After the original record is lost, how do you know whether the conversion was made from a 'response' or a 'request' record? Nothing seems to imply you couldn't make a 'conversion' of a 'request' or even a 'warcinfo', for that matter.