iipc / webarchive-commons

Common web archive utility code.
Apache License 2.0

Update API documentation to reflect current behaviour: #79

Open anjackson opened 6 years ago

anjackson commented 6 years ago

AFAICT, this is wrong:

https://github.com/iipc/webarchive-commons/blob/bb36b6a7375453e1cb8073211041ca3f955ab217/src/main/java/org/archive/io/ArchiveRecordHeader.java#L30-L35

The implementations appear to generate full ISO datetime stamps, i.e. 2011-02-17T06:59:50Z, rather than the implied 20110217065950 (although I think older versions of the code did do that).

ldko commented 6 years ago

As I read it, ArchiveRecordHeader is used for both ARCs and WARCs, mostly on the record-reading side, so it sometimes returns the 20110217065950 format and sometimes the 2011-02-17T06:59:50Z format. ArchiveUtils has get14DigitDate for ARCs and getLog14Date for WARCs, so going by those names the "return Date in 14 digit time format (UTC)." description kind of fits: both consider themselves 14-digit dates (even though the W3C/ISO 8601 format is 20 characters long), as opposed to the 12-digit and 17-digit date creators that are also in ArchiveUtils. The "@see org.archive.util.ArchiveUtils#parse14DigitDate(String)" does seem misleading, though. I would be more concerned if that description showed up in WARCRecordInfo, where I think the date is handled for WARCs being written. Let me know if I missed your point entirely. But we can do a PR if you want to change the text.
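
To make the two formats under discussion concrete, here is a minimal sketch that converts between the 14-digit form and the ISO 8601 form using only `java.time`. It deliberately does not call ArchiveUtils; the class name `TimestampFormats` and its method names are hypothetical, chosen for illustration only, and the 14-digit value is assumed to be in UTC.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not part of webarchive-commons.
final class TimestampFormats {
    // 14-digit form, e.g. 20110217065950 (assumed to be UTC).
    private static final DateTimeFormatter FOURTEEN_DIGIT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    private TimestampFormats() {}

    // Parse a 14-digit timestamp, interpreting it as UTC.
    static Instant parse14Digit(String s) {
        return LocalDateTime.parse(s, FOURTEEN_DIGIT).toInstant(ZoneOffset.UTC);
    }

    // Parse an ISO 8601 UTC timestamp such as 2011-02-17T06:59:50Z.
    static Instant parseIso(String s) {
        return Instant.parse(s);
    }

    // Accept either form, choosing the parser by shape:
    // exactly 14 ASCII digits means the compact form, anything else is tried as ISO.
    static Instant parseEither(String s) {
        return s.matches("[0-9]{14}") ? parse14Digit(s) : parseIso(s);
    }

    // Format as the 14-digit form in UTC (sub-second precision is dropped).
    static String to14Digit(Instant t) {
        return FOURTEEN_DIGIT.format(LocalDateTime.ofInstant(t, ZoneOffset.UTC));
    }

    // Format as the 20-character ISO 8601 form with second precision.
    static String toIso(Instant t) {
        return DateTimeFormatter.ISO_INSTANT.format(t.truncatedTo(ChronoUnit.SECONDS));
    }
}
```

With something like this, a caller that cannot know which format a header returns could pass the string through `parseEither` and compare `Instant` values instead of strings.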