In some places on the web, invalid URIs are used to identify resource representations. For example, at one point (and perhaps still), Google Fonts recommended values like this one: https://fonts.googleapis.com/css?family=Open+Sans:400,600,800,700|Open+Sans+Condensed:300
The unencoded pipe (|) here is not permitted by RFC 3986 (also see here). I believe it may be WARCreate's responsibility to store this value in WARCs in an interoperable form, for example with the pipe percent-encoded as %7C.
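One possible approach is to percent-encode, character by character, anything RFC 3986 does not allow literally, while leaving existing well-formed %XX escapes alone. The sketch below is a hypothetical helper written in Python for illustration (`encode_disallowed` is not part of WARCreate). It assumes the input is otherwise a structurally sound absolute URI: it only fixes the character repertoire and does not check component syntax such as where "[" and "]" may appear.

```python
import re
from urllib.parse import quote

# Characters RFC 3986 permits literally: unreserved, gen-delims, sub-delims.
# "%" is handled separately so existing valid escapes are preserved.
_ALLOWED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "-._~"          # unreserved
    ":/?#[]@"       # gen-delims
    "!$&'()*+,;="   # sub-delims
)
_PCT = re.compile(r"%[0-9A-Fa-f]{2}")


def encode_disallowed(uri: str) -> str:
    """Hypothetical helper: percent-encode characters RFC 3986 disallows."""
    out = []
    i = 0
    while i < len(uri):
        # Keep an already well-formed percent-escape untouched.
        if _PCT.match(uri, i):
            out.append(uri[i:i + 3])
            i += 3
            continue
        ch = uri[i]
        if ch in _ALLOWED:
            out.append(ch)
        else:
            # quote() encodes as UTF-8 with uppercase hex, e.g. "|" -> "%7C";
            # a stray "%" becomes "%25".
            out.append(quote(ch, safe=""))
        i += 1
    return "".join(out)


print(encode_disallowed(
    "https://fonts.googleapis.com/css?family=Open+Sans:400,600,800,700"
    "|Open+Sans+Condensed:300"
))
```

For the Google Fonts URL this yields the same string with `|` replaced by `%7C`. Encoding rather than rejecting keeps the capture usable, though whether the archived WARC-Target-URI should differ from the URI the page actually requested is a design question for WARCreate.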
Running `$ jwattools test -e warc-in-question.warc` reports these errors for invalid WARCs in the i.out file it produces.
TODO: check the validity of URIs, particularly in the WARC-Target-URI field, before associating them with a preserved entity representation.
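A minimal sketch of such a check, assuming it runs just before a record's WARC-Target-URI header is written: it reports a missing or malformed scheme, characters outside the RFC 3986 repertoire, and "%" signs not followed by two hex digits. The function name `uri_problems` is hypothetical, and this is a character-level screen only, not a full RFC 3986 grammar parser.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":"
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")
_PCT = re.compile(r"%[0-9A-Fa-f]{2}")
# Unreserved, gen-delims and sub-delims from RFC 3986 ("%" checked separately).
_ALLOWED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "-._~:/?#[]@!$&'()*+,;="
)


def uri_problems(uri: str) -> list[str]:
    """Hypothetical pre-write check; an empty list means no problems found."""
    problems = []
    if not _SCHEME.match(uri):
        problems.append("missing or malformed scheme")
    i = 0
    while i < len(uri):
        ch = uri[i]
        if ch == "%":
            if _PCT.match(uri, i):
                i += 3  # skip a well-formed escape such as %7C
                continue
            problems.append(f"malformed percent-escape at index {i}")
        elif ch not in _ALLOWED:
            problems.append(f"disallowed character {ch!r} at index {i}")
        i += 1
    return problems


target = ("https://fonts.googleapis.com/css?family=Open+Sans:400,600,800,700"
          "|Open+Sans+Condensed:300")
for problem in uri_problems(target):
    print(problem)
```

A capture pipeline could refuse to write the record, or route the URI through an encoding step, whenever the returned list is non-empty.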