pdf-association / arlington-pdf-model

A vendor- and implementation-independent, specification-derived, machine-readable model of PDF.
Apache License 2.0

Flag all keys and array elements that are IRIs, URIs, URLs, or URNs #42

Open petervwyatt opened 1 year ago

petervwyatt commented 1 year ago

A request has been made for a predicate that identifies all keys and array elements that can be IRIs, URIs, URLs, or URNs (i.e. some form of web link). Whether the subtle differences between these need to be distinguished is not yet decided; that should be dictated by the wording or normative reference used in ISO 32000-2 rather than by any implementation (which may or may not be correct!).

petervwyatt commented 1 year ago

Thinking about this more broadly and in more general terms: maybe the predicate should be something like fn:CompliesWith(xxx), where xxx could be an RFC, an ISO standard, etc., expressed using the Arlington grammar rules for a PDF key (a-z A-Z 0-9 _), in much the same way as fn:Extension(xxx)?

Then various CT entries could assert fn:CompliesWith(RFC_2045) to indicate a file format requirement for a valid content type, specific streams could assert fn:CompliesWith(ISO_15076) for ICC profiles, and so on. Expressions could be built up inside a wrapping fn:Eval(...) with logical OR (||) and logical AND (&&).
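To make the idea concrete, here is a rough Python sketch of how a tool might read such expressions. It is only an illustration: fn:CompliesWith is a proposal at this point, not part of the Arlington grammar. The helper names (referenced_standards, complies, CHECKERS), the identifiers RFC_3986 and RFC_3987, and the toy checks behind them are all assumptions made for the example. It handles only a flat OR of ANDs, with no nested parentheses.

```python
import re
from typing import Callable
from urllib.parse import urlsplit

# Assumption: the argument uses the PDF-key character set noted above (a-z A-Z 0-9 _).
_COMPLIES_WITH = re.compile(r"fn:CompliesWith\(([A-Za-z0-9_]+)\)")


def referenced_standards(expression: str) -> list[str]:
    """List the standards named by fn:CompliesWith(...), in order, without duplicates."""
    names: list[str] = []
    for match in _COMPLIES_WITH.finditer(expression):
        if match.group(1) not in names:
            names.append(match.group(1))
    return names


def complies(expression: str, value: str,
             checkers: dict[str, Callable[[str], bool]]) -> bool:
    """Evaluate a flat '||' of '&&' terms; nested parentheses are not supported."""
    inner = expression.strip()
    if inner.startswith("fn:Eval(") and inner.endswith(")"):
        inner = inner[len("fn:Eval("):-1]  # drop the wrapping fn:Eval( ... )
    for or_term in inner.split("||"):
        results = []
        for and_term in or_term.split("&&"):
            match = _COMPLIES_WITH.fullmatch(and_term.strip())
            if match is None:
                raise ValueError(f"unsupported term: {and_term.strip()!r}")
            results.append(checkers[match.group(1)](value))
        if all(results):
            return True
    return False


# Hypothetical, deliberately crude stand-ins for real conformance checks.
CHECKERS: dict[str, Callable[[str], bool]] = {
    "RFC_3986": lambda s: s.isascii() and urlsplit(s).scheme != "",  # URI-ish
    "RFC_3987": lambda s: urlsplit(s).scheme != "",                  # IRI-ish
}

expr = "fn:Eval(fn:CompliesWith(RFC_3986) || fn:CompliesWith(RFC_3987))"
print(referenced_standards(expr))
print(complies(expr, "https://example.com/é", CHECKERS))
```

Keeping the standard name as an opaque identifier means the model itself only records which normative reference applies; how strictly a given tool checks conformance is left to that tool.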

PS. Note also that many of the RFC references in ISO 32000 relate to processor requirements rather than file format requirements (e.g. those related to encryption and filters).

a20dev commented 1 year ago

Besides URLs and friends, there is at least one other string entry that must conform to a standard: Lang.

petervwyatt commented 1 year ago

Also WKT in geographic coordinate system dictionaries. And others...