xproc / 1.0-specification

The 1.0 XProc specification and now abandoned drafts of a 2.0 XML specification

A generalized XML validation step #135

Open ndw opened 9 years ago

ndw commented 9 years ago

From Gerrit Imsieke:

Others have already asked for unified report ports for the validation steps p:validate-with-relax-ng and p:validate-with-xml-schema. While we see that it might not be easy to change the signature of the existing standard library steps, here’s a fresh approach that also saves a lot of verbosity.

It builds upon the xml-model processing instruction that may be prepended to an XML document (http://www.w3.org/TR/xml-model/).
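
As a minimal illustration of the mechanism, a document that associates itself with a RELAX NG schema and a Schematron schema could begin as follows. The schema file names and the document content are hypothetical; only the processing instruction syntax and the two schematypens namespace URIs come from the respective specifications.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical schema locations, for illustration only. -->
<?xml-model href="schemas/article.rng"
            type="application/xml"
            schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="schemas/article.sch"
            type="application/xml"
            schematypens="http://purl.oclc.org/dsdl/schematron"?>
<article>
  <title>Example</title>
</article>
```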

We could add a step p:validate-according-to-xml-models that executes each validation and creates a sequence of c:errors and svrl:schematron-output documents on the report port.

But as we strive for terseness of expression, we may instead add an attribute use-xml-models="assert-valid|report-only|none" to input and output ports. For p:input, this applies to both declarations and connections; the attribute value on a connection takes precedence.

If the attribute is on a step’s input port and its value is 'report-only', it will add a port 'report' (sequence=true) to the readable ports within the step. Alternatively, the port could be named 'error', to avoid an additional name for ports that magically spring into existence.

If the attribute value is 'assert-valid' and if the step is within a p:try/p:group, it will add these report documents to the error port of subsequent p:catch instructions.

This will greatly reduce verbosity by eliminating the need to spell out input/output validation steps explicitly. It is syntactic sugar that may be expanded to long-form explicit validation instructions (by means of XSLT transformation, for example).
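
A rough sketch of how the attribute might look on a step declaration. Everything specific here is an assumption made for illustration: use-xml-models is only the attribute proposed above, the step type ex:publish and its namespace are made up, the version value is a guess, and the 'report' port is the implicitly added port described above, not something any processor provides today.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.org/ns/steps"
                type="ex:publish" name="publish" version="2.0">
  <!-- Proposed attribute: validate the input against its xml-model PIs
       and collect reports instead of raising an error. -->
  <p:input port="source" primary="true" use-xml-models="report-only"/>
  <p:output port="result" primary="true"/>
  <!-- Expose the implicitly created 'report' port (hypothetical). -->
  <p:output port="validation-report" sequence="true">
    <p:pipe step="publish" port="report"/>
  </p:output>
  <!-- Placeholder body: pass the source document through unchanged. -->
  <p:identity/>
</p:declare-step>
```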

If there are no xml-model PIs, no validation will occur.

xml-model-based validation should support Relax NG, Relax NG compact syntax, XSD in different versions, ISO Schematron, NVDL, and DTD.

Because prepending xml-model PIs to documents is a bit cumbersome, there should be an optional step p:prepend-xml-model like this:

ndw commented 9 years ago

Proposal for a generalized validation step

The following declaration is for a generalized validation step. Much of what follows describes how this step applies to XML validation, but the actual validation performed is implementation defined. Validation of JSON data against json-schema would be entirely plausible.

<p:declare-step type="p:validate">
   <p:input port="source" primary="true"
            content-types="application/octet-stream"/>
   <p:input port="schema" sequence="true"
            content-types="application/octet-stream"/>
   <p:input port="models" sequence="true"
            content-types="application/xml */*+xml text/*"/>
   <p:output port="result" primary="true" sequence="true"/>
   <p:output port="report" sequence="true"/>
   <p:output port="validation-attempted" sequence="true"/>
   <p:option name="assert-valid" select="'true'" as="xs:boolean"/>
   <p:option name="group" select="''" as="xs:string"/>
   <p:option name="phase" select="''" as="xs:string"/>
   <p:option name="version" as="xs:string"/>
   <p:option name="parameters" as="map(xs:QName,item())"/>
</p:declare-step>

The semantics of the p:validate step are that the source document is validated in an implementation defined way. The schema and models ports exist only to provide suggestions to the implementation.

There are several possible outputs:

  1. If the processor considers that no validation was requested (or does not recognize or cannot perform the requested validation), or if the assert-valid option was false and validation failed, then the original document is returned on the result port.
  2. If the processor attempts to validate and succeeds, then the validated document or documents are returned on the result port. In this case, the validation-attempted port should document the validation or validations that were attempted.
  3. If the processor attempts to validate and fails, and the assert-valid option is true, then nothing appears on the result port and an error is raised.
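
A sketch of how the step might be called in report-only mode, so that a failed validation falls under the first case rather than the third. The step and port names follow the declaration above; the pipeline name, the step name "check", and the version value are assumptions, and the schema and models ports are explicitly left empty so that the implementation has to fall back on whatever other information it has.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                name="main" version="2.0">
  <p:input port="source" primary="true"/>
  <p:output port="result" primary="true" sequence="true"/>
  <p:output port="report" sequence="true">
    <p:pipe step="check" port="report"/>
  </p:output>
  <p:output port="validation-attempted" sequence="true">
    <p:pipe step="check" port="validation-attempted"/>
  </p:output>

  <!-- assert-valid="false": a validation failure does not raise an error,
       and the original document is returned on "result". -->
  <p:validate name="check" assert-valid="false">
    <!-- No explicit schema or model hints are supplied. -->
    <p:input port="schema"><p:empty/></p:input>
    <p:input port="models"><p:empty/></p:input>
  </p:validate>
</p:declare-step>
```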

ISSUE: In the case where this error is caught by p:catch, can the validation-attempted and report ports be read, and if so, how?
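
To make the question concrete, here is a sketch in XProc 1.0 syntax, with p:validate being the proposed step and the step names being hypothetical. Inside p:catch, the error port of the catch is readable, but under the 1.0 scoping rules a step inside the p:group should not be visible from the p:catch, so its other outputs cannot be reached with p:pipe.

```xml
<p:try xmlns:p="http://www.w3.org/ns/xproc" name="guarded">
  <p:group>
    <!-- Raises an error if validation fails. -->
    <p:validate name="check" assert-valid="true">
      <p:input port="schema"><p:empty/></p:input>
      <p:input port="models"><p:empty/></p:input>
    </p:validate>
  </p:group>
  <p:catch name="caught">
    <!-- The error port of the catch is readable here. -->
    <p:identity>
      <p:input port="source">
        <p:pipe step="caught" port="error"/>
      </p:input>
    </p:identity>
    <!-- A p:pipe with step="check" port="report" would refer to a step
         that is not in scope at this point. -->
  </p:catch>
</p:try>
```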

The output on the report port depends on the validation attempted. For Schematron validation, a report format is defined. For other kinds of validation, the report is implementation-defined.

Although the step is for generalized validation, it does have a couple of options designed to support a specific XML scenario: the XML Model Processing Instruction. In the absence of other information, implementations should use the XML Model PI to determine what kind of validation to perform on XML documents.

The group and phase options provide the corresponding values as discussed in the XML Model PI spec.

The models input port and the validation-attempted output port use XML documents to describe desired validation in the former case and validations attempted in the latter. The following c:model element definition should be supported.

<c:model
   href? = anyURI
   type? = string
   schematypens? = anyURI
   charset? = string
   title? = string
   group? = string
   phase? = string
   />

Additional variations on c:model are allowed, as are entirely different vocabulary elements as appropriate.
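
For example, a document on the models port that requests Schematron validation in a particular phase might look like the following sketch. The href and phase values are hypothetical; the attribute names follow the definition above, and the c prefix is assumed to be bound to the usual XProc step vocabulary namespace.

```xml
<!-- Hypothetical schema location and phase name, for illustration only. -->
<c:model xmlns:c="http://www.w3.org/ns/xproc-step"
         href="schemas/article.sch"
         type="application/xml"
         schematypens="http://purl.oclc.org/dsdl/schematron"
         phase="draft"/>
```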

Validation with RELAX NG

When RELAX NG validation is selected, the following parameters should be recognized: dtd-attribute-values and dtd-id-idref-warnings.

Validation with XML Schema

When XML Schema validation is selected, the following parameters should be recognized: use-location-hints, try-namespaces, and mode.

Validation with NVDL

If an NVDL schema appears on the models port, NVDL validation should be attempted.

ndw commented 9 years ago

This was discussed at the 25 Feb 2015 meeting, http://www.w3.org/XML/XProc/2015/02/25-minutes (the issue, that is, not the proposal)

josteinaj commented 9 years ago

An output port with basic information about the validation when assert-valid="false" would be useful, such as the total number of assertions and the number that failed, were skipped, produced warnings, or succeeded. Currently a Schematron validation succeeds if count(//svrl:failed-assert) + count(//svrl:successful-report) = 0, and this XPath is different for other kinds of validation. Some metadata about the validation, when available, might also be useful to include in such a document, such as the name (/sch:schema/sch:title) and the base URI of the source document. Maybe just something like:

<c:result name="Test Name" tests="18" skipped="7" errors="5" warnings="3"/>