phax / ph-schematron

Java Schematron library that supports XSLT and native application
Apache License 2.0

Language attribute not set on DiagnosticReference when validating as pure Schematron #126

Closed costas80 closed 2 years ago

costas80 commented 2 years ago

We use ph-schematron (version 6.2.7) to validate XML against user-provided Schematron files, which are either already converted to XSLT or validated as "pure" Schematron. When working with multilingual Schematron files (i.e. ones that define language-specific messages as diagnostic elements), we see that pure Schematron validation does not include the value of the diagnostics' xml:lang attribute in its output. With XSLT-based validation, in contrast, the xml:lang values are present as expected. Because we depend on the language attribute to build validation reports for the user's locale, this inconsistency means we can only use XSLT-based validation for multilingual Schematron files.

To illustrate with an example, consider the following multilingual Schematron:

<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" xml:lang="de">
  <sch:title>Example of Multi-Lingual Schema</sch:title>
  <sch:pattern>
    <sch:rule context="dog">
      <sch:assert test="bone" diagnostics="d1 d2"> A dog should have a bone.</sch:assert>
    </sch:rule>
  </sch:pattern>
  <sch:diagnostics>
    <sch:diagnostic id="d1" xml:lang="en"> A dog should have a bone.</sch:diagnostic>
    <sch:diagnostic id="d2" xml:lang="de"> Das  Hund muss ein Bein haben.</sch:diagnostic>
  </sch:diagnostics>
</sch:schema>

After validating this (either in pure mode or after pre-processing to XSLT), I get a SchematronOutputType that I then process with the SVRLHelper class. When I access each failure's DiagnosticReference instances from there, the lang is only set if the Schematron was validated as XSLT:

SchematronOutputType report = ...
for (var failure: SVRLHelper.getAllFailedAssertions(report)) {
   for (var diagnostic: failure.getDiagnisticReferences()) {
      diagnostic.getLang(); // In pure mode this is always null
   }
}

I would expect that the lang value is also present in the case of pure-mode validation. Could you confirm if this is indeed an issue with the current version or if I'm missing something? Thanks!
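
For context, the locale-based lookup that depends on this attribute can be sketched roughly as follows. This is a simplified, stdlib-only illustration: the `Diagnostic` record and the `pick` method are hypothetical stand-ins for report-building code and are not part of ph-schematron. It assumes Java 16 or later (for records).

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

public class DiagnosticMessagePicker {

    /** Hypothetical stand-in for a diagnostic reference: language tag (may be null) plus text. */
    record Diagnostic(String lang, String text) {}

    /** Returns the text of the first diagnostic whose language matches the locale, if any. */
    static Optional<String> pick(List<Diagnostic> diagnostics, Locale locale) {
        String wanted = locale.getLanguage();
        return diagnostics.stream()
                // A null lang can never match, so such diagnostics are skipped.
                .filter(d -> d.lang() != null
                        && Locale.forLanguageTag(d.lang()).getLanguage().equals(wanted))
                .map(Diagnostic::text)
                .findFirst();
    }

    public static void main(String[] args) {
        // What XSLT-based validation gives us: lang is set on each reference.
        List<Diagnostic> withLang = List.of(
                new Diagnostic("en", "A dog should have a bone."),
                new Diagnostic("de", "Das Hund muss ein Bein haben."));
        // What pure validation gives us in 6.2.7: lang is null everywhere.
        List<Diagnostic> withoutLang = List.of(
                new Diagnostic(null, "A dog should have a bone."),
                new Diagnostic(null, "Das Hund muss ein Bein haben."));

        System.out.println(pick(withLang, Locale.GERMAN));
        System.out.println(pick(withoutLang, Locale.GERMAN));
    }
}
```

With the lang missing, the lookup has nothing to match on, so the report can only fall back to the untranslated assertion text.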

phax commented 2 years ago

This is indeed an issue.

Okay, I crosschecked with the XSLT implementation and the following attributes may be contained:

Therefore I needed to extend the svrl.xsd as well to allow for these attributes. Afterwards, copying the values from the diagnostic was simple.
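
As a rough illustration of the "copying" step, the sketch below reads the xml:lang value of each diagnostic element using only JAXP from the JDK. It is not the actual ph-schematron change: the class and method names are hypothetical, and it only looks at an xml:lang set directly on the diagnostic element, ignoring values inherited from ancestors such as the schema root.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DiagnosticLangReader {

    static final String SCH_NS = "http://purl.oclc.org/dsdl/schematron";

    /** Maps each diagnostic id to its own xml:lang value (null if the attribute is absent). */
    static Map<String, String> langById(String schematronXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace awareness is required for the *NS lookups below.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(schematronXml)));

            NodeList nodes = doc.getElementsByTagNameNS(SCH_NS, "diagnostic");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element diagnostic = (Element) nodes.item(i);
                // getAttributeNS returns "" when the attribute is not present.
                String lang = diagnostic.getAttributeNS(XMLConstants.XML_NS_URI, "lang");
                result.put(diagnostic.getAttribute("id"), lang.isEmpty() ? null : lang);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse Schematron", e);
        }
    }

    public static void main(String[] args) {
        String sch =
            "<sch:schema xmlns:sch=\"" + SCH_NS + "\" xml:lang=\"de\">"
          + "<sch:diagnostics>"
          + "<sch:diagnostic id=\"d1\" xml:lang=\"en\">A dog should have a bone.</sch:diagnostic>"
          + "<sch:diagnostic id=\"d2\" xml:lang=\"de\">Das Hund muss ein Bein haben.</sch:diagnostic>"
          + "</sch:diagnostics>"
          + "</sch:schema>";
        System.out.println(langById(sch));
    }
}
```

Once the value is available per diagnostic id, it can be set on the corresponding diagnostic reference in the SVRL output, provided the schema for that output permits the attribute.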

Unfortunately, the ISO spec documents this very poorly.

phax commented 2 years ago

Part of the 6.2.8 release