Proposal for a unified validation reporting language

gimsieke commented 6 years ago

From Matthieu’s email message of 2017-12-10:

Hi all,

In my company we need a generic XML format to collect errors and warnings from multiple validations (schema + Schematron). We’ve been looking at existing languages like SVRL, PSVI, XSV and the Saxon report language, but none of them exactly matches our needs. At first we thought SVRL was the best candidate, but we found it is actually too closely tied to Schematron validation.

So we finally created an internal grammar, which we call XVRL (short for “XML Validation Report Language”).

At the Amsterdam meetup we discussed unifying the report ports of validation steps in XProc, as described here: https://github.com/xproc/1.0-specification/issues/135 and we thought XVRL could help in this direction. Please find attached a sample and the grammar (both Relax NG and XML Schema). If you find this an interesting candidate, I guess we could put it on GitHub under an open source licence.

Any comments on the format itself and on its use with XProc are welcome.

Best regards

Matthieu Ricaud.

PS: Do you think adding this proposal to issue https://github.com/xproc/1.0-specification/issues/135 makes sense? Or anywhere else (xml-dev list?)

PPS: We also made an XVRL to JSON conversion using the XPath 3.0 xml-to-json() function.

reporting.zip

xatapult commented 6 years ago

Matthieu wrote to me about this on 2018-10-06:

Let me explain why I ended up with this XVRL format:

I need a reporting file for RNG validation in my project. At first I thought about SVRL (I don't like reinventing the wheel, especially when there is a standard, an ISO one by the way). But looking more closely at SVRL, I realized it was really tied to Schematron validation:

An SVRL file starts with
for each in the schematron, it generates an
Within a rule, for each assert that failed, SVRL generates an
The same with
The message can also reference a diagnostic
there is also a reference to the phase with

A schema like RNG doesn’t use assertions or reports; it only describes the structure. If the XML is not valid against this structure, the schema processor generates an error. This error may be interpreted differently from one processor to another: is the attribute foo missing, or is the element name wrong? This is completely different from what Schematron does and from what SVRL was designed for.
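To make the contrast concrete, here is a minimal Python sketch, not part of the original email. It reads a small, invented SVRL document and pairs each failed assert with the rule that fired before it. The SVRL element and attribute names come from the ISO Schematron SVRL vocabulary; the sample content, the function name `failed_asserts` and the shape of `rng_style_error` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SVRL_NS = "http://purl.oclc.org/dsdl/svrl"

# Hypothetical SVRL output; the rule context, test and message are invented.
SAMPLE_SVRL = f"""<svrl:schematron-output xmlns:svrl="{SVRL_NS}" phase="#ALL">
  <svrl:active-pattern id="p1"/>
  <svrl:fired-rule context="chapter"/>
  <svrl:failed-assert test="title" location="/book/chapter[2]">
    <svrl:text>A chapter must have a title.</svrl:text>
  </svrl:failed-assert>
</svrl:schematron-output>"""


def failed_asserts(svrl_xml):
    """Return one dict per failed assert, paired with the rule that fired before it."""
    root = ET.fromstring(svrl_xml)
    results = []
    current_context = None
    for child in root:  # SVRL is flat: asserts follow the fired-rule they belong to
        if child.tag == f"{{{SVRL_NS}}}fired-rule":
            current_context = child.get("context")
        elif child.tag == f"{{{SVRL_NS}}}failed-assert":
            text = child.findtext(f"{{{SVRL_NS}}}text", default="").strip()
            results.append({
                "rule_context": current_context,
                "test": child.get("test"),
                "location": child.get("location"),
                "message": text,
            })
    return results


# A grammar-based validator has no rule context or assertion test to report;
# typically only a message and, if available, a position (fields assumed here).
rng_style_error = {"message": "element \"titel\" not allowed here", "line": 12}
```

Every field except the message and location is Schematron-specific, which is the mismatch described above: a grammar validator would have to leave `rule_context` and `test` empty or invent values for them.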

That’s why I went for a more generic error format. To survey the state of the art, I looked at:
2) XSV. Example: report.xsv.xml, report.xsv.html
3) PSVI. Schema: https://www.w3.org/2001/05/PSVInfoset.xsd Example: http://www.ukoln.ac.uk/metadata/dcmi/dcxml/psvi/psvi4-3.xml
4) Saxon report for XSD validation, cf. https://www.saxonica.com/documentation/index.html#!functions/saxon/validate saxon.validation.report.xml
5) Other, cf. https://stackoverflow.com/questions/39974143/validate-xml-with-schema-and-get-validation-errors-in-xml

In the end, I found the easiest way to reach my goal was to create this XVRL format.

XVRL is just a proposal; I quickly invented this syntax because of a need in my company. It can be improved, renamed, released as open source, versioned, etc. I can do that; my company will very probably agree.

xatapult commented 6 years ago

I think his reasoning is sound. I've looked at the other formats (when possible, the first was behind a login) and there's nothing that completely fits.

The only problem is that his proposal has no status whatsoever. But if we invented something ourselves the same problem would occur.

So I suggest:

Ask Matthieu to open source his proposal and put it on GitHub
Reference this format in our step descriptions

The only thing I don't know is whether this (referencing "just" some standard published on GitHub) is allowed given our W3C connection.

Thoughts?

gimsieke commented 6 years ago

I agree that SVRL is too focused on Schematron. However, there is nothing in XSD/RNG/DTD validation outputs that couldn’t be squeezed into SVRL.

A couple of things that we rely on in SVRL are lacking in XVRL:

Messages need to be able to contain markup. We use this for HTML hyperlinks, tables or lists in the message text, and also for transporting formalized, non-free-text message metadata for which there is no SVRL attribute like @role. Examples include the ubiquitous @srcpath attribute which is a location identifier that will be kept across multiple conversion steps (for ex. <span class="srcpath">file:/C:/cygwin/home/gerrit/Springer/docx2app-git/test_after/Drews_334495_1_En/M_0_004.docx.tmp/word/document.xml?xpath=/w:document[1]/w:body[1]/w:p[6]/w:r[12]</span>) or classification (for ex. <span class="category">Typesetting</span>, with span as the SVRL span element, not the HTML one).
Instead of @role which is often used to transport severity information, we’d prefer a severity attribute with a fixed vocabulary (info, warning, error, fatal-error).
In addition to arbitrary markup in the messages (maybe only in other namespaces than XVRL), people should be allowed to use arbitrary attributes. They probably need to be in other namespaces, too. For transpect, we can put the srcpath in such a custom attribute, @tr:srcpath. Some attributes will pertain to individual localized messages, but most of them will relate to what is called report in XVRL.
It might make sense to standardize some of the custom metadata fields that we use right now, like srcpath, category and family, where family is a name for what is now the validation-report element. We called it family because phase was already taken. Maybe purpose or kind will also be ok for this. The idea is to attach a name to the validation in order to discern the validation report of an intermediate XML format from a validation of an EPUB OPF document, for example. The schema element in the metadata for each validation-report should already provide such a unique name, but only as a file name (system identifier). It might not be descriptive enough. Also, two schemas with different system identifiers may implement the same validation family.
The family is meant for grouping different messages in human-readable reports. Another view groups the validation messages by category or aspect, such as “Typography” or “Style name conventions”. In order to be able to provide these alternative groupings for users, the category names must be localizable. Therefore they may not be just attributes on report. report elements should rather have category children, where category may carry @xml:lang attributes. There may be multiple categories per report and language.
It is probably a good idea to split the message into the main message and supplementary information (for which Schematron’s diagnostic was originally intended, until it became used for L10N only). Supplementary information may hold the aforementioned tables, lists, or links to documentation. See screenshot below for an example.
Whether the element names validation-report, report, and message are most intuitive remains to be discussed.
We should check whether all SVRL peculiarities (phase, fired rule and its context, …) can be accommodated. There should be an XSLT transformation from SVRL to XVRL. SVRL is
It is desirable to be able to have a summary after metadata below the top-level element. It lists the most severe severity, the number of distinct messages (reports in the current XVRL sense) for each severity and the total number of messages for each severity.
If some standardization body (maybe the XProc CG) published the spec, there should be a namespace URI other than http://www.lefebvre-sarrut.eu/ns/els/xvrl (something with xproc.org in it).
Since some schema validators are unable to report the error path, the path attribute should be optional.
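The severity vocabulary and the summary described in the list above could be sketched as follows. This is only an illustration in Python, not part of any XVRL draft: the element name `summary`, its `count` children and all attribute names are hypothetical, and "distinct" is assumed to mean distinct message identifiers while "total" counts every occurrence.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Fixed vocabulary from the list above, ordered from least to most severe.
SEVERITIES = ["info", "warning", "error", "fatal-error"]


def build_summary(messages):
    """messages: iterable of (severity, message_id) pairs, one per occurrence."""
    messages = list(messages)
    for severity, _ in messages:
        if severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {severity!r}")
    total = Counter(severity for severity, _ in messages)
    distinct = Counter(severity for severity, _ in set(messages))
    present = [s for s in SEVERITIES if total[s]]
    summary = ET.Element("summary")  # element and attribute names are hypothetical
    if present:
        summary.set("max-severity", present[-1])
    for s in present:
        ET.SubElement(summary, "count", severity=s,
                      distinct=str(distinct[s]), total=str(total[s]))
    return summary


# Usage with invented message identifiers:
occurrences = [("warning", "W1"), ("warning", "W1"), ("error", "E7"), ("info", "I2")]
print(ET.tostring(build_summary(occurrences), encoding="unicode"))
```

Keeping the severities in an ordered list makes "most severe" a simple lookup, and rejecting unknown values is one way to enforce the fixed vocabulary that @role does not provide.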

I propose that we as the XProc CG publish a modified spec (and give Matthieu due credit for the original spec). We can call it XVRL or GVRL (G for generalized).

[Screenshot: supplementary_message]

xatapult commented 6 years ago

Ehh, ok... Given the rather big list up there, is making such a spec something you or Le-tex can do? It wouldn't make sense for me to be just a scribe for somebody else's strong ideas about some subject...

And of course it can't wait very long (a few months?) to give the implementors enough time.

gimsieke commented 6 years ago

Yes, I can write the spec, a Relax NG schema and the SVRL→XVRL XSLT

mricaud commented 6 years ago

Hi all

Thanks for the reporting and improvements! I like your proposal, Gerrit, it makes sense, and yes, feel free to make a new spec; XVRL was just a first attempt.

Maybe the new namespace should not be bound to XProc, in case someone would like to use it in another context? But which organization then? Well, maybe it's easier to bind it to XProc after all!

Most validation engines use a kind of dictionary to display error messages. As with i18n, it uses IDs for "parts of sentences". I don't know whether it's a good idea to represent this in the new format. Example: https://github.com/IDPF/epubcheck/tree/master/src/main/resources/com/thaiopensource This is specific to the schema language, so maybe it should be resolved before the final XVRL is produced?
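Resolving such message identifiers before the report is written could look roughly like the Python sketch below. The catalog contents, the keys and the function name `resolve_message` are invented for illustration and are not taken from any real validator's resource files.

```python
# Hypothetical message catalog: keys and templates are invented for illustration.
CATALOG = {
    "en": {
        "unknown_element": "element \"{name}\" not allowed here",
        "missing_attribute": "element \"{element}\" is missing required attribute \"{name}\"",
    },
    "fr": {
        "unknown_element": "élément « {name} » non autorisé ici",
    },
}


def resolve_message(key, params, lang="en", fallback="en"):
    """Turn a message key plus parameters into display text for one language."""
    templates = CATALOG.get(lang, {})
    template = templates.get(key) or CATALOG[fallback].get(key)
    if template is None:
        return key  # unknown key: keep the raw identifier rather than fail
    return template.format(**params)
```

With this approach the report only carries finished text, one message per language, and the validator-specific keys never leak into the common format.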

If you need any help just tell me !

Cheers

AndrewSales commented 5 years ago

Hello, @gimsieke directed me here and, after also consulting @sgmlguru and @Gertone, I would like to offer to pitch in. For reference, we also did something along these lines (but inadequate for these purposes) a while back. Not wishing to tread on any toes, @gimsieke, @mricaud, but I stand ready :)

gimsieke commented 5 years ago

Hi Andrew, by all means, please join us here. I will create a draft Schema (starting with a preliminary RNC that Norm created a couple weeks ago) in another repo over the weekend. Then we can discuss on, before, and after Thursday’s unconference track whether the proposed model seems adequate and what else people might need in “XVRL”.

ndw commented 5 years ago

Yes, please, @AndrewSales I'd welcome a coherent proposal. I've scratched at it a bit, but haven't produced anything I'm confident about.

ndw commented 5 years ago

xproc/3.0-steps#15: Proposal for a unified validation reporting language