xproc / 3.0-steps

Repository for change requests to the standard step library and for official extension steps

Proposal for a unified validation reporting language #15

Closed gimsieke closed 4 years ago

gimsieke commented 6 years ago

From Matthieu’s email message of 2017-12-10:

Hi all,

In my company we need a generic XML format to collect errors and warnings from multiple validations (schema + Schematron). We’ve looked at existing languages such as SVRL, PSVI, XSV, and the Saxon report language, but none of them exactly matches our need. At first we thought SVRL was the best candidate, but we found it is too closely tied to Schematron validation.

So we finally created an internal grammar which we call XVRL (it stands for “XML Validation Report Language”).

At the Amsterdam meetup we discussed unifying the report ports of the validation steps in XProc, as described here: https://github.com/xproc/1.0-specification/issues/135 and we thought XVRL could help in this direction. Please find attached a sample and the grammar (both Relax NG and XML Schema). If you find this an interesting candidate, I guess we could put it on GitHub under an open source licence.

Any comments are welcome, both about the format itself and about its use with XProc.

Best regards

Matthieu Ricaud.

PS: Do you think adding this proposal to issue https://github.com/xproc/1.0-specification/issues/135 makes sense? Or anywhere else (xml-dev list?)

PPS: We also made an XVRL-to-JSON conversion using the XPath 3.0 xml-to-json() function.

reporting.zip
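
As a rough illustration of the idea in the message above (one report format for findings coming from several validators), here is a minimal Python sketch. It is not the XVRL grammar from reporting.zip: the `Finding` class, its field names, the `merge_reports` function, and the sample messages are all hypothetical and only show the general shape such a report could take.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class Finding:
    """One validation message, independent of the validator that produced it.

    All field names are hypothetical; they are not taken from XVRL or SVRL.
    """
    severity: str                   # "error" or "warning"
    message: str                    # human-readable text from the validator
    source: str                     # label for the validator, e.g. "relaxng"
    location: Optional[str] = None  # e.g. an XPath to the offending node


def merge_reports(*reports: List[Finding]) -> dict:
    """Combine findings from several validators into one report structure."""
    findings = [f for report in reports for f in report]
    return {
        # The document counts as valid only if no validator raised an error.
        "valid": not any(f.severity == "error" for f in findings),
        "findings": [asdict(f) for f in findings],
    }


# Hypothetical output of a grammar-based validation and a Schematron run.
schema_findings = [
    Finding("error", 'attribute "foo" is missing', "relaxng", "/doc/item[1]"),
]
schematron_findings = [
    Finding("warning", "title should not be empty", "schematron", "/doc/title"),
]

report = merge_reports(schema_findings, schematron_findings)
print(json.dumps(report, indent=2))
```

Keeping the validator name as plain data on each finding, rather than building it into the structure, is what lets one format carry both grammar-based and rule-based results.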

xatapult commented 6 years ago

Matthieu wrote to me about this on 2018-10-06:

Let me explain why I arrived at this XVRL format:

I need an RNG reporting file for my project. At first I thought about SVRL (I don't like re-inventing the wheel, especially when there is a standard, an ISO one by the way). But looking more closely at SVRL, I realized it was really tied to Schematron validation:

A schema language like RNG doesn’t use assertions or reports; it only describes the structure. If the XML is not valid against this structure, the schema processor generates an error. This error may be interpreted differently from one processor to another: is the attribute foo missing, or is the name of the element wrong? That is completely different from what Schematron does and what SVRL was designed for.

That’s why I went for a more generic error format. To survey the state of the art, I had a look at:

2) XSV. Example: report.xsv.xml, report.xsv.html
3) PSVI. Schema: https://www.w3.org/2001/05/PSVInfoset.xsd Example: http://www.ukoln.ac.uk/metadata/dcmi/dcxml/psvi/psvi4-3.xml
4) Saxon report for XSD validation, cf. https://www.saxonica.com/documentation/index.html#!functions/saxon/validate saxon.validation.report.xml
5) Other, cf. https://stackoverflow.com/questions/39974143/validate-xml-with-schema-and-get-validation-errors-in-xml

In the end I found the easiest way to reach my goal was to create this XVRL format.

XVRL was just a proposal; I quickly invented this syntax because of a need in my company. It can be improved, renamed, delivered as open source, versioned, etc. I can do that; my company will very probably agree to it.

xatapult commented 6 years ago

I think his reasoning is sound. I've looked at the other formats (where possible; the first was behind a login) and there's nothing that completely fits.

The only problem is that his proposal has no status whatsoever. But if we invented something ourselves the same problem would occur.

So I suggest:

  1. Ask Matthieu to open source his proposal and put it on GitHub
  2. Reference this format in our step descriptions

The only thing I don't know is whether this (referencing "just" some standard published on GitHub) is allowed given our W3C connection.

Thoughts?

gimsieke commented 6 years ago

I agree that SVRL is too much focused on Schematron. However, there is nothing in XSD/RNG/DTD validation outputs that couldn’t be squeezed into SVRL.

A couple of things that we rely on in SVRL are lacking in XVRL:

I propose that we as the XProc CG publish a modified spec (and give Matthieu due credit for the original spec). We can call it XVRL or GVRL (G for generalized).

supplementary_message

xatapult commented 6 years ago

Ehh, ok... Given the rather big list up there, is making such a spec something you or Le-tex can do? It wouldn't make sense for me to be just a scribe for somebody else's strong ideas about some subject...

And of course it can't wait very long (a few months?) to give the implementors enough time.

gimsieke commented 6 years ago

Yes, I can write the spec, a Relax NG schema and the SVRL→XVRL XSLT

mricaud commented 6 years ago

Hi all

Thanks for the report and the improvements! I like your proposal, Gerrit, it makes sense, and yes, feel free to make a new spec; XVRL was just a first attempt.

Maybe the new namespace should not be bound to XProc, in case someone would like to use it in another context? But which organization then? Well, maybe it's easier to bind it to XProc after all!

Most validation engines use a kind of dictionary to display error messages. Like i18n, it uses IDs for "parts of sentences". I don't know whether it's a good idea to represent this in the new format. Example: https://github.com/IDPF/epubcheck/tree/master/src/main/resources/com/thaiopensource This is specific to the schema language, and maybe it should be resolved before the final XVRL is produced?

If you need any help, just tell me!

Cheers

AndrewSales commented 5 years ago

Hello, @gimsieke directed me here, and after also consulting @sgmlguru and @Gertone, I would like to offer to pitch in. For reference, we also did something along these lines (but inadequate for these purposes) a while back. Not wishing to tread on any toes, @gimsieke, @mricaud - but I stand ready :)

gimsieke commented 5 years ago

Hi Andrew, by all means, please join us here. I will create a draft Schema (starting with a preliminary RNC that Norm created a couple weeks ago) in another repo over the weekend. Then we can discuss on, before, and after Thursday’s unconference track whether the proposed model seems adequate and what else people might need in “XVRL”.

ndw commented 5 years ago

Yes, please, @AndrewSales I'd welcome a coherent proposal. I've scratched at it a bit, but haven't produced anything I'm confident about.

ndw commented 5 years ago

See also: https://github.com/xproc/xvrl

gimsieke commented 5 years ago

@hrennau suggested that we also consider supporting SHACL validation (there’s issue #8 that already mentions SHACL), and then there needs to be a bidirectional mapping (if not an outright identity for overlapping areas covered) between the XVRL vocabulary and the SHACL Validation Report Vocabulary. The serialization format is then probably everything that an RDF graph can be serialized as.

Gertone commented 5 years ago

There is also ShEx as an alternative to SHACL. Implementations are not yet at a level where everyone is happy using them, so it might be rather early to implement them as a step. But it is worth looking into, of course. If we do so, we need to give whatever comes out of https://tools.ietf.org/pdf/draft-handrews-json-schema-00.pdf at least the same attention.

xml-project commented 4 years ago

Is there anything more to say on this with regard to validating XML? Close?

gimsieke commented 4 years ago

Further discussion should take place on https://github.com/xproc/xvrl