gimsieke closed this 4 years ago.
Matthieu wrote to me about this on 2018-10-06:
Let me explain why I went with this XVRL format:
I need an RNG reporting file for my project. At first I thought about SVRL (I don't like re-inventing the wheel, especially when there is a standard, an ISO one by the way). But looking more closely at SVRL, I realized it was really tied to Schematron validation:
A schema like RNG doesn’t use assertions or reports; it only describes the structure. If the XML is not valid against this structure, the schema processor will generate an error. This error may be interpreted differently from one processor to another: is the attribute foo missing, or is the element name wrong? It’s completely different from what Schematron does and what SVRL was designed for.
That’s why I went for a more generic error format. As a state-of-the-art review, I had a look at:
2) XSV. Example: report.xsv.xml, report.xsv.html
3) PSVI. Schema: https://www.w3.org/2001/05/PSVInfoset.xsd Example: http://www.ukoln.ac.uk/metadata/dcmi/dcxml/psvi/psvi4-3.xml
4) Saxon report for XSD validation: cf. https://www.saxonica.com/documentation/index.html#!functions/saxon/validate, saxon.validation.report.xml
5) Other: cf. https://stackoverflow.com/questions/39974143/validate-xml-with-schema-and-get-validation-errors-in-xml
In the end, I found that the easiest way to reach my goal was to create this XVRL format.
XVRL was just a proposal; I quickly invented this syntax because my company needed it. It can be improved, renamed, released as open source, versioned, etc. I can do that; my company will very probably agree.
I think his reasoning is sound. I've looked at the other formats (where possible; the first was behind a login), and there's nothing that fits completely.
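To illustrate the point about grammar-based validation: a schema processor does not evaluate named assertions, it just reports whatever it considers an error. Below is a minimal sketch (Java 16+, using the JDK's built-in javax.xml.validation API, which covers W3C XML Schema; RELAX NG would need another validator such as Jing, not shown) that collects these diagnostics into a flat, schema-language-neutral list instead of SVRL. The Entry record and its fields are hypothetical and not part of XVRL. Note that SAX's three ErrorHandler callbacks (warning, error, fatalError) line up with three of the severity values proposed for XVRL later in this thread.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class GenericReport {

    /** Hypothetical generic report entry (not XVRL's actual model). */
    public record Entry(String severity, int line, int column, String text) {}

    /** Validates xml against an XSD and collects all diagnostics instead of stopping at the first one. */
    public static List<Entry> validate(String xsd, String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        List<Entry> entries = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                entries.add(entry("warning", e));
            }

            @Override
            public void error(SAXParseException e) {
                // Validity errors: record them and keep validating.
                entries.add(entry("error", e));
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                // Well-formedness errors: parsing cannot continue.
                entries.add(entry("fatal-error", e));
                throw e;
            }
        });
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // Normally already recorded by fatalError above.
        }
        return entries;
    }

    private static Entry entry(String severity, SAXParseException e) {
        return new Entry(severity, e.getLineNumber(), e.getColumnNumber(), e.getMessage());
    }

    public static void main(String[] args) throws Exception {
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                + "<xs:element name='doc'><xs:complexType>"
                + "<xs:attribute name='foo' use='required'/>"
                + "</xs:complexType></xs:element></xs:schema>";
        // A missing required attribute is a validity error, not a failed assertion.
        for (Entry e : validate(xsd, "<doc/>")) {
            System.out.println(e);
        }
    }
}
```

The message text in each entry is whatever the processor chooses to say, which is exactly why a different processor may describe the same problem differently.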
The only problem is that his proposal has no status whatsoever. But if we invented something ourselves the same problem would occur.
So I suggest:
The only thing I don't know is whether this (referencing "just" some standard published on GitHub) is allowed, given our W3C connection.
Thoughts?
I agree that SVRL is too much focused on Schematron. However, there is nothing in XSD/RNG/DTD validation outputs that couldn’t be squeezed into SVRL.
A couple of things that we rely on in SVRL are lacking in XVRL:
- @role. Examples include the ubiquitous @srcpath attribute, which is a location identifier that will be kept across multiple conversion steps (for example, <span class="srcpath">file:/C:/cygwin/home/gerrit/Springer/docx2app-git/test_after/Drews_334495_1_En/M_0_004.docx.tmp/word/document.xml?xpath=/w:document[1]/w:body[1]/w:p[6]/w:r[12]</span>), or classification (for example, <span class="category">Typesetting</span>, with span as the SVRL span element, not the HTML one).
- @role, which is often used to transport severity information, we’d prefer a severity attribute with a fixed vocabulary (info, warning, error, fatal-error).
- srcpath in such a custom attribute, @tr:srcpath. Some attributes will pertain to individual localized messages, but most of them will relate to what is called report in XVRL.
- srcpath, category and family, where family is a name for what is now the validation-report element. We called it family because phase was already taken. Maybe purpose or kind would also be OK for this. The idea is to attach a name to the validation in order to distinguish the validation report of an intermediate XML format from the validation report of an EPUB OPF document, for example. The schema element in the metadata for each validation-report should already provide such a unique name, but only as a file name (system identifier), which might not be descriptive enough. Also, two schemas with different system identifiers may implement the same validation family.
- report.
- report elements should rather have category children, where category may carry @xml:lang attributes. There may be multiple categories per report and language.
- diagnostic was originally intended, until it became used for L10N only). Supplementary information may hold the aforementioned tables, lists, or links to documentation. See screenshot below for an example.
- validation-report, report, and message are most intuitive remains to be discussed.
- http://www.lefebvre-sarrut.eu/ns/els/xvrl (something with xproc.org in it).
- path attribute should be optional.

I propose that we as the XProc CG publish a modified spec (and give Matthieu due credit for the original spec). We can call it XVRL or GVRL (G for generalized).
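A minimal sketch of how the proposed fixed severity vocabulary (info, warning, error, fatal-error) could be derived from the free-form SVRL @role attribute during an SVRL→XVRL conversion. It is written in Java only for illustration; the alias table and the choice to treat a missing or unrecognized role as error are assumptions, not part of either spec.

```java
import java.util.Locale;
import java.util.Map;

public class RoleToSeverity {

    /** The fixed severity vocabulary proposed for XVRL in this thread. */
    public enum Severity {
        INFO("info"), WARNING("warning"), ERROR("error"), FATAL_ERROR("fatal-error");

        public final String token;

        Severity(String token) {
            this.token = token;
        }
    }

    // Hypothetical aliases: SVRL leaves @role free-form, so projects use varying values.
    private static final Map<String, Severity> ALIASES = Map.of(
            "info", Severity.INFO,
            "information", Severity.INFO,
            "warning", Severity.WARNING,
            "warn", Severity.WARNING,
            "error", Severity.ERROR,
            "fatal", Severity.FATAL_ERROR,
            "fatal-error", Severity.FATAL_ERROR);

    /** Maps an SVRL @role value (possibly null) to a severity; unknown or missing roles become ERROR. */
    public static Severity fromRole(String role) {
        if (role == null) {
            return Severity.ERROR;
        }
        return ALIASES.getOrDefault(role.trim().toLowerCase(Locale.ROOT), Severity.ERROR);
    }

    public static void main(String[] args) {
        for (String role : new String[] {"Warning", "fatal", "info", null, "Typesetting"}) {
            System.out.println(role + " -> " + fromRole(role).token);
        }
    }
}
```

A role like "Typesetting" is really a classification, not a severity; with a separate category element it would no longer need to be squeezed into @role at all.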
Ehh, ok... Given the rather big list up there, is making such a spec something you or Le-tex can do? It wouldn't make sense for me to be just a scribe for somebody else's strong ideas about some subject...
And of course it can't wait very long (a few months?) to give the implementors enough time.
Yes, I can write the spec, a RELAX NG schema, and the SVRL→XVRL XSLT.
Hi all
Thanks for the report and the improvements! I like your proposal, Gerrit, it makes sense; and yes, feel free to make a new spec. XVRL was just a first attempt.
Maybe the new namespace should not be bound to XProc, in case someone would like to use it in another context? But which organization then? Well, maybe it's easier to bind it to XProc after all!
Most validation engines use a kind of dictionary to display error messages. As with i18n, they use IDs for "parts of sentences". I don't know if it's a good idea to represent this in the new format. Example: https://github.com/IDPF/epubcheck/tree/master/src/main/resources/com/thaiopensource. This is specific to the schema language, and maybe it should be computed before the final XVRL is produced?
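A sketch of what such a message dictionary amounts to: the engine reports a message ID plus arguments, and the final text is produced by looking up the ID and filling in a pattern. This uses Java's java.text.MessageFormat; the IDs and wording are made up and only loosely modelled on the resource files behind the epubcheck link above.

```java
import java.text.MessageFormat;
import java.util.Locale;
import java.util.Map;

public class MessageCatalog {

    // Hypothetical English catalogue: message ID -> MessageFormat pattern.
    private static final Map<String, String> EN = Map.of(
            "required_attribute_missing", "element \"{0}\" missing required attribute \"{1}\"",
            "element_not_allowed", "element \"{0}\" not allowed here");

    /** Turns a message ID plus its arguments into final text. */
    public static String resolve(String id, Object... args) {
        String pattern = EN.get(id);
        if (pattern == null) {
            throw new IllegalArgumentException("unknown message ID: " + id);
        }
        return new MessageFormat(pattern, Locale.ENGLISH).format(args);
    }

    public static void main(String[] args) {
        System.out.println(resolve("required_attribute_missing", "p", "id"));
        // prints: element "p" missing required attribute "id"
    }
}
```

Resolving early like this yields plain text for XVRL but discards the ID; keeping the ID and arguments in the report as well would allow re-localizing later, which is the trade-off raised above.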
If you need any help just tell me !
Cheers
Hello, @gimsieke directed me here, and after also consulting @sgmlguru and @Gertone, I would like to offer to pitch in. For reference, we also did something along these lines a while back (but it is inadequate for these purposes). I don't wish to tread on any toes, @gimsieke, @mricaud, but I stand ready :)
Hi Andrew, by all means, please join us here. I will create a draft Schema (starting with a preliminary RNC that Norm created a couple weeks ago) in another repo over the weekend. Then we can discuss on, before, and after Thursday’s unconference track whether the proposed model seems adequate and what else people might need in “XVRL”.
Yes, please, @AndrewSales, I'd welcome a coherent proposal. I've scratched at it a bit, but haven't produced anything I'm confident about.
See also: https://github.com/xproc/xvrl
@hrennau suggested that we also consider supporting SHACL validation (there’s issue #8 that already mentions SHACL), and then there needs to be a bidirectional mapping (if not an outright identity for overlapping areas covered) between the XVRL vocabulary and the SHACL Validation Report Vocabulary. The serialization format could then be anything an RDF graph can be serialized as.
There is also ShEx as an alternative to SHACL. Implementations are not yet at a level where everyone is happy using them, so it might be too early to implement them as a step. But they are worth looking into, of course. If we do so, though, we need to give whatever comes out of https://tools.ietf.org/pdf/draft-handrews-json-schema-00.pdf at least the same attention.
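To make the mapping question concrete for severities alone: SHACL defines three severity IRIs (sh:Info, sh:Warning, sh:Violation), while the vocabulary proposed above has four values, so a round trip cannot be lossless. The sketch below assumes that error and fatal-error both map to sh:Violation and that any other IRI maps back to error; both choices are hypothetical.

```java
public class ShaclSeverity {

    public static final String SH = "http://www.w3.org/ns/shacl#";

    /** The four severity values proposed for XVRL in this thread. */
    public enum Severity { INFO, WARNING, ERROR, FATAL_ERROR }

    /** XVRL -> SHACL. SHACL has three built-in severities, so error and fatal-error collapse. */
    public static String toShacl(Severity s) {
        return switch (s) {
            case INFO -> SH + "Info";
            case WARNING -> SH + "Warning";
            case ERROR, FATAL_ERROR -> SH + "Violation";
        };
    }

    /** SHACL -> XVRL. Anything other than sh:Info or sh:Warning becomes error (assumption). */
    public static Severity fromShacl(String iri) {
        if (iri.equals(SH + "Info")) {
            return Severity.INFO;
        }
        if (iri.equals(SH + "Warning")) {
            return Severity.WARNING;
        }
        return Severity.ERROR;
    }

    public static void main(String[] args) {
        for (Severity s : Severity.values()) {
            Severity back = fromShacl(toShacl(s));
            System.out.println(s + " -> " + toShacl(s) + " -> " + back + (back == s ? "" : "  (lossy)"));
        }
    }
}
```

Under these assumptions only fatal-error fails the round trip, which is one concrete reason an outright identity between the two vocabularies may not be achievable.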
Is there anything more to say on this with regard to validating XML? Close?
Further discussion should take place on https://github.com/xproc/xvrl
From Matthieu’s email message of 2017-12-10:
reporting.zip