xproc / xvrl

Extensible Validation Reporting Language

Validation might not be binary, might not be full #12

Open tgraham-antenna opened 5 years ago

tgraham-antenna commented 5 years ago

I've never had the pleasure of having to use the multifarious concepts, but W3C XSD (https://www.w3.org/TR/xmlschema11-1/#sec-schema-validity-and-docs) has three notions of schema validity. It ends up with three types of validity partly because the validation attempted on different parts of the document can be 'full' or 'partial'.

400e55e makes valid optional (which is good, e.g., for parsing for well-formedness) but, ironically, it's still an xsd:boolean.

gimsieke commented 5 years ago

I retained true/false and added partial

AndrewSales commented 5 years ago

Would this also cover off jing's notion of feasible validity to a RELAX NG schema?

tgraham-antenna commented 5 years ago

Works for me (though you just excluded the 0 and 1 literals from the xs:boolean lexical space: https://www.w3.org/TR/2012/REC-xmlschema11-2-20120405/datatypes.html#boolean).
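
To illustrate the point about the lexical space, here is a minimal Python sketch. It assumes the attribute is now an enumeration of true, false and partial, as described above. The helper names `is_xs_boolean` and `is_enumerated_valid` are hypothetical and not part of XVRL, and trimming surrounding whitespace is a simplification of XSD whitespace handling.

```python
# Lexical space of xs:boolean according to XSD 1.1 Part 2.
XS_BOOLEAN_LITERALS = frozenset({"true", "false", "1", "0"})

# Assumed enumeration for the valid attribute after the change in this thread.
ENUMERATED_VALID_VALUES = frozenset({"true", "false", "partial"})


def is_xs_boolean(value: str) -> bool:
    """Return True if value is a legal xs:boolean literal."""
    return value.strip() in XS_BOOLEAN_LITERALS


def is_enumerated_valid(value: str) -> bool:
    """Return True if value is allowed by the assumed true|false|partial enumeration."""
    return value.strip() in ENUMERATED_VALID_VALUES


if __name__ == "__main__":
    for literal in ("true", "false", "1", "0", "partial"):
        print(literal, is_xs_boolean(literal), is_enumerated_valid(literal))
```

Under these assumptions, 1 and 0 are accepted as xs:boolean but rejected by the enumeration, while partial is accepted only by the enumeration.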

tgraham-antenna commented 5 years ago

Sorry, didn't see @AndrewSales's comment before closing this. Please reopen if it doesn't work for you.

gimsieke commented 5 years ago

@AndrewSales Hm, I was unable to assess whether feasibly valid docs in RNG and partially valid docs in XSD are similar concepts. Let’s assume they aren’t similar concepts (in order to be able to extinguish hair-splitting debates from the start) but let’s use the same validity value for them nevertheless. We can document it like that. And if someone insists they are sufficiently different concepts, we might finally allow a value feasible (that no one will use in practice anyway).

Another question is how to map an SVRL report with warning as its highest severity level to our validity vocabulary. This is underspecified in XProc’s p:validate-with-schematron, too. An XProc processor, if assert-valid="true", will fail at the slightest failed-assert or successful-report, even if the @role was something harmless such as info or warning. This is allowed to happen partly because there are no severity semantics in SVRL; @role is only tacitly and commonly misused for carrying severity information. But I think there is also no consensus on whether the presence of a warning means “invalid” or “still valid, but…”. Validity in these cases is in the eye of the beholder, I guess. As a proponent of a controlled vocabulary for severity levels (cf. #2), I’m inclined to say that if the SVRL reports an error or a fatal error (as per its role→severity mapping), the whole thing is invalid; otherwise it is valid. Or is anyone (who accepts the notion of a fixed severity vocabulary in the first place) in favour of mapping reports with warnings as the most severe findings to valid="partial" in the digest? Or should we call it valid="maybe" and use it for all fuzzy situations?

AndrewSales commented 5 years ago

Thanks, @gimsieke - I wasn't able to gauge the feasibly/partially distinction either, so mine was an open question really, in case anyone could/did distinguish :-) So I think your approach is entirely reasonable here.

On your other point, I suppose that extending the semantics expressed in the XML 1.0 Rec would mean that valid effectively means "no fatal errors or errors (and zero or more warnings or other messages whose severities are application-specific or user-defined)".

As a user whose reports are unlikely ever to be warning-free, I am happy with that, because I know that the validity devil is in the report detail, and will always examine those entrails. For those who aren't, could it be configured at user option to provide valid='partial', as you suggest?

dmj commented 5 years ago

> An XProc processor, if assert-valid="true", will fail at the slightest failed-assert or successful-report, even if the @role was something harmless such as info or warning.

And it should do so because validity is defined by ISO Schematron in section 3.25: A document is valid with respect to the schema if no assertion tests in fired rules of active patterns fail.

What one could do is to apply postprocessing to the validation report and remove patterns/properties/assertions with specific roles and/or provide a user-defined function

fn:is-valid ($svrl as element(svrl:schematron-output)) as xs:boolean

that decides whether a given Schematron validation result is considered to be valid or not.

https://github.com/Schematron/schematron/issues/25 discusses a related concept.
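
As a rough illustration of such a user-defined function, here is a Python sketch rather than XPath. It assumes the standard SVRL namespace and treats every svrl:failed-assert or svrl:successful-report as a finding. The `ignorable_roles` parameter and the sample report are hypothetical and only there for illustration.

```python
import xml.etree.ElementTree as ET

SVRL_NS = "http://purl.oclc.org/dsdl/svrl"

# Hypothetical sample report with a single warning-level finding.
SAMPLE_SVRL = f"""
<svrl:schematron-output xmlns:svrl="{SVRL_NS}">
  <svrl:failed-assert role="warning" test="@id" location="/doc/p[1]">
    <svrl:text>Paragraph should have an id.</svrl:text>
  </svrl:failed-assert>
</svrl:schematron-output>
"""


def is_valid(svrl_root: ET.Element, ignorable_roles: frozenset = frozenset()) -> bool:
    """Decide whether an SVRL report counts as valid.

    With the default empty ignorable_roles, any finding makes the report
    invalid. Findings whose @role is listed in ignorable_roles are skipped;
    a finding without @role is never skipped.
    """
    for name in ("failed-assert", "successful-report"):
        for finding in svrl_root.iter(f"{{{SVRL_NS}}}{name}"):
            if finding.get("role") not in ignorable_roles:
                return False
    return True


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_SVRL)
    print(is_valid(root))
    print(is_valid(root, frozenset({"info", "warning"})))
```

The strict reading is the default here, so a caller has to opt in explicitly to any relaxation based on roles.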

gimsieke commented 5 years ago

Ok, I will leave the valid attribute values at true|false|partial for the time being and we can add parameters to the future SVRL→XVRL XSLT for influencing a) how severity will be calculated from @role, @flag, etc. and b) how @valid will be calculated from the highest severity level.
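
The two parameters could be sketched as follows. This is in Python for brevity, not the future XSLT. The severity names, the role-to-severity mapping, the fallback severity for unmapped roles, and the `warnings_are_partial` switch are all assumptions made for illustration, not anything XVRL defines.

```python
from typing import Iterable, Mapping, Optional

# Hypothetical severity vocabulary, ordered from least to most severe.
SEVERITIES = ("info", "warning", "error", "fatal-error")


def highest_severity(
    roles: Iterable[Optional[str]],
    role_to_severity: Mapping[Optional[str], str],
    default: str = "error",
) -> Optional[str]:
    """(a) Map each @role to a severity and return the most severe one.

    Roles missing from the mapping (including an absent @role, given as None)
    fall back to the default severity. Returns None if there are no findings.
    """
    ranks = [SEVERITIES.index(role_to_severity.get(role, default)) for role in roles]
    return SEVERITIES[max(ranks)] if ranks else None


def compute_valid(highest: Optional[str], warnings_are_partial: bool = False) -> str:
    """(b) Derive the @valid value (true|false|partial) from the highest severity."""
    if highest in ("error", "fatal-error"):
        return "false"
    if highest == "warning" and warnings_are_partial:
        return "partial"
    return "true"


if __name__ == "__main__":
    mapping = {"warn": "warning", "info": "info", "fatal": "fatal-error"}
    top = highest_severity(["info", "warn"], mapping)
    print(top, compute_valid(top), compute_valid(top, warnings_are_partial=True))
```

Keeping the two steps separate means the role mapping and the warning policy can be overridden independently.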