qt4cg / qtspecs

QT4 specifications
https://qt4cg.org/

Control over schema validation in parse-xml(), doc(), etc. #490

Open michaelhkay opened 1 year ago

michaelhkay commented 1 year ago

I'm struggling with a problem with the stylesheet that generates QT4 tests from the examples in the function catalog, and I think it's an example of a more general problem in schema-aware processing.

The spec gives this example (for json-to-xml):

The expression json-to-xml('{"x": "\\", "y": "\u0025"}', map{'escape': true()}) returns (with whitespace added for legibility):

<map xmlns="http://www.w3.org/2005/xpath-functions">
  <string escaped="true" key="x">\\</string>
  <string key="y">%</string>
</map>

But the test we actually generate expects the result:

<map xmlns="http://www.w3.org/2005/xpath-functions">
    <string escaped="true" key="x" escaped-key="false">\\</string>
    <string key="y" escaped="false" escaped-key="false">%</string>
</map>

and the test is failing because the result produced by Saxon correctly excludes the escaped-key="false" attributes which the test is expecting. How did the attributes get there?

The answer is that the stylesheet is doing parse-xml() followed by some transformation to normalise whitespace, followed by serialize(). The parse-xml() call is invoking schema validation, which adds default attributes.

We probably don't want schema validation here; if we do want it, we probably don't want default attribute values to be expanded. But parse-xml() doesn't give us the choice. It says it's implementation-defined and it gives no options for the user to control it. Saxon provides configuration-level options but they aren't fine-grained enough to use here.

Without being able to control this, the only option seems to be for the stylesheet to transform the result to take out the defaulted attributes that the schema processor has added.

We need options on functions like doc() and parse-xml() to control whether and how schema validation is performed.

One of the options we need whenever we do validation is probably "validate+strip" - validate the input, report errors if it's invalid, but return the untyped data that was supplied to the validator, not the type-annotated data with expanded defaults.
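To make the proposed "validate+strip" behaviour concrete, here is a minimal Python sketch. It is not an existing API in any processor: `parse_validate_strip` and `toy_validate` are hypothetical names, and `toy_validate` is only a stand-in for a real schema processor that expands default attributes on the tree it is given.

```python
import copy
import xml.etree.ElementTree as ET

FN_NS = "http://www.w3.org/2005/xpath-functions"


def toy_validate(tree):
    """Hypothetical stand-in for a schema validator.

    Like a real schema processor, it expands a default attribute on the
    tree it receives. It returns a list of error messages (empty if valid).
    """
    errors = []
    if tree.tag != f"{{{FN_NS}}}map":
        errors.append(f"unexpected root element {tree.tag}")
    for elem in tree.iter():
        if "key" in elem.attrib:
            # Assumed default, mirroring the escaped-key="false" seen above.
            elem.attrib.setdefault("escaped-key", "false")
    return errors


def parse_validate_strip(xml_text, validate):
    """Parse, validate a throwaway copy, and return the untouched tree."""
    tree = ET.fromstring(xml_text)
    # The validator only ever sees a copy, so its expansions are discarded.
    errors = validate(copy.deepcopy(tree))
    if errors:
        raise ValueError("; ".join(errors))
    return tree


doc = parse_validate_strip(
    f'<map xmlns="{FN_NS}"><string key="y">%</string></map>', toy_validate
)
print(doc[0].attrib)
```

The point of the design is that invalid input is still rejected, while the caller gets back exactly the attributes that were present in the source.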

ndw commented 1 year ago

Maybe the short-term solution is to change the schema so that those values aren't default attributes, and change the processing expectation so that absent values are treated as false?

michaelhkay commented 1 year ago

I've done a short term fix by adding a transformation pass to remove the unwanted attributes.
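The actual fix lives in the test-generation stylesheet and is not shown in this thread. Purely as an illustration, a pass of this kind could look like the Python sketch below. The `DEFAULTED` table and the function name `strip_defaulted_attributes` are assumptions made for the example, and the approach cannot tell an explicitly written `escaped="false"` from one added by the schema processor.

```python
import xml.etree.ElementTree as ET

# Assumed table of attributes and the default values a schema processor adds.
DEFAULTED = {"escaped": "false", "escaped-key": "false"}


def strip_defaulted_attributes(root):
    """Remove attributes whose value equals the assumed schema default."""
    for elem in root.iter():
        for name, default in DEFAULTED.items():
            if elem.get(name) == default:
                del elem.attrib[name]
    return root


# The expanded form that the generated test was wrongly expecting.
expanded = ET.fromstring(
    '<map xmlns="http://www.w3.org/2005/xpath-functions">'
    r'<string escaped="true" key="x" escaped-key="false">\\</string>'
    '<string key="y" escaped="false" escaped-key="false">%</string>'
    '</map>'
)
cleaned = strip_defaulted_attributes(expanded)
for child in cleaned:
    print(child.attrib)
```

After the pass, the first `string` element keeps `escaped="true"` and `key="x"`, and the second keeps only `key="y"`, matching the example in the spec.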