xproc / 3.0-steps

Repository for change requests to the standard step library and for official extension steps

What should validate-with-dtd do with text input if it fails validation? #601

Closed ndw closed 3 weeks ago

ndw commented 1 month ago

If you pass an XML or HTML document to p:validate-with-dtd with assert-valid=false, then the input document can simply be returned if validation fails. But what do you do if the input is text?

Off-hand, I can think of three options:

  1. Parse with a non-validating parser. Pro: produces some output. Con: We've just kicked the can down the road. Now what do we do if the document is not well-formed?
  2. Throw an error. Pro: easy. Con: sort of violates the expectations of assert-valid=false.
  3. Return nothing. Pro: easy. Con: Means the result port has to be a sequence and users have to check for empty.

Favorites? Other options?

I think I'm inclined to add a new error for this case and go with option 2. If you passed in a text document that you expected to parse with a validating parser and validation fails, my hunch is that it's unlikely to be well-formed XML either. I'm guessing that the thing you got wrong was a broken declaration or something else that's just not going to parse.

If you really do need to handle the case where you constructed a text document that might be valid XML and might only be well-formed, you can handle it yourself; it's just going to be extra work.

ndw commented 1 month ago

In private correspondence, Achim points out that we could greatly simplify things by requiring the source to be XML or HTML. That leaves open how to handle the (small minority of) cases where constructing an internal subset is necessary. But, as Achim also points out, you could get around that by constructing the text document and then passing it to p:cast-content-type.

I like it.

I'm going to update the validation step with this solution and see what folks think.
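
For illustration, here is a minimal sketch of the workaround described above: build the document (including an internal subset) as text, then hand it to p:cast-content-type to get XML. The inline document and its `doc` element are hypothetical, made up for this example. The thread does not say how DTD validation would be requested during the cast; p:cast-content-type does have a `parameters` option, but any parameter names for controlling validation are an assumption and are left out here.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:output port="result"/>

  <!-- Hypothetical text document: the DOCTYPE and its internal subset
       are constructed as plain text, not as parsed XML. -->
  <p:identity>
    <p:with-input>
      <p:inline content-type="text/plain"><![CDATA[<!DOCTYPE doc [
<!ELEMENT doc (#PCDATA)>
]>
<doc>Hello</doc>]]></p:inline>
    </p:with-input>
  </p:identity>

  <!-- Casting from a text media type to an XML media type parses the text.
       How (or whether) the parse validates is not covered in this thread;
       treat any validation-related parameters as implementation-specific. -->
  <p:cast-content-type content-type="application/xml"/>
</p:declare-step>
```

If the text is not well-formed, the cast itself should fail, which is the situation option 1 above was worried about; wrapping the cast in p:try is one way to handle that yourself.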