ndw closed this 2 months ago
Close #543
Thank you, Gerrit! You're absolutely right. The step can return the original document unchanged if there was an error. That's much more sensible.
I think we need a document-element option for the case where you send a text document as the source. For example:
<p:identity>
<p:with-input><p>Paragraph of text.</p></p:with-input>
</p:identity>
<p:validate-with-dtd
general-entities="map { 'text': 'Hello, world.',
'para': . }"
document-element="doc">
<p:with-input port="source">
<p:inline content-type="text/plain"><![CDATA[<doc>
<p>Test</p>
<p>&text;</p>
&para;
</doc>]]></p:inline>
</p:with-input>
<p:with-input port="doctype"><p:empty/></p:with-input>
</p:validate-with-dtd>
Having poked at the implementation a bit, I think what I've proposed is way over-the-top. How about:
<p:declare-step type="p:validate-with-dtd">
<p:input port="source" primary="true" content-types="xml html text"/>
<p:input port="doctype" content-types="text" sequence="true">
<p:empty/>
</p:input>
<p:output port="result" primary="true" content-types="xml"/>
<p:output port="report" sequence="true" content-types="xml json"/>
<p:option name="report-format" select="'xvrl'" as="xs:string"/>
<p:option name="serialization" as="map(xs:QName,item()*)?"/>
<p:option name="assert-valid" select="true()" as="xs:boolean"/>
</p:declare-step>
If you don't provide a doctype, there must be a doctype-system serialization property (on the document or the step). We serialize the document with the necessary doctype declaration and validate it.
If you do provide a doctype, we serialize the source document (without a doctype declaration or XML declaration), slap the doctype you provided in front of it and validate it.
I have most probably missed something important, but I am confused about what the report port is for. If the validation succeeds, nothing “interesting” is in the documents on this port. If it doesn't, the report document is not available because a dynamic error is raised. What am I missing?
Several comments back, @gimsieke persuaded me that we should put the assert-valid option back and just pass the original document through if assert-valid is false() and an error occurs.
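As a rough sketch of how that might look from the caller's side, assuming the step signature proposed above is adopted as written: the wrapper pipeline below sets assert-valid to false, so an invalid source should come out of the result port unchanged while the findings go to the report port. The DTD name doc.dtd and the step name "validate" are made up for illustration.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source" primary="true"/>
  <!-- Unconnected primary output binds to the last step's primary output -->
  <p:output port="result" primary="true"/>
  <!-- Expose the validation report alongside the (possibly invalid) document -->
  <p:output port="report" sequence="true">
    <p:pipe step="validate" port="report"/>
  </p:output>

  <!-- assert-valid="false": no dynamic error on invalid input, per the proposal -->
  <p:validate-with-dtd name="validate" assert-valid="false">
    <p:with-input port="doctype">
      <!-- doc.dtd is a hypothetical system identifier -->
      <p:inline content-type="text/plain"><![CDATA[<!DOCTYPE doc SYSTEM "doc.dtd">]]></p:inline>
    </p:with-input>
  </p:validate-with-dtd>
</p:declare-step>
```

With assert-valid left at its default of true(), the same pipeline would presumably stop with a dynamic error instead, and nothing would be readable on the report port.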
@ndw thanks. Now I know what I missed. :-))
@ndw Two questions came up while I was trying to implement the new suggestion:
<p:declare-step type="p:validate-with-dtd">
<p:input port="source" primary="true" content-types="xml html text"/>
<p:input port="doctype" content-types="text" sequence="true">
<p:empty/>
</p:input>
<p:output port="result" primary="true" content-types="xml"/>
<p:output port="report" sequence="true" content-types="xml json"/>
<p:option name="report-format" select="'xvrl'" as="xs:string"/>
<p:option name="serialization" as="map(xs:QName,item()*)?"/>
<p:option name="assert-valid" select="true()" as="xs:boolean"/>
</p:declare-step>
Please excuse these questions if they are stupid; I am not a DTD expert.
1. What is supposed to happen if a text document appears on port "source"?
2. If an HTML document appears on port "source", is the result type "xml" correct?
A text document is allowed so that you could construct something like this:
<doc>
&chap1;
&chap2;
</doc>
where presumably the chap1 and chap2 entities are defined in the doctype. There's no way to get unexpanded entities into a parsed XDM, so you'd have to do it this way. I haven't thought very hard about how difficult it will be to make a text document that serializes correctly!
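To make that concrete, here is a minimal sketch of such a call, assuming the proposed signature (a text source plus a text document on the doctype port). The DTD content, element names, and entity values are invented for the example; only the chap1 and chap2 entity names come from the snippet above.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:output port="result"/>

  <p:validate-with-dtd>
    <!-- Text source: the entity references stay unexpanded until validation -->
    <p:with-input port="source">
      <p:inline content-type="text/plain"><![CDATA[<doc>
&chap1;
&chap2;
</doc>]]></p:inline>
    </p:with-input>
    <!-- Hypothetical doctype: internal subset declaring both entities -->
    <p:with-input port="doctype">
      <p:inline content-type="text/plain"><![CDATA[<!DOCTYPE doc [
<!ELEMENT doc (chapter+)>
<!ELEMENT chapter (#PCDATA)>
<!ENTITY chap1 "<chapter>First chapter.</chapter>">
<!ENTITY chap2 "<chapter>Second chapter.</chapter>">
]>]]></p:inline>
    </p:with-input>
  </p:validate-with-dtd>
</p:declare-step>
```

If the step works as described, the doctype text would be placed in front of the source text before parsing, so the document on the result port would contain the two expanded chapter elements.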
DTD validation sort-of implies XML, so I think making the result always be XML makes sense. If you think it makes more sense to give a document with a root element of (X)HTML an HTML content type, I can see how that might make sense too.
@ndw Thank you!
Hi folks. I've pushed an update that simplifies the p:validate-with-dtd step along the lines that I described in a comment above.
This is my first attempt. Feedback eagerly solicited. Formatted versions should appear on the xproc.org/dashboard page a few minutes after I create this request.