Closed xatapult closed 3 years ago
Hmm, not sure whether such a fringe case justifies changing the result to sequence="true".
Did I understand it correctly: the only case that you think is not clearly specified is when the result does not contain a text node at all?
I think then the result should be a document node with no children. Empty text nodes are not allowed in a parent node. But a document node may well be empty.
I’d read the spec to mean that the content type of this document-node-only document is still application/xml or whatever the input document’s content type was.
Maybe you're right. Probably it's just a matter of explicitly mentioning and specifying this fringe case in the specs.
And probably Morgana is right, too. I don’t have it installed, but can you inspect more closely the content type and count(/node()) of the unwrapped document? They should be application/xml and 0, respectively.
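To make the expected count concrete, here is a minimal sketch in Python. It is not XProc and not how Morgana works; the tuple-based node model and the `unwrap_root` helper are hypothetical stand-ins for illustration only. It shows that replacing an empty root element with its (non-existent) children leaves a document node with zero children.

```python
# Hypothetical model: a document is a list of top-level nodes, each a
# (kind, value) tuple. For elements, value is the list of child nodes.
# This only illustrates the discussion; it is not an XProc implementation.

def unwrap_root(doc_children):
    """Replace every top-level element with its own children."""
    result = []
    for kind, value in doc_children:
        if kind == "element":
            result.extend(value)          # keep the children, drop the wrapper
        else:
            result.append((kind, value))  # PIs, comments, text stay in place
    return result

empty_root = [("element", [])]            # models <some-root-element/>
unwrapped = unwrap_root(empty_root)
print(len(unwrapped))                     # analogue of count(/node())
```

In this model the printed value is the analogue of count(/node()) on the unwrapped document.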
Unwrapping an empty element becomes a text document without contents.
...per the editorial team meeting on 22 October 2020
I think I've opened a little can of worms with this. Even after the changes I already made, I think there are still some things left that should be specified or at least mentioned:
Gerrit mentions here that the serialization attribute will be removed when the content type changes. There is however no mention of that in the step's description. And I'm wondering: Is that necessary? Why not simply retain it? Any serialization options that are no longer relevant are simply ignored, right? If we do want to remove it, it should be mentioned in the step's description.
I was also wondering what the outcome of <p:unwrap match="/*"> would be on:
<?Some processing instruction(s)?>
<!-- and/or some comment(s) -->
<an-empty-root-element/>
I think: A document node with the comments/processing-instructions as children and content-type unchanged (not making it text/plain). So it would not be well-formed XML (which we already agreed is OK).
The same is true in the example above when the root element just contains some text: The content-type will not change then.
The serialization property will be removed because it is specified in § 3.1: "If a step changes the content-type in this way, it must also remove the serialization property."
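As a rough illustration of that sentence from § 3.1, the following Python sketch treats document properties as a plain dictionary. The `apply_content_type` helper and the dictionary representation are assumptions made for this example, not part of any XProc processor's API.

```python
# Hypothetical sketch: document properties modelled as a dict.
# The rule illustrated: when the content type changes, the
# serialization property is dropped along with it.

def apply_content_type(properties, new_content_type):
    """Return a copy of the properties with the given content type applied."""
    updated = dict(properties)
    if updated.get("content-type") != new_content_type:
        updated["content-type"] = new_content_type
        updated.pop("serialization", None)  # no longer applies to the new type
    return updated

props = {"content-type": "application/xml", "serialization": {"indent": True}}
print(apply_content_type(props, "text/plain"))
print(apply_content_type(props, "application/xml"))
```

With an unchanged content type, the serialization property is kept as it was.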
The code example: A document node with a PI, a comment, and also probably two whitespace-only text nodes after each. Unless some standard says that whitespace must be ignored outside of a top-level element, but I don’t think this is the case. Content-type unchanged, yes.
If the top-level element contained a text node, it will be merged with the whitespace nodes I guess. In any case, if there is a comment and/or PI, the document will retain its XML content type.
Thanks @gimsieke, missed that line in 3.1.
Glad we're in agreement about the other things 😉
I'm not sure exactly how to word it off the top of my head, but if there are comments or PIs then it has to remain an XML document. It only becomes a text document if the result of unwrapping is a single text node.
@ndw Yeah. And since it already was an XML document to begin with, we can leave the content-type unchanged.
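The rule as worded in the two comments above could be sketched like this in Python. The function name, the list-of-node-kinds input, and the `empty_result_is_text` switch are all hypothetical; the switch exists only because this thread contains both readings of the empty case (an empty document node versus an empty text document).

```python
# Hypothetical sketch of the content-type decision discussed above.
# node_kinds lists the kinds of the document node's children after
# unwrapping, e.g. ["pi", "comment"] or ["text"].

def content_type_after_unwrap(node_kinds, input_content_type,
                              empty_result_is_text=False):
    """Pick the content type of the unwrapped document."""
    if node_kinds == ["text"]:
        return "text/plain"        # a single text node: text document
    if not node_kinds and empty_result_is_text:
        return "text/plain"        # optional reading of the empty case
    return input_content_type      # comments or PIs present: stays XML

print(content_type_after_unwrap(["text"], "application/xml"))
print(content_type_after_unwrap(["pi", "comment"], "application/xml"))
print(content_type_after_unwrap([], "application/xml"))
```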
Recently I had the situation where a p:unwrap resulted in nothing: <p:unwrap match="/*"> on a document with just an empty root element like <some-root-element/>. To my surprise (before reading the specs) in Morgana it resulted in an empty document node or (I'm not sure) a document with an empty (?) text node. This is correct I suppose, although the spec does not make this explicit.
Can we make this situation more explicit and turn it into something that makes more sense? I suggest changing the signature of the output port to sequence="true" and outputting an empty sequence in a case like the one described above.