xproc / 3.0-steps

Repository for change requests to the standard step library and for official extension steps
10 stars 7 forks source link

Inconsistent definition of text/plain output from some steps #566

Closed ndw closed 4 months ago

ndw commented 5 months ago

If you read the text of, for example, the p:uuid step, you'll find:

If the document node is matched, the entire document is replaced by a text node with the UUID. What appears on port result is a text document with the text node wrapped in a document node.

Given the input:

<doc>Hello, world.</doc>

If the step specifies match='/', the step is expected to produce a text document. Fair enough. But the Document properties section says:

If the resulting document contains exactly one text node, the content-type property is changed to text/plain and the serialization property is removed, while all other document properties are preserved

As far as I can see, match=doc also produces a document containing exactly one text node.

There are tests in the test suite that require match='doc' to produce an application/xml document containing a single text node where match='/' must produce a text/plain document containing a single text node.

It's quite practical to implement that, but I don't think the specification actually justifies that interpretation as currently drafted.

I don't think this is limited to the p:uuid step, I think it applies to p:hash, p:replace, p:string-replace, and perhaps others.

xatapult commented 5 months ago

I agree with you, given the explanation above. However, I wonder what @xml-project thinks of this, since he already implemented it. Maybe there are hidden detail monsters lurking somewhere?

ndw commented 5 months ago

I've implemented it as well 😄 . I think what happened is we made different, independent decisions about "was best" at different times. And they didn't line up. We must have decided that replacing doc with a text node didn't change the content type. And then at some other time, decided that replacing the document node should.

I'm torn: my instinct is that we should apply the rule in the document properties section consistently. But that's clearly going to introduce a backwards incompatibility that could bite users. The safer thing to do is to rewrite the document properties section so that it's consistent with what the test suite results indicate is correct behavior.

Consider:

<p:uuid match="/*"/>
<p:wrap match="/" wrapper="new-document-element/>

If we change the rule so that the output of p:uuid in this case is a text document, then the subsequent p:wrap step will fail. (The source has to be text or html.)

I think it's slightly odd that this will fail where the preceding will succeed:

<p:uuid match="/"/>
<p:wrap match="/" wrapper="new-document-element/>

But it can be explained.

xml-project commented 5 months ago

Late to the party, sorry. For p:uuid I understood

If the document node is matched, the entire document is replaced by a text node with the UUID. What appears on port result is a text document with the text node wrapped in a document node.

just as an explanation what happes, if the document node is matched, not as stating, that the switch to a text document is ONLY used in this case. (The paragraphs before are what happens if an element or an attribute is matched. As a consequence MorganaXProc-III produces a text document for

<p:uuid match="/">
    <p:with-input><doc>some text</doc></p:with-input>
</p:uuid>

and for

<p:uuid match="doc">
    <p:with-input><doc>some text</doc></p:with-input>
</p:uuid>

So I do think there is a reading which makes the definition consistent. Does this make sense? @ndw Which test expects 'application/xml'? I could not find one for p:uuid. Could you point me there?

ndw commented 5 months ago

I will look again. I'm pretty sure I just stumbled over a few days ago.

ndw commented 5 months ago

(It may not have been p:uuid where I saw this in the test suite. I was looking closely at p:uuid when I wrote this issue because ULIDs came up in the QT4CG working group and I was amusing myself by considering whether I could support them inp:uuid ("no", not without doing violence to the step semantics). I happened to see the text and was reminded of the issue. Should have reported it in the test suite, except there I was just trying to pass the test :-) )

ndw commented 5 months ago

Apologies, I can no longer reproduce the problem that lead me to conclude there were differences in the test suite. And when I remove the special-case code to handle that case...it still works. One bug obscuring another I guess. Apologies for the noise.

I'm going to leave this open for a bit because I think we might be able to clarify the prose.