xproc / 3.0-steps

Repository for change requests to the standard step library and for official extension steps
10 stars 7 forks source link

Be consistent about turning XML into text #485

Closed ndw closed 2 years ago

ndw commented 3 years ago

We decided that if p:unwrap results in an empty document, then it's text (#470). We should say the same thing about p:delete; are there other case?

ndw commented 3 years ago

Is this even a good idea? Suppose p:unwrap is followed by something that adds an element. That would have succeeded and will now fail.

gimsieke commented 3 years ago

I don’t remember the rationale for turning it into an empty text document. Maybe we should revise it an make an empty document node, as Morgana did before IIRC. Apart from p:delete, I don’t know any other steps. Maybe p:xslt if it is now allowed to “create” empty primary outputs? Or p:xquery?

xml-project commented 2 years ago

FYI: Today I added some related tests to the test suite (https://github.com/xproc/3.0-test-suite/pull/597).

I was able to identify nine cases, where a steps result could be an empty document node:

  1. p:unwrap -> Text document
  2. p:delete -> XML document
  3. p:string-replace -> XML document
  4. p:text-replace -> Text document
  5. p:xquery -> XML document
  6. p:xslt-secondary port -> XML document
  7. p:xslt-result port -> XML document
  8. p:load (for an empty file with text/plain) -> Text document
  9. p:viewport replacing matched root element with p:empty) -> XML document

I have no solution to propose, but I agree with @ndw that p:unwrap and p:delete (and p:string-replace) should result in the same document type. My expectations (coming from xquery and xslt) would be an XML document, but I think I could adapt to Text document too.

What seems natural to me is that loading an empty text file and replacing all text in a text document does not change the document type, but I would not argue for it, if you favour a general solution saying all empty document nodes result in an X document.

Hope that helps.

xml-project commented 2 years ago

There a two other cases to produce an empty document node: 1.

<p:identity>
  <p:with-input><p:inline /></p:with-input>
</p:identity>

I would expect this to be an XML document.

2.

<p:identity>
  <p:with-input content-type="text/plain"><p:inline /></p:with-input>
</p:identity>

I would expect this to be a Text document.

gimsieke commented 2 years ago

agree

xml-project commented 2 years ago

To sum up the discussion, I think two options to solve is are mentioned:

  1. Keep the change to p:unwrap (produces text document in some cases), but to change p:delete, so it produces a text document in the same cases)
  2. Revert the change to p:unwrap, so it is an XML document event if it is an empty document node.

If you indicate your preferred solution, I am prepared to file a pull request for this.

If you would ask me, (2) seems more natural to me. In a pipeline I lately run into the problem indicated by @ndw in https://github.com/xproc/3.0-steps/issues/485#issuecomment-916197623

xatapult commented 2 years ago

It feels much more naturally to me that p:unwrap and all the other steps produce an empty document node in case there's nothing there any more.

@ndw and @gimsieke Please add your votes so we can hopefully close this issue (or, in case of disagreements, discuss it)

gimsieke commented 2 years ago

I’m quite indifferent. If you both tend to option two, I’ll tend to two, too.

ndw commented 2 years ago

On the call of 22 June, the editor's decided to revert the previous decision and make the results be empty XDM document nodes.