Rewrite p:escape-markup and p:unescape-markup

ndw commented 4 years ago

This PR attempts to fix #14 but it does so in a radical way: I've entirely changed the semantics of both steps!

We need these to support, for example, JSON documents that have escaped HTML in string values. However, the complexity that @xatapult notes in issue 14 is a direct consequence of the XProc 1.0 requirement that the input and output had to remain XML even when escaping and unescaping markup. That's silly in XProc 3.0, so I've removed it. Escaping takes XML or HTML and produces text. Unescaping takes text and produces XML or HTML. In order to make that work in the general case, I had to add a wrapper option to p:unescape-markup, but I think that's consistent with what we've done in other places.
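
As a rough illustration of the text-in, markup-out model described above, here is a minimal sketch in Python using only the standard library. It is an analogy, not XProc: the function names `escape_markup` and `unescape_markup` are hypothetical stand-ins for the steps, and the JSON payload shape is an assumption made up for the example.

```python
import json
import xml.etree.ElementTree as ET


def escape_markup(element):
    """Serialize an element tree to a plain string (markup in, text out)."""
    return ET.tostring(element, encoding="unicode")


def unescape_markup(text):
    """Parse a string of markup back into an element tree (text in, markup out)."""
    return ET.fromstring(text)


# A small XML fragment that we want to carry inside a JSON string value.
para = ET.fromstring("<p>Hello <b>world</b></p>")

# Escaping yields text, which JSON can hold as an ordinary string.
payload = json.dumps({"body": escape_markup(para)})

# Unescaping the string value recovers the markup.
restored = unescape_markup(json.loads(payload)["body"])
```

Neither direction needs the result to stay XML on both sides, which is the simplification the rewrite relies on.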

(If we were inventing these steps now, we might call them p:parse and p:serialize or something, but I'm inclined to leave their names alone.)

I'd like at least two other editors to approve this before we merge it. And, obviously, if anyone objects I won't merge it until we've resolved the objections.

ndw commented 4 years ago

I don't think there is any difference going from XML to text.

Going from text to XML, the difference is the ability to handle results that would not be well-formed XML (because they have multiple top-level elements). I think that's important, though maybe it's only really important for text to HTML, where it wouldn't necessarily be an error anyway.
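
To make the multiple-top-level-elements case concrete, here is a small Python sketch (standard library only, not XProc). The helper `unescape_with_wrapper` and the default wrapper name `wrapper` are hypothetical; they only illustrate why a wrapper is needed, not how any XProc step or option is actually defined.

```python
import xml.etree.ElementTree as ET


def unescape_with_wrapper(text, wrapper="wrapper"):
    """Parse markup text after enclosing it in a single wrapper element."""
    return ET.fromstring(f"<{wrapper}>{text}</{wrapper}>")


# Two sibling elements with no common root: not a well-formed XML document.
fragment = "<p>one</p><p>two</p>"

# Parsing the bare fragment fails because of the second top-level element.
try:
    ET.fromstring(fragment)
    parsed_bare = True
except ET.ParseError:
    parsed_bare = False

# With a wrapper, the same text parses and both paragraphs are preserved.
wrapped = unescape_with_wrapper(fragment)
paragraphs = [child.text for child in wrapped]
```

An HTML parser would typically accept such a fragment without complaint, which is why the wrapper matters mainly for the text-to-XML direction.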

I suppose I could be persuaded that cast-content-type adequately covers the cases where these steps are required and we should remove them both. But if we decide to keep one, I think we should keep both.

ndw commented 4 years ago

Per the 2 January 2020 editors' call, remove these steps: you can get the equivalent behavior with p:cast-content-type.

ndw commented 4 years ago

Overtaken by #329

xproc / 3.0-steps

Rewrite p:escape-markup and p:unescape-markup #313