Do not overwrite Semantic XML content in Presentation XML

opoudjis commented 3 weeks ago

We currently have parallel Semantic XML and Presentation XML trees in Presentation XML. We also know that those trees will not always align well enough for us to recover the Semantic XML for a given Presentation XML, because the Presentation XML layer often strips Semantic XML content completely, and that includes unwrapping Semantic XML tags (e.g. <date value="ISO DATE"> is resolved into a string, with no indication there was ever a date wrapper there).

We are going to abandon that approach. Instead, we are going to take the approach indicated in <formattedref>:

If a rendering is to be provided for a Semantic XML element, that rendering is not to replace the Semantic XML element, but to be added as a child of the Semantic XML element, in a distinct tag
Renderers will by default render those elements as the contents of their formatted child element, and ignore the rest of the Semantic XML content
If renderers need access to the Semantic XML content, it is still there in the same element, and no parallel tree needs to be built
So we will no longer have a duplicate Semantic XML tree

Attention @Intelligent2013 @strogonoff

I will be doing this incrementally, one element at a time, and I will give you warning as I do; @Intelligent2013 I believe you will be the most impacted by the Presentation XML handling of terms.

opoudjis commented 3 weeks ago

Note: we will preserve Semantic XML, but we may not keep it in the same place in Presentation XML. So /term/domain will move to /term/definition/p, because that is where it is rendered. The point is that the information be recoverable, not that it be structurally identical.

opoudjis commented 2 weeks ago

This is going to be a high-level ticket; the changes will be made incrementally, in sub-tickets. Refining the approach given above and in https://github.com/metanorma/isodoc/issues/611:

We will no longer be replicating the Semantic XML tree in Presentation XML, in order to recover the Semantic XML.
We will no longer be removing Semantic XML tags and moving their content elsewhere, in order to render them ("unwrapping" tags).
- So <number value="3" fmt="precision=2"/> will not be rendered as 3.00 in Presentation XML, but as <number value="3" fmt="precision=2">3.00</number>.
If the Presentation XML rendering of a Semantic XML element is to be substantially different, including merging multiple Semantic XML elements in a single block, or prefixing content to Semantic XML elements, we will interleave Presentation XML elements with their Semantic XML counterparts.
The naming convention for such elements will be to prefix fmt- to the most applicable element name.
- Namespaces are simply not worth the effort, and developers hate them for a reason. They will not make processing easier for anyone.
Renderers will not be given a hidden attribute to instruct them to ignore Semantic XML elements in favour of Presentation XML elements. Instead they will just need to know to ignore them.
Semantic XML content within added Presentation XML tags will be cross-referenced back to the source Semantic XML tag. This will be done through a new semx tag with two attributes: element, containing the name of the source Semantic XML tag, and target, referencing the id attribute of that Semantic XML element. semx can contain either blocks or inline elements. Renderers are expected to ignore semx and render its children.

For example:

Semantic XML

<term>
...
<definition>A</definition>
<definition>B</definition>
<definition>C</definition>
</term>

Presentation XML

<term>
...
<definition id="a1">A</definition>
<definition id="a2">B</definition>
<definition id="a3">C</definition>
<fmt-definition>
<ol>
<li><semx element="definition" target="a1">A</semx></li>
<li><semx element="definition" target="a2">B</semx></li>
<li><semx element="definition" target="a3">C</semx></li>
</ol>
</fmt-definition>
</term>

Renderers will need to know to ignore definition, and process the contents of semx, just as Semantic XML extraction will need to know to ignore fmt-definition.
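
To make the two views concrete, here is a minimal Python sketch using only the standard library. It is not isodoc code: the function names (render_view, semantic_view, list_items) are hypothetical, the "..." content of the term is omitted, and the rule "drop a Semantic XML element when it has an fmt- sibling of the same name" is an assumption made for illustration. A real renderer may well need an explicit list of elements to ignore.

```python
import copy
import xml.etree.ElementTree as ET

# The Presentation XML example from above, with the elided "..." left out.
PRESENTATION = """
<term>
  <definition id="a1">A</definition>
  <definition id="a2">B</definition>
  <definition id="a3">C</definition>
  <fmt-definition>
    <ol>
      <li><semx element="definition" target="a1">A</semx></li>
      <li><semx element="definition" target="a2">B</semx></li>
      <li><semx element="definition" target="a3">C</semx></li>
    </ol>
  </fmt-definition>
</term>
"""

def render_view(root):
    """Renderer side (assumed rule): drop any element that has an
    fmt-<name> sibling, and keep the fmt- element for rendering."""
    root = copy.deepcopy(root)
    for parent in list(root.iter()):
        fmt_names = {c.tag[len("fmt-"):] for c in parent if c.tag.startswith("fmt-")}
        for child in list(parent):
            if child.tag in fmt_names:
                parent.remove(child)
    return root

def semantic_view(root):
    """Semantic XML extraction side: drop every fmt- element."""
    root = copy.deepcopy(root)
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag.startswith("fmt-"):
                parent.remove(child)
    return root

def list_items(root):
    """Text of each rendered list item; semx is transparent because
    itertext() simply walks through it to its children."""
    return ["".join(li.itertext()).strip() for li in render_view(root).iter("li")]

if __name__ == "__main__":
    tree = ET.fromstring(PRESENTATION)
    print(list_items(tree))  # ['A', 'B', 'C']
    print([d.get("id") for d in semantic_view(tree).iter("definition")])
```

Both functions work on a copy, so the combined tree stays intact and either view can be derived from it at any time.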

Autonumbering will be handled through new elements, so that the number of an asset (autonum) and its label (label) are differentiated from its caption (fmt-name, fmt-title; these will incorporate the name and title from the Semantic XML).
The label is what is used to cross-reference the asset by default, and it typically is the name of the asset class (e.g. Table) followed by its number. It should not be generated by the renderer on the fly, given flavour-specific formatting requirements and i18n complications (e.g. Japanese inserting connectives). However, it is available for reuse in the renderer (e.g. in Tables of Contents), and it can be overridden.
Delimiters inserted into the caption will also be tagged explicitly, as <span class="autonum-delimiter">, so that they can be identified and overridden if needed.

Semantic XML:

<table id="A">
<name>Rice yields per capita</name>

Current Presentation XML:

<table id="A">
<name>Table 3.1:&#xa0;Rice yields per capita</name>

Future Presentation XML:

<table id="A">
<name id="A2">Rice yields per capita</name>
<autonum id="A0">3.1</autonum>
<label id="A1">Table <semx element="autonum" target="A0">3.1</semx></label>
<fmt-name>
<semx element="label" target="A1">Table <semx element="autonum" target="A0">3.1</semx></semx>
<span class="autonum-delimiter">:&#xa0;</span>
<semx element="name" target="A2">Rice yields per capita</semx></fmt-name>
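
As a rough illustration of how such a caption could be assembled, the following Python sketch builds the autonum, label and fmt-name elements from the Semantic XML table. It is only a sketch under stated assumptions: add_caption, semx and caption_text are hypothetical helper names, the id scheme (table id plus 0, 1, 2) is copied from the example above rather than from any specification, and the nested semx follows the example as proposed.

```python
import copy
import xml.etree.ElementTree as ET

def semx(source):
    """Wrap a copy of source's content in a semx element pointing back at it."""
    ref = ET.Element("semx", {"element": source.tag, "target": source.get("id")})
    ref.text = source.text
    ref.extend([copy.deepcopy(child) for child in source])
    return ref

def add_caption(table, number, label_class="Table", delimiter=":\u00a0"):
    """Add autonum, label and fmt-name after name, leaving name's content as is.
    The id scheme (<table id> + 0/1/2) is an assumption taken from the example."""
    tid = table.get("id")
    name = table.find("name")
    name.set("id", f"{tid}2")

    autonum = ET.Element("autonum", {"id": f"{tid}0"})
    autonum.text = number

    label = ET.Element("label", {"id": f"{tid}1"})
    label.text = f"{label_class} "
    label.append(semx(autonum))

    span = ET.Element("span", {"class": "autonum-delimiter"})
    span.text = delimiter

    fmt_name = ET.Element("fmt-name")
    fmt_name.extend([semx(label), span, semx(name)])

    # Insert the three new elements directly after name, in document order.
    position = list(table).index(name) + 1
    for offset, element in enumerate([autonum, label, fmt_name]):
        table.insert(position + offset, element)
    return table

def caption_text(table):
    """What a renderer would print: the flattened text of fmt-name."""
    fmt_name = next(child for child in table if child.tag == "fmt-name")
    return "".join(fmt_name.itertext())

if __name__ == "__main__":
    table = ET.fromstring('<table id="A"><name>Rice yields per capita</name></table>')
    add_caption(table, "3.1")
    print(ET.tostring(table, encoding="unicode"))
    print(caption_text(table))
```

The flattened text of fmt-name is the same string that the current Presentation XML stores in name, while the original name text, the number and the delimiter each remain separately addressable.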

I need sign-off from @ronaldtse before proceeding with this: the asset captions alone will force rewriting a large number of test cases. @Intelligent2013 @strogonoff Please provide feedback also.

ronaldtse commented 2 weeks ago

@opoudjis I like this solution.

I think the <semx...> element does not need to be nested, because each <semx...> element is only rendered from a semantic XML element, i.e. it does not need nesting.
The <semx target="foo"> is generated by the foo element, so naming it target= is a bit strange. Maybe source="foo" works better?

metanorma / isodoc

Do not overwrite Semantic XML content in Presentation XML #610