metanorma / isodoc

Generate HTML/Word from Metanorma XML
https://www.metanorma.org
BSD 2-Clause "Simplified" License

Do not overwrite Semantic XML content in Presentation XML #610

Open opoudjis opened 3 weeks ago

opoudjis commented 3 weeks ago

We currently have parallel Semantic XML and Presentation XML trees in Presentation XML. We also know that those trees will not always align well enough for us to recover the Semantic XML for a given Presentation XML, because the Presentation XML layer often strips Semantic XML content completely, and that includes unwrapping Semantic XML tags (e.g. <date value="ISO DATE"> is resolved into a string, with no indication there was ever a date wrapper there).

We are going to abandon that approach. Instead, we are going to take the approach indicated in <formattedref>:

Attention @Intelligent2013 @strogonoff

I will be doing this incrementally, one element at a time, and I will give you warning as I do; @Intelligent2013 I believe you will be the most impacted by the Presentation XML handling of terms.

opoudjis commented 3 weeks ago

Note: we will preserve Semantic XML, but we may not keep it in the same place in Presentation XML. So /term/domain will move to /term/definition/p, because that is where it is rendered. The point is that the information be recoverable, not that it be structurally identical.

opoudjis commented 2 weeks ago

This is going to be a high-level ticket; the changes will be incremental and tracked in sub-tickets. Refining the approach given above and in https://github.com/metanorma/isodoc/issues/611:

For example:

Semantic XML

<term>
...
<definition>A</definition>
<definition>B</definition>
<definition>C</definition>
</term>

Presentation XML

<term>
...
<definition id="a1">A</definition>
<definition id="a2">B</definition>
<definition id="a3">C</definition>
<fmt-definition>
<ol>
<li><semx element="definition" target="a1">A</semx></li>
<li><semx element="definition" target="a2">B</semx></li>
<li><semx element="definition" target="a3">C</semx></li>
</ol>
</fmt-definition>
</term>

Renderers will need to know to ignore definition, and process the contents of semx, just as Semantic XML extraction will need to know to ignore fmt-definition.
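
To make the two readings concrete, here is a minimal Python sketch using only the standard library. It is not isodoc code: the function names `semantic_view` and `unresolved_semx` are hypothetical, and the rule "drop every element whose name starts with fmt-" is an assumption drawn from the example above rather than a confirmed specification.

```python
import xml.etree.ElementTree as ET

# The Presentation XML example from this ticket (the elided "..." is omitted).
PRESENTATION = """
<term>
  <definition id="a1">A</definition>
  <definition id="a2">B</definition>
  <definition id="a3">C</definition>
  <fmt-definition>
    <ol>
      <li><semx element="definition" target="a1">A</semx></li>
      <li><semx element="definition" target="a2">B</semx></li>
      <li><semx element="definition" target="a3">C</semx></li>
    </ol>
  </fmt-definition>
</term>
"""


def semantic_view(root):
    """Return a copy of the tree with every fmt-* element removed.

    Assumption: all presentation-only content lives under elements
    whose tag starts with "fmt-".
    """
    copy = ET.fromstring(ET.tostring(root))
    for parent in list(copy.iter()):
        for child in list(parent):
            if child.tag.startswith("fmt-"):
                parent.remove(child)
    return copy


def unresolved_semx(root):
    """List semx targets that do not point at an element of the named type."""
    by_id = {el.get("id"): el.tag for el in root.iter() if el.get("id")}
    return [
        semx.get("target")
        for semx in root.iter("semx")
        if by_id.get(semx.get("target")) != semx.get("element")
    ]


root = ET.fromstring(PRESENTATION)
semantic = semantic_view(root)
print([child.tag for child in semantic])  # only the definition elements remain
print(unresolved_semx(root))              # empty when every semx resolves
```

A renderer would do the opposite: skip the bare definition elements and walk fmt-definition, using each semx only as a pointer back to its source.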

Semantic XML:

<table id="A">
<name>Rice yields per capita</name>

Current Presentation XML:

<table id="A">
<name>Table 3.1:&#xa0;Rice yields per capita</name>

Future Presentation XML:

<table id="A">
<name id="A2">Rice yields per capita</name>
<autonum id="A0">3.1</autonum>
<label id="A1">Table <semx element="autonum" target="A0">3.1</semx></label>
<fmt-name>
<semx element="label" target="A1">Table <semx element="autonum" target="A0">3.1</semx></semx>
<span class="autonum-delimiter">:&#xa0;</span>
<semx element="name" target="A2">Rice yields per capita</semx></fmt-name>
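
As a rough check that nothing is lost, the following Python sketch (standard library only) reads both values back out of the future markup. It assumes the fragment is closed with </table> and that name carries id="A2" so the last semx target resolves; `rendered_caption` and `semantic_name` are hypothetical helper names, not isodoc functions.

```python
import xml.etree.ElementTree as ET

# Future Presentation XML from this ticket, closed off so that it parses.
FUTURE = (
    '<table id="A">'
    '<name id="A2">Rice yields per capita</name>'
    '<autonum id="A0">3.1</autonum>'
    '<label id="A1">Table <semx element="autonum" target="A0">3.1</semx></label>'
    '<fmt-name>'
    '<semx element="label" target="A1">Table '
    '<semx element="autonum" target="A0">3.1</semx></semx>'
    '<span class="autonum-delimiter">:&#xa0;</span>'
    '<semx element="name" target="A2">Rice yields per capita</semx>'
    '</fmt-name>'
    '</table>'
)


def rendered_caption(table):
    """What a renderer would display: the flattened text of fmt-name."""
    return "".join(table.find("fmt-name").itertext())


def semantic_name(table):
    """What Semantic XML extraction would recover: the untouched name."""
    return table.find("name").text


table = ET.fromstring(FUTURE)
print(repr(rendered_caption(table)))  # same string as the current <name>
print(repr(semantic_name(table)))     # the original Semantic XML caption
```

The flattened fmt-name reproduces the current Presentation XML caption, while the original name stays available without having to parse the number and delimiter back out of it.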

I need sign-off from @ronaldtse before proceeding with this: the asset captions alone will force rewriting a large number of test cases. @Intelligent2013 @strogonoff Please provide feedback also.

ronaldtse commented 2 weeks ago

@opoudjis I like this solution.

  1. I think the <semx...> element does not need to be nested, because each <semx...> element is only rendered from a semantic XML element, i.e. it does not need nesting.
  2. The <semx target="foo"> is generated by the foo element, so naming it target= is a bit strange. Maybe source="foo" works better?