metanorma / isodoc

Generate HTML/Word from Metanorma XML
https://www.metanorma.org
BSD 2-Clause "Simplified" License

Move presentation functionality for blocks to Presentation XML #606

Closed opoudjis closed 2 weeks ago

opoudjis commented 3 weeks ago

There is some residual presentation functionality still being done in isodoc/function and in PDF XSLT, that should instead be done in Presentation XML: the HTML rendering needs to be as thin as possible, given the disruptive changes foreseen for it. For instance, the caption of figure keys is being generated in isodoc/function (so in HTML, DOC, PDF); it should be present already in Presentation XML.

opoudjis commented 3 weeks ago

I am adding in Presentation XML //figure/dl/p[@keep-with-next='true'] with i18n'd content "Key"

I am moving footnotes of figures into the figure key.

In the case of BSI, I am instead moving footnotes into paragraphs at the end of the figure.

opoudjis commented 3 weeks ago

I see that I have implemented the wrong behaviour for BSI: they are just normal footnotes within the figure, as for tables. Will fix.

opoudjis commented 3 weeks ago

Now getting expected behaviour in BSI.

Please keep ticket open, there will be more.

opoudjis commented 3 weeks ago

Formulae in Presentation XML are still

<formula id="A">
<name>4</name>
<math/>
</formula>

They are rendered in HTML as

<div id="A">
<div class="formula">
<p><math/>&nbsp;(4)</p>
</div>
</div>

This does not match the usual Presentation XML pattern of <title>4.<tab/>(content)</title>. I'm OK to leave the paragraph wrapping and the following label to renderer-specific processing, because of the awkwardness of it. But the wrapping of the formula label in parentheses should be done in Presentation XML: if a flavour ever needs to override it, that should not be happening in the HTML or PDF layer, that should be set as Presentation content.
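A minimal sketch of what "wrapping the formula label in parentheses in Presentation XML" could look like, in Python with the standard-library ElementTree rather than the actual isodoc code. The function name `wrap_formula_labels` and its `opener`/`closer` parameters are hypothetical, and namespaces are ignored for brevity:

```python
import xml.etree.ElementTree as ET


def wrap_formula_labels(root, opener="(", closer=")"):
    """Wrap the text of every formula/name in delimiters, in place.

    Hypothetical helper: a flavour could pass different delimiters
    instead of overriding anything in the HTML or PDF layer.
    """
    for formula in root.iter("formula"):
        name = formula.find("name")
        if name is not None and name.text:
            name.text = f"{opener}{name.text}{closer}"
    return root


doc = ET.fromstring('<doc><formula id="A"><name>4</name><math/></formula></doc>')
wrap_formula_labels(doc)
print(ET.tostring(doc, encoding="unicode"))
```

With this, the renderer only has to place the label after the formula; it no longer decides how the label is decorated.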

opoudjis commented 3 weeks ago

The term domain is injected at the start of the term definition, as "<#{domain}>". This is being done in HTML, as legacy that I never got around to undoing: it is of course necessarily a Presentation XML concern.

There will be no need for HTML and PDF to inject the domain into the term definition and wrap it in angle brackets: that will be done in Presentation XML.

So:

Semantic XML:

<term><preferred>natural</preferred>
<domain>mathematics</domain>
<definition><p>sequence of integers</p></definition>
</term>

Currently, HTML and PDF inject "<mathematics>" at the start of the first definition paragraph. That will now be done in Presentation XML:

<term><preferred>natural</preferred>
<definition><p>&lt;<domain>mathematics</domain>&gt; sequence of integers</p></definition>
</term>

And I'm leaving the "domain" tag in, anticipating https://github.com/metanorma/isodoc/issues/610: Presentation XML will preserve Semantic XML markup. PDF and HTML will just ignore the tag and render its contents.
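The transformation above can be sketched as follows. This is an illustration in Python with the standard-library ElementTree, not the isodoc implementation. The function name `inject_domain` is hypothetical, namespaces are ignored, and it assumes the definition starts with a paragraph:

```python
import xml.etree.ElementTree as ET


def inject_domain(term):
    """Move term/domain into the first definition paragraph, wrapped in < >.

    The <domain> element itself is preserved, so that renderers can
    ignore the tag and just render its contents.
    """
    domain = term.find("domain")
    first_p = term.find("definition/p")
    if domain is None or first_p is None:
        return term
    term.remove(domain)
    # Text layout in the paragraph: "<" + <domain>...</domain> + "> " + original text
    domain.tail = "> " + (first_p.text or "")
    first_p.text = "<"
    first_p.insert(0, domain)
    return term


semantic = ET.fromstring(
    "<term><preferred>natural</preferred>"
    "<domain>mathematics</domain>"
    "<definition><p>sequence of integers</p></definition></term>"
)
print(ET.tostring(inject_domain(semantic), encoding="unicode"))
```

The serializer escapes the literal angle brackets as `&lt;` and `&gt;`, which matches the Presentation XML shown above.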

opoudjis commented 3 weeks ago

Complication: in IEC, IEV documents suppress display of domain. We will set @hidden=true on domain in that context.

opoudjis commented 3 weeks ago

This may well not have been used ever, but we generate a quotation attribution paragraph in HTML:

<quote id="_">
        <source type="inline" bibitemid="ISO7301" citeas="ISO 7301:2011"><locality type="clause"><referenceFrom>1</referenceFrom></locality>ISO&#xa0;7301:2011, Clause 1</source>
        <author>ISO</author>
        <p id="_">This International Standard gives the minimum specifications for rice (<em>Oryza sativa</em> L.) which is subject to international trade. It is applicable to the following types: husked rice and milled rice, parboiled or not, intended for direct human consumption. It is neither applicable to other products derived from rice, nor to waxy rice (glutinous rice).</p>
</quote>

to

<div class="Quote" id="_">
        <p id="_">This International Standard gives the minimum specifications for rice (<i>Oryza sativa</i> L.) which is subject to international trade. It is applicable to the following types: husked rice and milled rice, parboiled or not, intended for direct human consumption. It is neither applicable to other products derived from rice, nor to waxy rice (glutinous rice).</p>
      <p class="QuoteAttribution">&#8212; ISO, ISO&#xa0;7301:2011, Clause 1</p>
</div>

This needs to be done in Presentation XML:

<quote id="_">
        <p id="_">This International Standard gives the minimum specifications for rice (<em>Oryza sativa</em> L.) which is subject to international trade. It is applicable to the following types: husked rice and milled rice, parboiled or not, intended for direct human consumption. It is neither applicable to other products derived from rice, nor to waxy rice (glutinous rice).</p>
      <attribution><p>&#8212; <author>ISO</author>,  <eref type="inline" bibitemid="ISO7301" citeas="ISO 7301:2011"><locality type="clause"><referenceFrom>1</referenceFrom></locality>ISO&#xa0;7301:2011, Clause 1</eref></p></attribution>
      </quote>

As above, the author wrapper is ignored; the source wrapper is converted to eref—there is no good reason to maintain the distinction. The attribution wrapper may be used for styling, but is otherwise ignored.
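A rough sketch of that rearrangement, in Python with the standard-library ElementTree. It is not the isodoc code: the function name `build_attribution` is hypothetical, namespaces are ignored, and the ", " separator between author and source is taken from the example above:

```python
import xml.etree.ElementTree as ET


def build_attribution(quote):
    """Gather quote/author and quote/source into a trailing attribution/p.

    The source element is renamed to eref; the author wrapper is kept.
    """
    parts = [el for el in (quote.find("author"), quote.find("source")) if el is not None]
    if not parts:
        return quote
    attribution = ET.SubElement(quote, "attribution")
    p = ET.SubElement(attribution, "p")
    p.text = "\u2014 "  # em-dash followed by a space, as in the example
    for i, el in enumerate(parts):
        quote.remove(el)
        if el.tag == "source":
            el.tag = "eref"
        # Separate the parts with ", "; nothing after the last one
        el.tail = ", " if i < len(parts) - 1 else None
        p.append(el)
    return quote


quote = ET.fromstring(
    '<quote id="_">'
    '<source type="inline" bibitemid="ISO7301" citeas="ISO 7301:2011">ISO 7301:2011, Clause 1</source>'
    "<author>ISO</author>"
    '<p id="_">This International Standard gives the minimum specifications for rice.</p>'
    "</quote>"
)
print(ET.tostring(build_attribution(quote), encoding="unicode"))
```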

opoudjis commented 3 weeks ago

Make dl Presentation XML processing generic to Metanorma; it is defined in two flavours.

opoudjis commented 3 weeks ago

IEEE and ITU defined in HTML code the delimiter between the note label (NOTE, NOTE 1) and the note contents. That's an antipattern: this was supposed to be resolved in Presentation XML. The experiment around how to do so with OGC led to the unfortunate issue of https://github.com/metanorma/isodoc/issues/609, and will instead lead to explicit markup of delimiters in Presentation XML in https://github.com/metanorma/isodoc/issues/611

So in all of IEEE, ITU, OGC, we have currently:

Semantic XML:

<note id="note1">
        <p id="_f06fd0d1-a203-4f3d-a515-0bdba0f8d83f">First note.</p>
      </note>

Presentation XML:

<note id='note1'>
              <name>NOTE 1</name>
              <p id='_'>First note.</p>
            </note>

HTML:

 <div id='note1' class='Note'>
              <p><span class='note_label'>NOTE 1&#x2014;</span>First note.</p>
            </div>

Instead, we will have the Presentation XML be:

<note id='note1'>
              <name>NOTE 1&#x2014;</name>
              <p id='_'>First note.</p>
            </note>

Ultimately but not yet, that will be:

<note id='note1'>
              <name>NOTE <autonum>1</autonum><autonum-delim>&#x2014;</autonum-delim></name>
              <p id='_'>First note.</p>
            </note>

In the case of ITU, it is &#xa0;&#x2013;&#xa0;, but only if the note starts with a paragraph.

So PDF and HTML should not be injecting the em-dash in those formats.

Moving the label inside the paragraph for notes is NOT currently being done in Presentation XML. I may do so in the future, but this iteration is for preventing content being added in HTML and PDF, not necessarily for rearranging content.
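As an illustration of the interim form (delimiter appended to the label, no `autonum` markup yet), here is a small sketch in Python with the standard-library ElementTree. The function and constant names are hypothetical, the label is assumed to be plain text, and what ITU does when the note does not start with a paragraph is not stated above, so the sketch simply leaves the label untouched in that case:

```python
import xml.etree.ElementTree as ET

EM_DASH = "\u2014"                       # delimiter shown above for NOTE 1
ITU_DELIM = "\u00a0\u2013\u00a0"         # nbsp, en-dash, nbsp


def add_note_delimiter(note, delim, require_leading_paragraph=False):
    """Append the label delimiter to note/name, in place."""
    name = note.find("name")
    if name is None:
        return note
    body = [child for child in note if child is not name]
    if require_leading_paragraph and (not body or body[0].tag != "p"):
        # Assumption: no delimiter when the note does not open with a paragraph
        return note
    name.text = (name.text or "") + delim
    return note


note = ET.fromstring("<note id='note1'><name>NOTE 1</name><p id='_'>First note.</p></note>")
add_note_delimiter(note, EM_DASH)
print(ET.tostring(note, encoding="unicode"))
```

An ITU-style call would be `add_note_delimiter(note, ITU_DELIM, require_leading_paragraph=True)`.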

opoudjis commented 3 weeks ago

Because of the large number of changes, I'm going to stop here, and start a new ticket and PR for further Presentation XML refactoring.

opoudjis commented 3 weeks ago

Reopening: there are just 4 more changes in lib/isodoc/function.

opoudjis commented 3 weeks ago

docid is converted to a Presentation form through the method docid_l10n(). This code lives in generic isodoc, and it is invoked by both Presentation XML and xrefs (which generates cross-reference text for Presentation XML).

For legacy reasons, I've never moved this code out of the HTML module, but in fact it is never used by HTML (although you need to follow the chain of calling to work that out). Moving all this code to Presentation XML, and making the call to xrefs internal to Presentation XML constitutes a needed code cleanup.

This change does not impact output, so it will be left last of the 4.

opoudjis commented 3 weeks ago

HTML used to add missing titles instead of letting Presentation XML do so. There are two instances where it still does: Symbols and Forewords.

These will be populated if missing in Presentation XML: there is no need for PDF or HTML to supply missing title text here.
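A sketch of the idea in Python with the standard-library ElementTree, not the isodoc code. The element names `foreword` and `definitions` (for the Symbols clause) and the English default labels are assumptions made for illustration; in practice the labels would come from the i18n configuration:

```python
import xml.etree.ElementTree as ET

# Hypothetical defaults; real labels would be internationalised
DEFAULT_TITLES = {"foreword": "Foreword", "definitions": "Symbols"}


def populate_missing_titles(root, labels=DEFAULT_TITLES):
    """Insert a <title> as first child of each listed clause that lacks one."""
    for tag, label in labels.items():
        for clause in root.iter(tag):
            if clause.find("title") is None:
                title = ET.Element("title")
                title.text = label
                clause.insert(0, title)
    return root


doc = ET.fromstring("<doc><foreword><p>Text</p></foreword><definitions><dl/></definitions></doc>")
populate_missing_titles(doc)
print(ET.tostring(doc, encoding="unicode"))
```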

opoudjis commented 3 weeks ago

As with notes, the insertion of the delimiter for term notes needs to be moved to Presentation XML. So you will now be receiving:

               <termnote id="_" keep-with-next="true" keep-lines-together="true">
                <name>Note 1 to entry:</name>
                 <p id="_">The starch of waxy rice consists almost entirely of amylopectin. The kernels have a tendency to stick together after cooking.</p>
               </termnote>

And you do not need to insert the : yourself.

opoudjis commented 2 weeks ago

This issue has unearthed several internal inconsistencies in handling of delimiters in particular (including wrong use of ISO-inherited ":" after termnote labels in BSI).

opoudjis commented 2 weeks ago

For the refactoring of @xrefs and of reference processing: the base class that @xrefs takes its isodoc processing functionality from (including refs processing) needs to be Presentation XML, not HTML.

opoudjis commented 2 weeks ago

To my astonishment, the refactoring of @xrefs turned out to be straightforward.

opoudjis commented 2 weeks ago

Before and after:

isodoc/lib/isodoc/function (= HTML, DOC): 2316 > 2119 lines

isodoc/lib/isodoc/presentation: 2505 > 2594 lines

opoudjis commented 2 weeks ago

The refactoring will continue in the individual flavours, but there will be a lot less to catch there. Right now, I am going through the ramifications of the @xrefs refactor.

opoudjis commented 2 weeks ago

In IEEE the domain is printed in the preferred designation, not the definition. I am not wrapping it in <domain> there, because I will end up wrapping both it and the field of application in the designation with span instead, or whatever less intrusive element I devise.

opoudjis commented 2 weeks ago

In fact, in ISO, after the xrefs refactoring moved all docidentifier processing to Presentation XML, the standards annotation is being applied more consistently than before, including in biblio-tag (i.e. the identifiers in reference listings). For now, I am going to treat this as a feature, not a bug.

I've confirmed with ISO 6709 that those styles are intended to be applied to biblio-tag after all. So this is a bugfix.

opoudjis commented 2 weeks ago

In refactor, had moved docid_prefix to PresentationXML from HTML. Moving it to Init, in order to make it accessible throughout isodoc (and standoc): https://github.com/metanorma/isodoc/issues/606

opoudjis commented 2 weeks ago

xrefs refactoring has messed up: xrefs downstream invokes what are now both PresentationXML and HTML methods after all. Will need to rearrange inheritance, and may need metaprogramming.

opoudjis commented 2 weeks ago

xrefs refactor done. IEC, BSI now correctly rendering "(all parts)" after citations.

opoudjis commented 2 weeks ago

I am now going to check for presentation functionality in individual flavour gems, but I am not going to action it unless I know that @Intelligent2013 will have time to do so before next Monday.

Intelligent2013 commented 2 weeks ago

@opoudjis I'm going to finish the tasks in https://github.com/metanorma/isodoc/issues/607 today. Do you need any other updates?

opoudjis commented 2 weeks ago

So, breaking this down:

Leaving the Plateau ideographic text alone. Leaving BIPM and ISO indexes alone.

Need analysis on these from @Intelligent2013, of how many of these PDF is already doing independently. Posting to https://github.com/metanorma/isodoc/issues/607

opoudjis commented 2 weeks ago

Will raise to separate issue. This issue is concluded.