metanorma / metanorma-bipm

Metanorma for BIPM documents
BSD 2-Clause "Simplified" License

Create BIPM flavor, support document type "brochure" #1

Closed ronaldtse closed 4 years ago

ronaldtse commented 4 years ago

Based on the SI Brochure layout: https://www.bipm.org/utils/common/pdf/si-brochure/SI-Brochure-9-EN.pdf

ronaldtse commented 4 years ago

Modeling completed at https://github.com/metanorma/metanorma-model-bipm

opoudjis commented 4 years ago

Introducing configurable abbreviations of organisations in standoc

opoudjis commented 4 years ago

Introduce configurable validation of committee values

opoudjis commented 4 years ago

For internationalisation, we need to allow the configuration to be a hash mapping languages to files
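
A minimal sketch of what a "hash of languages to files" lookup could look like, written in Python purely for illustration (the project itself is Ruby). The function name `i18n_file`, the fallback-to-default behaviour, and the file names are assumptions, not the actual standoc configuration API.

```python
def i18n_file(config, lang, default_lang="en"):
    """Return the i18n file for `lang`.

    `config` is either a single file name (the old, single-language form)
    or a dict mapping language codes to file names (the proposed form).
    Falling back to `default_lang` is an assumption made for this sketch.
    """
    if isinstance(config, str):
        return config
    return config.get(lang, config.get(default_lang))


# Hypothetical file names, for illustration only.
files = {"en": "i18n-en.yaml", "fr": "i18n-fr.yaml"}
french = i18n_file(files, "fr")
fallback = i18n_file(files, "de")
legacy = i18n_file("i18n.yaml", "fr")
```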

opoudjis commented 4 years ago

Genuine internationalisation includes internationalisation of the display values of bibdata, but the original English values need to be maintained for logic. We will need to allow translation of bibdata in Presentation XML, and extract both original and display values for metadata.

opoudjis commented 4 years ago

The i18n values (display values) for bibdata are in the local_bibdata element, which duplicates bibdata, but internationalises docstage, docsubstage, and doctype. @Intelligent2013 This is an update to Presentation XML.
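
A rough sketch of the idea, in Python for illustration only: copy bibdata into a sibling local_bibdata and translate just the display fields, leaving the original values untouched for logic. The element paths (`status/stage`, `status/substage`, `ext/doctype`) and the translation table are assumptions for this sketch, not the confirmed Presentation XML schema.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical display values; real ones would come from the i18n files.
I18N = {"fr": {"in-force": "en vigueur", "brochure": "brochure"}}

# Assumed locations of docstage, docsubstage and doctype inside bibdata.
LOCALISED_PATHS = ("status/stage", "status/substage", "ext/doctype")


def add_local_bibdata(root, lang):
    """Insert a translated copy of <bibdata> right after the original."""
    bibdata = root.find("bibdata")
    local = copy.deepcopy(bibdata)
    local.tag = "local_bibdata"
    table = I18N.get(lang, {})
    for path in LOCALISED_PATHS:
        el = local.find(path)
        if el is not None and el.text:
            # Untranslated values are kept as they are.
            el.text = table.get(el.text, el.text)
    root.insert(list(root).index(bibdata) + 1, local)
    return local


doc = ET.fromstring(
    "<doc><bibdata><status><stage>in-force</stage></status>"
    "<ext><doctype>brochure</doctype></ext></bibdata><sections/></doc>"
)
local = add_local_bibdata(doc, "fr")
```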

opoudjis commented 4 years ago

The boilerplate also needs to be localised

opoudjis commented 4 years ago

Markup of the SI brochure needs to address the fact that included documents aren't being recognised as appendixes.

Intelligent2013 commented 4 years ago

Here I'll list the differences between the original PDF and the source Metanorma XML data.

  1. On the 3rd page of the PDF, the titles carry an (SI) suffix: [image]

but in the XML the titles have no (SI):

<title language="en" format="text/plain" type="main">The International System of Units</title>
<title language="fr" format="text/plain" type="main">Le Système international d’unités</title>

On other pages of the PDF there is no (SI) either; see the cover page and the 8th page: [image]

  2. On the 5th page of the PDF: [image]

In XML:

<contributor>
<role type="author"/>
<organization>
<name>Bureau International de Poids et Mesures</name>
<abbreviation>BIPM</abbreviation>
</organization>
</contributor>
<contributor>
<role type="publisher"/>
<organization>
<name>Bureau International de Poids et Mesures</name>
<abbreviation>BIPM</abbreviation>
</organization>
</contributor>

The PDF and the XML differ in capitalisation and in 'des' vs. 'de'.

  3. On the 5th page of the PDF, the version number v1.07 is specified: [image] It is not present in the XML.

  4. In the PDF, there is an indent between the table number and the table name: [image]

If we need to reproduce it in the resulting PDF, then we need to separate them somehow in the XML: add <tab/>, or split them into different tags/attributes. Current source XML:

<name>Tableau 1 — Les sept constantes définissant le SI et les sept unités qu’elles définissent</name>

  5. The XML has no data for: [image]

(6th page in PDF)

opoudjis commented 4 years ago

(1) Added as /bibdata/title[@type = 'cover']

(2) Fixed

(3) Added to source as /bibdata/version/draft

(4) Fixed

(5) Added to /boilerplate/license-statement, but genericised to "This document" (the "SI Brochure" title is not in fact the title of the document, and I see little justification for perpetuating the idiosyncrasy.)

opoudjis commented 4 years ago

Appendixes are numbered with arabic numerals, not letters.

Intelligent2013 commented 4 years ago

@opoudjis thank you.

  6. In the source PDF, there is a section before the Contents pages: [image]

and a section after the Contents pages:

[image]

but in the source XML these sections are placed together in /preface/abstract. Here is an example: the last p of the section before the Contents and the first p of the section after the Contents:

...
<p id="_8d11cbfb-90ac-499d-b38d-e9462181ac0f">Depuis 1965 la revue internationale <em>Metrologia</em>, éditée sous les auspices du Comité
international des poids et mesures, publie des articles sur la métrologie scientifique,
l’amélioration des méthodes de mesure, les travaux sur les étalons et sur les unités,
ainsi que des rapports concernant les activités, les décisions et les recommandations des
organes de la Convention du Mètre.</p>
<p id="_33d9c195-327b-4de1-96d2-ade72469a110">Depuis son établissement en 1960 par une résolution adoptée par la Conférence générale
des poids et mesures (CGPM) à sa 11<sup>e</sup> réunion, le Système international d’unités (SI) est
utilisé dans le monde entier comme le système préféré d’unités et comme le langage
fondamental de la science, de la technologie, de l’industrie et du commerce.</p>
...
Intelligent2013 commented 4 years ago
  7. In the source PDF, list numbering sometimes does not start from 1:

[image]

but the XML has no such metadata:

<ol id="_3286792b-2ebf-4a60-a9a6-87ae3f960de8" type="arabic">
<li>
<p id="_787bf896-9427-4a4c-abd3-658446f35be4">Les unités photométriques peuvent être définies comme suit:</p>
<dl id="_f275db51-1135-4e51-aac3-7bb4f253be1c">
<dt><strong><em>Bougie nouvelle</em></strong> (unité d’intensité lumineuse).</dt>
<dd>
<p id="_f99e6a8b-8747-4232-846b-0529748b3e86">La grandeur de la bougie nouvelle est telle
que la brillance du radiateur intégral à la température de solidification du platine soit de
60 bougies nouvelles par centimètre carré.</p>
</dd>
<dt><strong><em>Lumen nouveau</em></strong> (unité de flux lumineux).</dt>
<dd>
<p id="_0fd9e1d0-86d7-44e9-ab86-8ff4178fb784">Le lumen nouveau est le flux lumineux émis dans
l’angle solide unité (stéradian) par une source ponctuelle uniforme ayant une intensité
lumineuse de 1 bougie nouvelle.</p>
</dd>
</dl>
</li>
<li>
<p id="_69d002e9-366e-43b6-a14c-ba604baa8513">. . .</p>
</li>
</ol>

therefore in the resulting PDF we have: [image]

I see two possible solutions:

  1. Put the full list item text in a simple p, like this:

    <p>4. Les unités photométriques peuvent être définies comme suit :</p>

    or

  2. Add a start attribute to the ol, like this:

    <ol id="_3286792b-2ebf-4a60-a9a6-87ae3f960de8" type="arabic" start="4">
    <li>
    <p id="_787bf896-9427-4a4c-abd3-658446f35be4">Les unités photométriques peuvent être définies comme suit:</p>
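
To make option 2 concrete, here is a small Python sketch of how a renderer might derive item labels from a start attribute. The helper name `list_labels` and the label format are hypothetical; this only illustrates the proposal and is not how the XSLT works.

```python
import xml.etree.ElementTree as ET


def list_labels(ol):
    """Return the visible labels for the direct <li> children of an <ol>.

    Numbering begins at the (proposed) start attribute, or at 1 if absent.
    """
    start = int(ol.get("start", "1"))
    return [f"{start + i}." for i in range(len(ol.findall("li")))]


with_start = ET.fromstring(
    '<ol type="arabic" start="4"><li><p>a</p></li><li><p>b</p></li></ol>'
)
without_start = ET.fromstring('<ol type="arabic"><li><p>a</p></li></ol>')
labels = list_labels(with_start)
```
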
opoudjis commented 4 years ago

(6) Fixed markup, differentiated preface as prefatory clause.

opoudjis commented 4 years ago

(7) This will be a separate ticket: https://github.com/metanorma/metanorma-standoc/issues/349

opoudjis commented 4 years ago

Not proceeding with (7), changed markup.

opoudjis commented 4 years ago

metadata_extensions config in YAML needs to permit nested elements:

:comment-period-from:
:comment-period-to:
:comment-period-type:
:reply-to:
:security:

<comment-period type="">
  <from></from>
  <from></from>
  <from></from>
  <to></to>
  <reply-to></reply-to>
</comment-period>
<security></security>

metadata_extensions:
  comment-period:
    comment-period-type:
      _output: type
      _attribute: true
    comment-period-from:
      _output: from
      _list: true
    comment-period-to:
      _output: to
    reply-to:
  security:

Container elements may not have hash attributes (attribute = true or list = true or different output names).

Lists are assumed to be in CSV (i.e. quotes override commas)

This is a change from the existing format, where extensions are lists.
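
The rules above can be sketched as follows, in Python for illustration (the real implementation would be Ruby). The config dict mirrors the YAML, with `None` standing for an empty YAML value; the function names and the choice to skip empty containers are assumptions of this sketch.

```python
import csv
import xml.etree.ElementTree as ET

CONFIG = {
    "comment-period": {
        "comment-period-type": {"_output": "type", "_attribute": True},
        "comment-period-from": {"_output": "from", "_list": True},
        "comment-period-to": {"_output": "to"},
        "reply-to": None,
    },
    "security": None,
}


def is_leaf(spec):
    # A leaf is an empty value or a hash holding only _-prefixed options.
    return spec is None or any(k.startswith("_") for k in spec)


def split_csv(value):
    # Lists are CSV, so quotes override commas.
    return [v.strip() for v in next(csv.reader([value], skipinitialspace=True))]


def build(parent, config, attrs):
    """Add elements described by `config` to `parent` from document attrs."""
    for key, spec in config.items():
        if is_leaf(spec):
            spec = spec or {}
            value = attrs.get(key)
            if value is None:
                continue
            name = spec.get("_output", key)
            if spec.get("_attribute"):
                parent.set(name, value)
            else:
                values = split_csv(value) if spec.get("_list") else [value]
                for v in values:
                    ET.SubElement(parent, name).text = v
        else:
            container = ET.SubElement(parent, key)
            build(container, spec, attrs)
            if len(container) == 0 and not container.attrib:
                parent.remove(container)  # assumption: omit empty containers
    return parent


attrs = {
    "comment-period-type": "public",
    "comment-period-from": '2020-01-01, "2020-02-01, provisional"',
    "comment-period-to": "2020-03-01",
    "security": "open",
}
ext = build(ET.Element("ext"), CONFIG, attrs)
```

The attribute values in `attrs` are made up; the second `from` entry is quoted to show a comma surviving inside a list item.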

opoudjis commented 4 years ago

Am needing to deal with this in isodoc/metadata, so will output /bibdata/ext/ to a Hash: https://stackoverflow.com/a/10144623
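
For readers who want the shape of that conversion without following the link, a minimal Python analogue is sketched below (the linked answer and isodoc are Ruby). The `@` prefix for attributes, the `#text` key, and always wrapping children in lists are conventions chosen for this sketch, not necessarily what isodoc/metadata does.

```python
import xml.etree.ElementTree as ET


def element_to_dict(el):
    """Convert an element to nested dicts; leaf elements become strings."""
    children = list(el)
    if not children and not el.attrib:
        return (el.text or "").strip()
    out = {f"@{k}": v for k, v in el.attrib.items()}
    for child in children:
        # Children are always collected in lists so repeats are preserved.
        out.setdefault(child.tag, []).append(element_to_dict(child))
    text = (el.text or "").strip()
    if text:
        out["#text"] = text
    return out


ext = ET.fromstring(
    '<ext><comment-period type="public"><from>a</from><from>b</from>'
    "<to>c</to></comment-period><security>open</security></ext>"
)
ext_hash = element_to_dict(ext)
```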

Intelligent2013 commented 4 years ago
  8. Annex I in the source PDF contains a Table des matières de l’annexe 1. I can't figure out how to display it in the PDF. It looks like some additional meta-information should be added to the XML. [image]

8.1. The items are grouped by category.

8.2. The title contains only part of the title from the document body. From the table of contents: [image] From the document body: [image]

8.3. The source XML contains 2nd- and 3rd-level section numbers. Should I ignore them and show a 'quad' character instead of the 3rd-level number? For example, the original PDF: [image]

current resulting PDF: [image]

  9. Some text in the page margins of the Annexes looks like footnotes, i.e. there is a mark such as '*' in the text body and the note also starts with that mark; see * and **: [image]

But in the source XML they are encoded as notes:

<ol id="_ef59d0ec-5c54-4114-9f96-d0c10c17b50c" type="arabic">
<li>
<p id="_42ac29b0-1e59-4747-acb6-821578b2d679">Le kilogramme est l’unité de masse; il est égal à la masse du prototype international du
kilogramme;</p>
<note id="_eb31e25d-43c8-4d2d-be6a-aa6edef8b32e"><name>NOTE  1</name>
<p id="_56f20fe6-057f-4b2c-8222-d5fa162299d6">Définition abrogée en 2018 par la CGPM
à sa 26<sup>e</sup> réunion (Résolution 1, <em>voir</em> p.92).</p>
</note>
</li>
<li>
<p id="_2b2a73c0-8cdd-448c-a5f0-2afa54914a6d">Le terme poids désigne une grandeur de la même nature qu’une force; le poids d’un corps
est le produit de la masse de ce corps par l’accélération de la pesanteur;
en particulier, le poids normal d’un corps est le produit de la masse de ce corps par
l’accélération normale de la pesanteur;</p>
</li>
<li>
<p id="_32f0948f-9ccf-4165-93c1-809a114a6e98">Le nombre adopté dans le Service international des Poids et Mesures pour la valeur de
l’accélération normale de la pesanteur est <stem type="MathML"><math xmlns="http://www.w3.org/1998/Math/MathML"><mn>980</mn><mi>,</mi><mn>665</mn></math></stem> <stem type="MathML"><math xmlns="http://www.w3.org/1998/Math/MathML"><msup><mrow><mtext>cm/s</mtext></mrow><mrow><mn>2</mn></mrow></msup></math></stem>, nombre sanctionné déjà par
quelques législations.</p>
<note id="_79d8da3a-e3e1-47e3-8ef7-723ca172159f"><name>NOTE  2</name>
<p id="_cd6025e3-e374-4a14-ba84-218ed5af13e3">Cette valeur de <stem type="MathML"><math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mrow><mi>g</mi></mrow><mrow><mtext>n</mtext></mrow></msub></math></stem> est la valeur conventionnelle de
référence pour le calcul de l’unité kilogramme-force
maintenant abolie.</p>
</note>
</li>
</ol>
Intelligent2013 commented 4 years ago
  10. In the source PDF there is a Remarque text: [image] which is encoded in the XML as a note:
    <li>
    <p id="_80b89e67-6dc9-425a-96aa-d27928740749">mettre à jour la fréquence de la transition suivante dans la liste des fréquences étalons
    recommandées et l’approuver comme représentation secondaire de la seconde:</p>
    <ul id="_85f8ccf8-fb70-47f5-bee9-7a6e40b32324">
    <li>
    <p id="_1ede75c4-875f-409f-9119-3013b050d10c">la transition quantique hyperfine non perturbée de l’état fondamental de l’atome de
    <sup>87</sup>Rb, à la fréquence de <stem type="MathML"><math xmlns="http://www.w3.org/1998/Math/MathML"><mn>6</mn><mtext> </mtext><mn>834</mn><mtext> </mtext><mn>682</mn><mtext> </mtext><mn>610.904</mn><mtext> </mtext><mn>312</mn><mtext> Hz</mtext></math></stem> avec une incertitude-type
    relative estimée de <stem type="MathML"><math xmlns="http://www.w3.org/1998/Math/MathML"><mn>1</mn><mi>,</mi><mn>3</mn><mo>×</mo><msup><mrow><mn>10</mn></mrow><mrow><mrow><mo>−</mo><mn>15</mn></mrow></mrow></msup></math></stem>.</p>
    <note id="_852fee6c-91d1-4f0a-be10-a2263cb7c2e1"><name>NOTE  2</name>
    <p id="_193943b0-74b4-47a1-ac59-cb7d5b36999f">La valeur de l’incertitude-type est supposée correspondre à un niveau de confiance
    de 68 %. Toutefois, étant donné le nombre très limité de résultats disponibles, il se peut que,
    rétrospectivement, cela ne s’avère pas exact.</p>
    </note>
    </li>
    </ul>
    </li>
    </ul>

In the BIPM XSLT, the note tag is used for side notes (at the right edge of the page). The 'Remarque' text should be encoded as something else, or we need to define rules for when a note is placed in the text and when at the page edge.

opoudjis commented 4 years ago

(8.1, 8.2) The ToC is clearly not an automatically generated ToC. IMO we should leave it as text, but the page number references will need to be replaced with cross-references in the source markup.

(8.3) The section numbers must be retained for HTML, because page numbers in crossreferences for the HTML are meaningless, and the HTML should cross-reference something. (It could do so without an overt anchor text, but that would involve too much fiddling with the source markup to be reasonable.) For the PDF, therefore, the brochure (and at this stage only the brochure) should indeed ignore subsection numbers in Annexes.

(9, 10) There won't be any rules. In my opinion we are going to have to do a mix of:

ronaldtse commented 4 years ago

How are these "block notes" encoded then?

image

As normal notes too?

ronaldtse commented 4 years ago

I have sought clarification from BIPM for 8.1/8.2 and 9/10.

ronaldtse commented 4 years ago

From BIPM:

(8.1, 8.2) The ToC is clearly not an automatically generated ToC. IMO we should leave it as text, but the page number references will need to be replaced with cross-references in the source markup.

Let's encode the ToC as normal text.

(9, 10) There won't be any rules. In my opinion we are going to have to do a mix of: How are these "block notes" encoded then?

These notes are "table notes" they apply to the table immediately above, and therefore are not side notes.

Ping @manuel489 to fix the source, and @opoudjis @Intelligent2013 .

Intelligent2013 commented 4 years ago

8.3. Fixed in xslt: [image]

Intelligent2013 commented 4 years ago

These notes are "table notes" they apply to the table immediately above, and therefore are not side notes.

In the source PDF there is a case where side notes relate to a table:

[image]

We should define exactly what 'table notes' means. In my opinion, in terms of XML it means:

...
</tr>
</tbody>
<note .... </note>
</table>

I put all <note>s as side notes (except in the preface section).

When should we put notes as side notes, and when as 'table notes'?

ronaldtse commented 4 years ago

I don’t remember exactly but there is a differentiation between a note and a table note. Perhaps @opoudjis can answer better.

manuelfuenmayor commented 4 years ago

These notes are "table notes" they apply to the table immediately above, and therefore are not side notes.

@ronaldtse , I believe the source already has the correct markup about the table notes, which is:

...

| bar | stem:["bar"] | stilb | stem:[sf "sb"]
| hour | stem:["h"] | |
|===

NOTE: The symbols whose unit names are preceded by dots are those which had already been adopted by a decision of the CIPM.

NOTE: The symbol for the stere, the unit of volume for firewood, shall be "st" and not "s", which had been previously assigned to it by the CIPM.

NOTE: To indicate a temperature interval or difference, rather than a temperature, the word "degree" in full, or the abbreviation "deg", must be used.

Maybe, the necessary changes that need to be done are in the yaml files.

ronaldtse commented 4 years ago

@manuel489 thanks. @opoudjis are the table notes typically encoded as normal notes?

opoudjis commented 4 years ago

Yes

opoudjis commented 4 years ago

Table notes are notes within a <table>, so they should be rendered in XML as @Intelligent2013 expects. I will need to debug why they aren't being treated that way.
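
As an illustration of that definition, the check "is this note a direct child of a table?" can be sketched in Python as below. The function name and the "table"/"other" labels are made up for this sketch; the BIPM XSLT would express the same test as an XPath on the parent element.

```python
import xml.etree.ElementTree as ET


def classify_notes(root):
    """Map each note id to "table" if its parent is <table>, else "other"."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    return {
        note.get("id"): "table" if parent[note].tag == "table" else "other"
        for note in root.iter("note")
    }


doc = ET.fromstring(
    '<doc><table id="t1"><tbody><tr><td>x</td></tr></tbody>'
    '<note id="n1"><p>a</p></note></table>'
    '<note id="n2"><p>b</p></note></doc>'
)
kinds = classify_notes(doc)
```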

opoudjis commented 4 years ago

@ronaldtse:

This block note is not a table note nor a side note.

opoudjis commented 4 years ago

This discussion thread is quickly becoming unmanageable. I am moving discussion of notes (9, 10) to a new ticket: https://github.com/metanorma/metanorma-bipm/issues/15

opoudjis commented 4 years ago

I have also posted https://github.com/metanorma/metanorma-bipm/issues/16 separately.

opoudjis commented 4 years ago

(11) The treatment of cross references in the existing HTML and PDF is clearly manually generated and inconsistent, and we should not be seeking to replicate it:

https://www.bipm.org/en/CGPM/db/17/2/

HTML: See Recommendation 1 (CI-2002) of the CIPM on the revision of the practical realization of the definition of the metre.

PDF: See Recommendation 1 (CI-2002) of the CIPM on the revision of the practical realization of the definition of the metre, p. 181.

https://www.bipm.org/en/CIPM/db/1984/1/

HTML: The CIPM, in 2002, decided to change the explanation of the quantity dose equivalent in the SI Brochure (Recommendation 2).

PDF: * The CIPM, in 2002, decided to change the explanation of the quantity dose equivalent in the SI Brochure (Recommendation 2, see p. 182).

In the first instance, the page reference is appended to the end of the paragraph; in the second, it is inserted with a "see" within the cross-reference. The first instance cannot be generated sensibly from a single Asciidoctor cross-reference, and we should not seek to: markup should be adjusted so as to give sensible results in both PDF and HTML. The needed template for that is to put "see" before any cross-references, allow page numbers to trail after the cross-reference, and have uniform text for all cross-references.
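
The proposed template can be sketched as a single rendering function, shown here in Python for illustration. The function name, its parameters, and the exact punctuation are assumptions; the point is only that one cross-reference yields uniform text in both outputs, with the page number trailing in the PDF.

```python
def render_xref(label, page=None, fmt="html"):
    """Render a cross-reference as "see <label>", plus a page in PDF."""
    text = f"see {label}"
    # Page numbers are meaningless in HTML, so only the PDF gets one.
    if fmt == "pdf" and page is not None:
        text += f", p. {page}"
    return text


html_text = render_xref("Recommendation 1 (CI-2002)", page=181, fmt="html")
pdf_text = render_xref("Recommendation 1 (CI-2002)", page=181, fmt="pdf")
```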

opoudjis commented 4 years ago

(11) is now a distinct ticket: #17