metanorma / bipm-si-brochure

SI Brochure edition 9, semantic encoded version (WARNING: DRAFT)

BIPM requested fixes 4: form of the references for the meetings needs a minor adjustment in the French bibliography #224

Open ronaldtse opened 9 months ago

ronaldtse commented 9 months ago

From Michael Stock:

The form of the references for the meetings needs a minor adjustment in the French bibliography (e.g. p111)

[Image: image001]

Here the “e” should be superscript and the “r” of réunion lower-case (both times): CIPM 43e réunion (1950)… 43e réunion du CIPM and ‘Comité international des poids et mesures’ (capital letter for Comité only)

ronaldtse commented 9 months ago

I think this might have to do with the encoding at relaton-bipm (@andrew2net) and also relaton-render (@opoudjis).

opoudjis commented 9 months ago

We just ran into this issue with twitter-cldr-rb in https://github.com/metanorma/metanorma-iso/issues/1098: it internationalises the ordinal of 2 as "2e", and I had to insert the superscripts in post-processing. (I also refused to convert 2e to the more old-fashioned 2ème, because going back in time to use old-fashioned ordinals is not a useful activity for us to be indulging in.) Right now, I have only done this in metanorma-iso, not relaton-render.

This is a general issue for French ordinals, so @andrew2net, if you are using twitter-cldr-rb to generate French ordinals, remember to convert them with ret.sub(/(\d+)(\p{L}+)/, "\\1<sup>\\2</sup>"). (If you are not using twitter-cldr-rb to generate French ordinals, you really should.)
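
The post-processing step above can be sketched as a small helper. This is only an illustration: superscript_ordinal is a hypothetical name, not part of twitter-cldr-rb or relaton-render, and the language guard is an assumption of the sketch, added because the same pattern would also match English ordinals such as "43rd".

```ruby
# Hypothetical helper (not an existing twitter-cldr-rb or relaton-render API).
# Wraps the letter suffix of the first ordinal in <sup>, for French text only.
def superscript_ordinal(text, lang)
  # Assumption: only French ordinals are superscripted.
  return text unless lang == "fr"

  # Same substitution as quoted above: digits followed by letters.
  text.sub(/(\d+)(\p{L}+)/, "\\1<sup>\\2</sup>")
end

puts superscript_ordinal("43e réunion du CIPM", "fr")
# => 43<sup>e</sup> réunion du CIPM
puts superscript_ordinal("43rd meeting of the CIPM", "en")
# => 43rd meeting of the CIPM
```

Note that sub only rewrites the first match, which is enough for a single ordinal per string; a string with several ordinals would need gsub.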

anermina commented 9 months ago

Just to add what is currently being fetched:

  <fetched>2024-02-22</fetched>
  <title format="text/plain" language="en" script="Latn">43rd meeting of the CIPM</title>
  <title format="text/plain" language="fr" script="Latn">43e réunion du CIPM</title>
  <uri type="citation" language="en" script="Latn">https://www.bipm.org/en/committees/ci/cipm/43-1950</uri>
  <uri type="citation" language="fr" script="Latn">https://www.bipm.org/fr/committees/ci/cipm/43-1950</uri>
  <uri type="pdf">https://www.bipm.org/documents/20126/71755187/CIPM1950.pdf/e2ae0cc0-7e98-0653-dfe2-4b8964ea73f1</uri>
  <uri type="src" language="en" script="Latn">https://raw.githubusercontent.com/metanorma/bipm-data-outcomes/main/cipm/meetings-en/meeting-43.yml</uri>
  <uri type="src" language="fr" script="Latn">https://raw.githubusercontent.com/metanorma/bipm-data-outcomes/main/cipm/meetings-fr/meeting-43.yml</uri>
  <docidentifier type="BIPM" primary="true" language="en" script="Latn">CIPM 43rd Meeting (1950)</docidentifier>
  <docidentifier type="BIPM" primary="true" language="fr" script="Latn">CIPM 43e Réunion (1950)</docidentifier>
  <docidentifier type="BIPM" primary="true">CIPM 43rd Meeting (1950) / CIPM 43e Réunion (1950)</docidentifier>
  <docnumber>CIPM 43rd Meeting (1950)</docnumber>
  <date type="published">
    <on>1950-06-13</on>
  </date>
  <contributor>
    <role type="publisher"/>
    <organization>
      <name language="en" script="Latn">International Bureau of Weights and Measures</name>
      <name language="fr" script="Latn">Bureau international des poids et mesures</name>
      <abbreviation>BIPM</abbreviation>
      <uri>www.bipm.org</uri>
    </organization>
  </contributor>
  <contributor>
    <role type="author"/>
    <organization>
      <name language="en" script="Latn">International Committee for Weights and Measures</name>
      <name language="fr" script="Latn">Comité International des Poids et Mesures</name>
      <abbreviation>CIPM</abbreviation>
    </organization>
  </contributor>
  <language>en</language>
  <language>fr</language>
  <script>Latn</script>
  <place>
    <city>Paris</city>
  </place>
  <ext schema-version="v1.0.0">
    <doctype>Meeting</doctype>
    <structuredidentifier>
      <docnumber>43</docnumber>
    </structuredidentifier>
  </ext>
</bibdata>

andrew2net commented 8 months ago

@ronaldtse do we need to adjust only French IDs? Should English IDs still have numbers without superscript ordinals and capitalized types?

andrew2net commented 8 months ago

> [Image: image001]
>
> Here the “e” should be superscript and the “r” of réunion lower-case (both times): CIPM 43e réunion (1950)… 43e réunion du CIPM and ‘Comité international des poids et mesures’ (capital letter for Comité only)

@ronaldtse @anermina the second occurrence is the title. It comes from the bipm-data-outcomes dataset. Shouldn't we fix the titles in the dataset?

opoudjis commented 8 months ago

> @ronaldtse do we need to adjust only French IDs? Should English IDs still have numbers without superscript ordinals and capitalized types?

no:

Superscript ordinals are mandatory practice in French. Superscript ordinals may once have been common in English, and Microsoft Word still likes them in autocorrect, but they are no longer mainstream.

Look at the 2008 edition of the Brochure (the 8th edition). The front cover has a superscript ordinal in the French title. But the English text includes:

Screenshot 2024-03-06 at 15 32 02

So they are not doing it for English in their internally produced, older Brochure. We shouldn't either.

opoudjis commented 8 months ago

I don't have an answer on the capitalised types, but I'd say don't bother unless they ask us to. In the 8th edition they don't refer to "1st meeting of the CGPM", just "1st CGPM", so I can't tell. Even worse:

Screenshot 2024-03-06 at 15 37 41

They're not even translating the term: they're just saying 12th "General Conference on Weights and Measures" with the title in French, so they're not rendering Réunion as "Meeting" in English, and I can't tell. So we have no guidance. In French, though, they're clearly title case. The thing is that English is increasingly abandoning Title Case, and style guides are starting to recommend against it, so not capitalising is not as big a deal in English as it was 50 years ago. I don't have an answer; I'd just say leave it alone in English.

andrew2net commented 8 months ago

@ronaldtse Should we lower-case "r" in special cases like "Résolution de la CGPM (1889)" ?

andrew2net commented 8 months ago

@ronaldtse @opoudjis is this ok?

<bibdata type="proceedings" schema-version="v1.2.8">
  <title format="text/plain" language="en" script="Latn">43rd meeting of the CIPM</title>
  <title format="text/html" language="fr" script="Latn">43<sup>e</sup> réunion du CIPM</title>
  <uri type="citation" language="en" script="Latn">https://www.bipm.org/en/committees/ci/cipm/43-1950</uri>
  <uri type="citation" language="fr" script="Latn">https://www.bipm.org/fr/committees/ci/cipm/43-1950</uri>
  <uri type="pdf">https://www.bipm.org/documents/20126/71755187/CIPM1950.pdf/e2ae0cc0-7e98-0653-dfe2-4b8964ea73f1</uri>
  <uri type="src" language="en" script="Latn">https://raw.githubusercontent.com/metanorma/bipm-data-outcomes/main/cipm/meetings-en/meeting-43.yml</uri>
  <uri type="src" language="fr" script="Latn">https://raw.githubusercontent.com/metanorma/bipm-data-outcomes/main/cipm/meetings-fr/meeting-43.yml</uri>
  <docidentifier type="BIPM" primary="true" language="en" script="Latn">CIPM 43rd Meeting (1950)</docidentifier>
  <docidentifier type="BIPM" primary="true" language="fr" script="Latn">CIPM 43<sup>e</sup> réunion (1950)</docidentifier>
  <docidentifier type="BIPM" primary="true">CIPM 43rd Meeting (1950) / CIPM 43<sup>e</sup> réunion (1950)</docidentifier>
  <docnumber>CIPM 43rd Meeting (1950)</docnumber>
  <date type="published">
    <on>1950-06-13</on>
  </date>
  <contributor>
    <role type="publisher"/>
    <organization>
      <name language="en" script="Latn">International Bureau of Weights and Measures</name>
      <name language="fr" script="Latn">Bureau international des poids et mesures</name>
      <abbreviation language="en,fr" script="Latn">BIPM</abbreviation>
      <uri>www.bipm.org</uri>
    </organization>
  </contributor>
  <contributor>
    <role type="author"/>
    <organization>
      <name language="en" script="Latn">International Committee for Weights and Measures</name>
      <name language="fr" script="Latn">Comité International des Poids et Mesures</name>
      <abbreviation language="en,fr" script="Latn">CIPM</abbreviation>
    </organization>
  </contributor>
  <language>en</language>
  <language>fr</language>
  <script>Latn</script>
  <place>
    <city>Paris</city>
  </place>
  <ext schema-version="v1.0.0">
    <doctype>Meeting</doctype>
    <structuredidentifier>
      <docnumber>43</docnumber>
    </structuredidentifier>
  </ext>
</bibdata>
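
The difference between the previously fetched French identifier and the one proposed here can be sketched as follows. The method names are hypothetical and do not exist in relaton-bipm; the sketch assumes the meeting type always appears as " Réunion " between the ordinal and the year, as in the record above.

```ruby
# Hypothetical sketch, not relaton-bipm code.
# Turns the fetched French ID into the proposed form.
def french_docid(id)
  # 43e -> 43<sup>e</sup>
  with_sup = id.sub(/(\d+)(\p{L}+)/, "\\1<sup>\\2</sup>")
  # Lower-case the meeting type (assumes the literal " Réunion ").
  with_sup.sub(" Réunion ", " réunion ")
end

# Builds the language-neutral ID as "English / French".
def combined_docid(en_id, fr_id)
  [en_id, french_docid(fr_id)].join(" / ")
end

puts french_docid("CIPM 43e Réunion (1950)")
# => CIPM 43<sup>e</sup> réunion (1950)
puts combined_docid("CIPM 43rd Meeting (1950)", "CIPM 43e Réunion (1950)")
# => CIPM 43rd Meeting (1950) / CIPM 43<sup>e</sup> réunion (1950)
```

The English identifier is passed through untouched, so its ordinal and capitalisation stay as fetched.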

ronaldtse commented 8 months ago

So we are actually stepping into uncharted (undefined) territory, because the "title" element's content model is currently undefined for rich text.

Are we going to do that now in the Relaton data model, to define the text model for textual content?

Ping @opoudjis .

opoudjis commented 8 months ago

Officially, we're agnostic, and allow text models to be made explicit in places like titles, which allow xs:any

FormattedString =
  # attribute format { ( "plain" | "html" | "docbook" | "tei" | "asciidoc" | "markdown" ) }?,
  attribute format { ( "text/plain" | "text/html" | "application/docbook+xml" |
    "application/tei+xml" | "text/x-asciidoc" | "text/markdown" | "application/x-metanorma+xml" | text ) }?,
  LocalizedStringOrXsAny

LocalizedStringOrXsAny1 =
  # multiple languages and scripts possible: comma delimit them if so
  attribute language { text }?,
  attribute locale { text }?,
  attribute script { text }?,
  ( text | AnyElement )+

LocalizedStringOrXsAny =
  LocalizedStringOrXsAny1 |
  element variant { LocalizedStringOrXsAny1 }+

That's what's in the grammar, and what we were thinking 5 years ago.

De facto, we do have a text model for textual content already, and we've been using it. Unsurprisingly, it's Metanorma itself, or rather, the core of it in Basicdoc. So with IETF abstracts, we use <p> not IETF's native <t>; we replace Latex formatting in Bibtex-derived titles with Basicdoc. Basicdoc of course is pretty much HTML at the inline markup level, so it's a safe default.

Suggest we make this official, and make relaton text like titles be either text, or Basicdoc XML.
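
As a rough illustration of the "either text, or Basicdoc XML" rule, a serialiser could pick the format attribute from the content. Everything here is an assumption of the sketch: title_format is a hypothetical name, the tag list is a small illustrative subset of inline elements, and using "application/x-metanorma+xml" (taken from the grammar's list) as the value for Basicdoc content is a guess, not an agreed convention.

```ruby
# Hypothetical helper, not part of Relaton.
# Illustrative subset of inline tags; the real list would come from Basicdoc.
INLINE_TAG = /<\/?(sup|sub|em|strong)\b[^>]*>/

# Returns the format attribute value for a title string.
# Assumption: marked-up content is labelled with markup_format,
# anything else is plain text.
def title_format(content, markup_format: "application/x-metanorma+xml")
  content.match?(INLINE_TAG) ? markup_format : "text/plain"
end

puts title_format("43rd meeting of the CIPM")
# => text/plain
puts title_format("43<sup>e</sup> réunion du CIPM")
# => application/x-metanorma+xml
```

Passing markup_format: "text/html" would reproduce the attribute used in the proposed record earlier in the thread.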

andrew2net commented 8 months ago

@ronaldtse @opoudjis and, as you can see, we have to use rich format in IDs now

<docidentifier type="BIPM" primary="true" language="fr" script="Latn">CIPM 43<sup>e</sup> réunion (1950)</docidentifier>

ronaldtse commented 8 months ago

@andrew2net yes, exactly, and I want to be sure that the <sup> element is part of the XML schema for BasicDoc XML.

andrew2net commented 8 months ago

@ronaldtse I think we need to update the XML schema to allow markup in IDs.

opoudjis commented 8 months ago

Ah. Right, missed that. Ugh, yeah. IDs currently are strictly text.
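
For consumers that still expect text-only IDs, one option would be to derive a plain form by stripping the inline markup. This is a sketch under that assumption only; plain_text_id is a hypothetical name and nothing in the thread says this is how the schema change would be handled.

```ruby
# Hypothetical helper: strips inline tags to recover a text-only ID.
def plain_text_id(id)
  # Removes anything that looks like an opening or closing tag.
  id.gsub(/<\/?[A-Za-z][^>]*>/, "")
end

puts plain_text_id("CIPM 43<sup>e</sup> réunion (1950)")
# => CIPM 43e réunion (1950)
```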