Open ronaldtse opened 9 months ago
I think this might have to do with the encoding at relaton-bipm (@andrew2net) and also relaton-render (@opoudjis).
We just ran into this issue with twitter-cldr-rb in https://github.com/metanorma/metanorma-iso/issues/1098: it internationalises the ordinal of 2 as "2e", and I had to insert the superscripts in post-processing. (I also refused to convert 2e to the more old-fashioned 2ème, because going back in time to use old-fashioned ordinals is not a useful activity for us to be indulging in.) Right now, I have only done this in metanorma-iso, not relaton-render.
This is a general issue for French ordinals, so @andrew2net, if you are using twitter-cldr-rb to generate French ordinals, remember to convert them with ret.sub(/(\d+)(\p{L}+)/, "\\1<sup>\\2</sup>"). (If you are not using twitter-cldr-rb to generate French ordinals, you really should.)
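A minimal sketch of that post-processing step in plain Ruby, assuming the ordinal string has already been generated (for example by twitter-cldr-rb). The helper name `superscript_french_ordinals` and the suffix list are illustrative assumptions, not part of any of the gems mentioned here. The pattern is deliberately narrower than the one-liner quoted above, so that English forms such as "43rd" are left untouched if the helper is applied to mixed input.

```ruby
# Hypothetical helper: wraps the suffix of a French ordinal in <sup>.
# The suffix list (er, re, ème, e) is an assumption covering common forms only.
FRENCH_ORDINAL = /(\d+)(er|re|ème|e)(?!\p{L})/

def superscript_french_ordinals(text)
  # gsub (rather than sub) converts every ordinal in the string.
  text.gsub(FRENCH_ORDINAL, '\1<sup>\2</sup>')
end

puts superscript_french_ordinals("43e réunion du CIPM")
puts superscript_french_ordinals("43rd meeting of the CIPM")
```

The negative lookahead `(?!\p{L})` stops the match when more letters follow the suffix, which is what keeps "43rd" and similar strings unchanged.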
Just to add what is currently being fetched:
<fetched>2024-02-22</fetched>
<title format="text/plain" language="en" script="Latn">43rd meeting of the CIPM</title>
<title format="text/plain" language="fr" script="Latn">43e réunion du CIPM</title>
<uri type="citation" language="en" script="Latn">https://www.bipm.org/en/committees/ci/cipm/43-1950</uri>
<uri type="citation" language="fr" script="Latn">https://www.bipm.org/fr/committees/ci/cipm/43-1950</uri>
<uri type="pdf">https://www.bipm.org/documents/20126/71755187/CIPM1950.pdf/e2ae0cc0-7e98-0653-dfe2-4b8964ea73f1</uri>
<uri type="src" language="en" script="Latn">https://raw.githubusercontent.com/metanorma/bipm-data-outcomes/main/cipm/meetings-en/meeting-43.yml</uri>
<uri type="src" language="fr" script="Latn">https://raw.githubusercontent.com/metanorma/bipm-data-outcomes/main/cipm/meetings-fr/meeting-43.yml</uri>
<docidentifier type="BIPM" primary="true" language="en" script="Latn">CIPM 43rd Meeting (1950)</docidentifier>
<docidentifier type="BIPM" primary="true" language="fr" script="Latn">CIPM 43e Réunion (1950)</docidentifier>
<docidentifier type="BIPM" primary="true">CIPM 43rd Meeting (1950) / CIPM 43e Réunion (1950)</docidentifier>
<docnumber>CIPM 43rd Meeting (1950)</docnumber>
<date type="published">
<on>1950-06-13</on>
</date>
<contributor>
<role type="publisher"/>
<organization>
<name language="en" script="Latn">International Bureau of Weights and Measures</name>
<name language="fr" script="Latn">Bureau international des poids et mesures</name>
<abbreviation>BIPM</abbreviation>
<uri>www.bipm.org</uri>
</organization>
</contributor>
<contributor>
<role type="author"/>
<organization>
<name language="en" script="Latn">International Committee for Weights and Measures</name>
<name language="fr" script="Latn">Comité International des Poids et Mesures</name>
<abbreviation>CIPM</abbreviation>
</organization>
</contributor>
<language>en</language>
<language>fr</language>
<script>Latn</script>
<place>
<city>Paris</city>
</place>
<ext schema-version="v1.0.0">
<doctype>Meeting</doctype>
<structuredidentifier>
<docnumber>43</docnumber>
</structuredidentifier>
</ext>
</bibdata>
@ronaldtse do we need to adjust only French IDs? Should English IDs still have numbers without superscript ordinals and capitalized types?
Here the “e” should be superscript and the “r” of réunion lower-case (both times): CIPM 43e réunion (1950)… 43e réunion du CIPM and ‘Comité international des poids et mesures’ (capital letter for Comité only)
@ronaldtse @anermina the second time is title. It comes from the bipm-data-outcomes dataset. Shouldn't we fix the titles in the dataset?
@ronaldtse do we need to adjust only French IDs? Should English IDs still have numbers without superscript ordinals and capitalized types?
No:
Superscript ordinals are mandatory practice in French. They may once have been common in English, and Microsoft Word still applies them in autocorrect, but they are no longer mainstream.
Look at the 2008 edition of the Brochure. The front cover has a superscript ordinal for the French title, 8th edition. But the English text includes:
So they are not doing it for English in their internally produced, older Brochure. We shouldn't either.
I don't have an answer on the capitalised types, but I'd say don't bother unless they ask us to. They don't refer in the 8th edition to "1st meeting of the CGPM", just "1st CGPM", so I can't tell. Even worse:
They're not even translating the term; they're just saying 12th "General Conference on Weights and Measures" with the title in French. So they're not rendering Réunion as "Meeting" in English, and I can't tell. So we have no guidance. In French, though, they're clearly title case. The thing is that English is increasingly abandoning Title Case; style guides are starting to recommend against it. So if you don't capitalise, it's not as big a deal in English as it was 50 years ago. I don't have an answer; I'd just say leave it alone in English.
@ronaldtse Should we lower-case the "r" in special cases like "Résolution de la CGPM (1889)"?
@ronaldtse @opoudjis is this ok?
<bibdata type="proceedings" schema-version="v1.2.8">
<title format="text/plain" language="en" script="Latn">43rd meeting of the CIPM</title>
<title format="text/html" language="fr" script="Latn">43<sup>e</sup> réunion du CIPM</title>
<uri type="citation" language="en" script="Latn">https://www.bipm.org/en/committees/ci/cipm/43-1950</uri>
<uri type="citation" language="fr" script="Latn">https://www.bipm.org/fr/committees/ci/cipm/43-1950</uri>
<uri type="pdf">https://www.bipm.org/documents/20126/71755187/CIPM1950.pdf/e2ae0cc0-7e98-0653-dfe2-4b8964ea73f1</uri>
<uri type="src" language="en" script="Latn">https://raw.githubusercontent.com/metanorma/bipm-data-outcomes/main/cipm/meetings-en/meeting-43.yml</uri>
<uri type="src" language="fr" script="Latn">https://raw.githubusercontent.com/metanorma/bipm-data-outcomes/main/cipm/meetings-fr/meeting-43.yml</uri>
<docidentifier type="BIPM" primary="true" language="en" script="Latn">CIPM 43rd Meeting (1950)</docidentifier>
<docidentifier type="BIPM" primary="true" language="fr" script="Latn">CIPM 43<sup>e</sup> réunion (1950)</docidentifier>
<docidentifier type="BIPM" primary="true">CIPM 43rd Meeting (1950) / CIPM 43<sup>e</sup> réunion (1950)</docidentifier>
<docnumber>CIPM 43rd Meeting (1950)</docnumber>
<date type="published">
<on>1950-06-13</on>
</date>
<contributor>
<role type="publisher"/>
<organization>
<name language="en" script="Latn">International Bureau of Weights and Measures</name>
<name language="fr" script="Latn">Bureau international des poids et mesures</name>
<abbreviation language="en,fr" script="Latn">BIPM</abbreviation>
<uri>www.bipm.org</uri>
</organization>
</contributor>
<contributor>
<role type="author"/>
<organization>
<name language="en" script="Latn">International Committee for Weights and Measures</name>
<name language="fr" script="Latn">Comité International des Poids et Mesures</name>
<abbreviation language="en,fr" script="Latn">CIPM</abbreviation>
</organization>
</contributor>
<language>en</language>
<language>fr</language>
<script>Latn</script>
<place>
<city>Paris</city>
</place>
<ext schema-version="v1.0.0">
<doctype>Meeting</doctype>
<structuredidentifier>
<docnumber>43</docnumber>
</structuredidentifier>
</ext>
</bibdata>
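The convention shown in the proposed record above (superscript suffix and lower-case "réunion" in French, plain suffix and unchanged capitalisation in English) can be sketched as follows. This is an illustration only: `meeting_id`, `english_ordinal` and `french_ordinal_html` are hypothetical names, not relaton-bipm functions, and the French helper assumes a feminine noun such as "réunion" (so 1 becomes "1re").

```ruby
# Hypothetical helpers; none of these names come from relaton-bipm.

# English ordinal as plain text, e.g. 43 -> "43rd", 12 -> "12th".
def english_ordinal(number)
  suffix =
    if (11..13).cover?(number % 100)
      "th"
    else
      { 1 => "st", 2 => "nd", 3 => "rd" }.fetch(number % 10, "th")
    end
  "#{number}#{suffix}"
end

# French ordinal with a superscript suffix, assuming a feminine noun
# such as "réunion": 1 takes "re", every other number takes "e".
def french_ordinal_html(number)
  suffix = number == 1 ? "re" : "e"
  "#{number}<sup>#{suffix}</sup>"
end

# Builds the language-specific identifier string.
def meeting_id(body, number, year, lang)
  case lang
  when "fr" then "#{body} #{french_ordinal_html(number)} réunion (#{year})"
  else "#{body} #{english_ordinal(number)} Meeting (#{year})"
  end
end

puts meeting_id("CIPM", 43, 1950, "en")
puts meeting_id("CIPM", 43, 1950, "fr")
```

Keeping the ordinal logic per language means the English identifier never receives markup, which matches the advice in this thread to leave English alone.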
So we are actually stepping into uncharted (undefined) territory, because the "title" element's content model is currently undefined for rich text.
Are we going to do that now in the Relaton data model, to define the text model for textual content?
Ping @opoudjis .
Officially, we're agnostic, and allow text models to be made explicit in places like titles, which allow xs:any:
FormattedString =
  # attribute format { ( "plain" | "html" | "docbook" | "tei" | "asciidoc" | "markdown" ) }?,
  attribute format { ( "text/plain" | "text/html" | "application/docbook+xml" |
    "application/tei+xml" | "text/x-asciidoc" | "text/markdown" | "application/x-metanorma+xml" | text ) }?,
  LocalizedStringOrXsAny
LocalizedStringOrXsAny1 =
  # multiple languages and scripts possible: comma delimit them if so
  attribute language { text }?,
  attribute locale { text }?,
  attribute script { text }?,
  ( text | AnyElement )+
LocalizedStringOrXsAny =
  LocalizedStringOrXsAny1 |
  element variant { LocalizedStringOrXsAny1 }+
That's what's in the grammar, and what we were thinking 5 years ago.
De facto, we do have a text model for textual content already, and we've been using it. Unsurprisingly, it's Metanorma itself, or rather, the core of it in Basicdoc. So with IETF abstracts, we use <p>, not IETF's native <t>; we replace LaTeX formatting in BibTeX-derived titles with Basicdoc. Basicdoc of course is pretty much HTML at the inline markup level, so it's a safe default.
Suggest we make this official, and make Relaton text like titles be either plain text or Basicdoc XML.
@ronaldtse @opoudjis and, as you can see, we have to use rich format in IDs now
<docidentifier type="BIPM" primary="true" language="fr" script="Latn">CIPM 43<sup>e</sup> réunion (1950)</docidentifier>
@andrew2net yes, exactly, and I want to know that the <sup> element is part of the XML schema for BasicDoc XML.
@ronaldtse I think we need to update the XML schema to allow markup in IDs.
Ah. Right, missed that. Ugh, yeah. IDs currently are strictly text.
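If identifiers have to stay strictly textual in some contexts while a marked-up form is used for display, one possible approach (an assumption, not something agreed in this thread) is to derive the plain form by stripping the inline tags. `plain_text_id` is a hypothetical name, and the sketch only handles bare <sup> and <sub> tags without attributes.

```ruby
# Hypothetical helper: removes <sup>/<sub> tags, keeping their text content.
def plain_text_id(rich_id)
  rich_id.gsub(%r{</?su[pb]>}, "")
end

puts plain_text_id("CIPM 43<sup>e</sup> réunion (1950)")
```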
From Michael Stock:
The form of the references for the meetings needs a minor adjustment in the French bibliography (e.g. p111)
Here the “e” should be superscript and the “r” of réunion lower-case (both times): CIPM 43e réunion (1950)… 43e réunion du CIPM and ‘Comité international des poids et mesures’ (capital letter for Comité only)