metanorma / metanorma-bipm

Metanorma for BIPM documents
BSD 2-Clause "Simplified" License

Handling bilingual documents with weaving (sequential vs parallel) #101

Open ronaldtse opened 3 years ago

ronaldtse commented 3 years ago

The JCGM_200_2012 document uses a different layout from the bilingual SI Brochure, which is sequential at the document level.

The SI Brochure is composed of the English and French documents placed one after another (document-level weaving).

JCGM 200 combines multiple weaving methods: the two documents are stitched together at different points, sequentially in some places and in parallel in others.

There are different cases that need encoding in this layout.

Sequential

A section of English text corresponds to a section of French text; show the English first, then the French.

[Screenshots: English | French]

Parallel

Side by side

English text corresponds to French text, placed side by side

[Screenshots: Example 1 | Example 2]

Shared element between both languages

A document element applies to both texts, e.g. one table/image shared by both languages.

[Screenshots: Example 1 | Example 2]

@intelligent2013 could you spell out the requirements for "weaving/stitching" the bilingual documents?

Originally posted by @ronaldtse in https://github.com/metanorma/metanorma-bipm/issues/88#issuecomment-753738504

Intelligent2013 commented 3 years ago

Some additional cases for further decision making:

It means that 'en' claim[1] can be displayed near 'fr' claim[1], ... 'en' claim[n] near 'fr' claim[n].

so cell values need to be checked when merging cells from different languages.

[image]

then for another language:

[image]

Intelligent2013 commented 3 years ago

So there are 3 cases to display images:

  1. display a separate image for each language, each in its own column
  2. display one common image for both languages
  3. display a separate image for each language: the image for the first language, then the image for the second language below it.
Intelligent2013 commented 3 years ago

@ronaldtse The requirements for "weaving/stitching" the bilingual documents:

  1. A table/image shared by both languages. Examples: [image] [image]

should be marked with the attribute common="true" (in both documents) like this:

<table id="table1" common="true">...

and the table in the AsciiDoc source of the first document should contain the text for both languages (the logic for merging table cells via XSLT is too complicated).

  2. A table/image which should be displayed one after another.

Examples: [image] .... [image]

should be marked with the attribute span="true" (in both documents) like this:

<figure id="figure2" span="true">...
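
A minimal sketch of how a processor might read these two attributes, written in Python with the standard library rather than the actual XSLT. The function name `layout_mode`, the mode labels, and the "parallel" default for unmarked elements are assumptions made for illustration; only `common="true"` and `span="true"` come from the proposal above.

```python
import xml.etree.ElementTree as ET

def layout_mode(element):
    """Classify one table/figure by the proposed attributes (hypothetical helper)."""
    if element.get("common") == "true":
        return "shared"       # rendered once for both languages
    if element.get("span") == "true":
        return "sequential"   # first language, then second language below it
    return "parallel"         # assumed default: one copy per language column

table = ET.fromstring('<table id="table1" common="true"/>')
figure = ET.fromstring('<figure id="figure2" span="true"/>')
plain = ET.fromstring('<figure id="figure3"/>')  # illustrative unmarked element

modes = [layout_mode(e) for e in (table, figure, plain)]
print(modes)
```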

Some other comments:

Currently, it shows like this, with one footnote below another: [image]

Intelligent2013 commented 3 years ago

@ronaldtse Alternative variant instead of common="true" and span="true" could be:

ronaldtse commented 3 years ago

left and right blocks align on start of claim, not paragraph, note, example

The easy way is to align on every (significant) document element. I wonder if this should be configurable (i.e. which elements should align across languages).

so there is a vertical misalignment between block (NOTE 3 in this case) at the end of claim:

This should be an anomaly as the original document was done manually.

there are cells, which contain only one value (m, kg, ....) and equal values (kelvin, mole)

I consider this table a shared table that is encoded as bilingual and inseparable into two separate per-language tables.

there are images with text on each language:

Let's consider these part of the language-specific image.

So there are 3 cases to display images:

  • display a separate image for each language, each in its own column
  • display one common image for both languages
  • display a separate image for each language: the image for the first language, then the image for the second language below it.

Correct.

Currently, it shows like this, with one footnote below another:

This is acceptable for now.

Intelligent2013 commented 3 years ago

The easy way is to align on every (significant) document element. I wonder if this should be configurable (i.e. which elements should align across languages).

@ronaldtse would you like to configure it:

ronaldtse commented 3 years ago

Both:

ADOC: specifying a list of the kinds of elements which should align across languages, for example across_elements='clause note', i.e. all clauses and notes will be aligned across languages

This can be a configurable attribute in Adoc.

and

ADOC: specifying some mark/class for each concrete element that should be aligned across languages

Should work right?

opoudjis commented 3 years ago
  • We can have the 2 language pairs "match" some sort of anchor or ID, then we know they are parallel (e.g. [en=my_id] will match [fr=my_id]).
  • A shared element should have both the English and French IDs marked (e.g. [en=my_id,fr=my_id]).

Should work right?

Apart from the minor detail that you've just made this markup up, Asciidoc will not do anything useful with it, and in any case the collections processing will destroy it because they will insert the document suffix after the id, precisely in order to guarantee identifier uniqueness in the aggregated document.

You cannot just insert [en=my_id,fr=my_id] and have it automatically work. Whatever ends up being put in will be extra work and novel markup. It is likely to be a novel, inter-document bookmark, which will not have an id but a name attribute, so that it can be exempted from the global and entirely correct requirement that id attributes must be globally unique within a collection.

Intelligent2013 commented 3 years ago

I agree that id should be unique in the collection xml.

Some thoughts how XSLT will process xml in this manner (preliminary solution, I have to do some experiments):

Intelligent2013 commented 3 years ago

JCGM XSLT updated to produce bilingual document with these rules/properties:

Note: the attribute names can be changed, or replaced with a class, etc. They can be changed in the prototype XSLT.

There are a few restrictions:

So, if this solution is acceptable, then these properties should be added in AsciiDoc:

opoudjis commented 3 years ago

Wait. These are updates to the information model of Metanorma, and I need to understand them before I can approve them, and make sure they are clear within the context of Metanorma as well. So:

If you're ok with these, I'll realise them in https://github.com/metanorma/metanorma-standoc/issues/420

opoudjis commented 3 years ago

https://tex.stackexchange.com/questions/308260/parallel-text-translation-including-the-same-double-parallel-heading-numbering as an FYI to myself...

Intelligent2013 commented 3 years ago
  • I still do not understand what align-cross-elements is doing,

Example 1: align-cross-elements="clause", i.e. clauses (the start of each clause) are always aligned in the bilingual text. [image]

Example 2: align-cross-elements="clause li", i.e. clauses and list items (the start of each clause and list item) are always aligned in the bilingual text. [image]

Example 3: align-cross-elements="clause li p", i.e. clauses, list items and paragraphs (the start of each clause, list item and paragraph) are always aligned in the bilingual text. [image]

Note that align-cross-elements lists the names of XML elements (<clause>, <p>, <li>), not values of the @name attribute.
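
A rough sketch of the idea in Python (standard library only), not the actual XSLT: the space-delimited setting selects which element names become alignment points, and the two languages are then paired in document order. The helper name `alignment_points`, the sample documents, and the purely positional pairing are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def alignment_points(doc, align_cross_elements):
    """Return ids, in document order, of elements whose tag is listed in the setting."""
    tags = set(align_cross_elements.split())
    return [el.get("id") for el in doc.iter() if el.tag in tags]

# Hypothetical, heavily simplified per-language documents.
EN = '<doc><clause id="c1-en"><p id="p1-en">One</p><p id="p2-en">Two</p></clause></doc>'
FR = '<doc><clause id="c1-fr"><p id="p1-fr">Un</p><p id="p2-fr">Deux</p></clause></doc>'
en, fr = ET.fromstring(EN), ET.fromstring(FR)

# Coarse alignment: only the start of each clause is a row boundary.
clause_rows = list(zip(alignment_points(en, "clause"), alignment_points(fr, "clause")))

# Finer alignment: clauses and paragraphs both start a new row.
fine_rows = list(zip(alignment_points(en, "clause p"), alignment_points(fr, "clause p")))

print(clause_rows)
print(fine_rows)
```

Positional pairing only works when both languages contain the same sequence of listed elements; a document where one language has an extra paragraph would need explicit matching instead.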

and why it belongs in the bibliography.

Actually, align-cross-elements can go anywhere; maybe it would be better placed in metanorma-collection/align-cross-elements or metanorma-collection/align-cross-elements/manifest.

Is this a specification that, say, the elements "p, note, term" shall always be aligned in bilingual text? If it is, it does not belong in the semantic XML, and I'm not sure that it even belongs in Presentation XML; if I do inject it there, it will be in //bibdata/ext, and it will be as separate tags, not space-delimited.

No problem, the element's structure does not matter.

So presumably, a repeating //bibdata/ext/bilingual-align-element tag.

Maybe. Is there a use case for this tag in real documents?

  • I'm not going to use @common and @span, which are much too open-ended in interpretation. They are mutually exclusive anyway, so instead of @common, @span, @cross-align, I suggest using @multilingual-rendering = common (or shared), @multilingual-rendering = full-width, @multilingual-rendering = cross-align respectively.

Agree.
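
As an illustration of the renaming agreed here, a small Python sketch (standard library only) that rewrites the earlier boolean attributes into a single `multilingual-rendering` value. The function name `migrate` and the `="true"` form for `cross-align` are assumptions; the value names are the ones suggested in the quoted text, using "common" rather than the alternative "shared".

```python
import xml.etree.ElementTree as ET

# Old attribute name -> new multilingual-rendering value, as suggested above.
LEGACY_TO_RENDERING = {
    "common": "common",
    "span": "full-width",
    "cross-align": "cross-align",
}

def migrate(element):
    """Replace the legacy boolean attributes with one multilingual-rendering value."""
    for legacy, value in LEGACY_TO_RENDERING.items():
        flag = element.attrib.pop(legacy, None)  # always drop the legacy attribute
        # The legacy attributes are mutually exclusive, so the first "true" wins.
        if flag == "true" and "multilingual-rendering" not in element.attrib:
            element.set("multilingual-rendering", value)
    return element

table = migrate(ET.fromstring('<table id="table1" common="true"/>'))
figure = migrate(ET.fromstring('<figure id="figure2" span="true"/>'))
print(table.attrib, figure.attrib)
```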

But @name is distinct from @multilingual-rendering = name; the latter means "align with any element in the other document which has the same name attribute".

Sorry, I don't understand it. In my proposal @name means, for example:

I.e. the element with @name="figtest" in the 2nd document should be aligned next to the element with @name="figtest" in the 1st document.

But what does @multilingual-rendering = name mean in your proposal? I don't see a difference.

ronaldtse commented 3 years ago

Also a note that we will need validation on compilation for the semantic XML, i.e. so that the user won't miss alignments (e.g., "the element foo [en] does not have a corresponding element in [fr]").
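
A sketch of what such a check could look like, in Python with the standard library. It assumes alignment is keyed on the `name` attribute discussed above; the function names and sample documents are hypothetical, and the real validation would run inside the Metanorma toolchain rather than as a standalone script.

```python
import xml.etree.ElementTree as ET

def named_elements(doc):
    """Collect the values of the name attribute used anywhere in the document."""
    return {el.get("name") for el in doc.iter() if el.get("name")}

def missing_counterparts(en_doc, fr_doc):
    """Report alignment names present in one language but not in the other."""
    en_names, fr_names = named_elements(en_doc), named_elements(fr_doc)
    messages = []
    for name in sorted(en_names - fr_names):
        messages.append(f"the element {name} [en] does not have a corresponding element in [fr]")
    for name in sorted(fr_names - en_names):
        messages.append(f"the element {name} [fr] does not have a corresponding element in [en]")
    return messages

# Hypothetical documents: the French side is missing the "units" table.
en = ET.fromstring('<doc><figure name="figtest"/><table name="units"/></doc>')
fr = ET.fromstring('<doc><figure name="figtest"/></doc>')

problems = missing_counterparts(en, fr)
for message in problems:
    print(message)
```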

opoudjis commented 3 years ago

@Intelligent2013 I'm proposing:

<figure id="figureC-2" name="figtest" multilingual-rendering = "name">
          <name>Figure C.2 — Stages of gelatinization</name>

<figure id="figureA-1" name="figtest" multilingual-rendering = "name">
              <name>Figure A.1test — Diviseur d’échantillon de type «Bon diviseur» (@common=true)</name>

It seems laboured, perhaps, but I don't want to presuppose what @name is used for in documents, and @multilingual-rendering makes it explicit that there is alignment, and is consistent with the other instances of @multilingual-rendering.
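
To make the proposal concrete, a hedged Python sketch (standard library only) that pairs elements across two documents by `name`, but only where `multilingual-rendering="name"` is set, so that unrelated uses of `name` are ignored and the `id` values remain free to differ. The helper names and the wrapping `<doc>` elements are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def by_name(doc):
    """Map name -> id for elements that opt in to name-based alignment."""
    return {
        el.get("name"): el.get("id")
        for el in doc.iter()
        if el.get("multilingual-rendering") == "name" and el.get("name")
    }

def pair_by_name(first, second):
    """Pair the ids of elements that carry the same name in both documents."""
    a, b = by_name(first), by_name(second)
    return {name: (a[name], b[name]) for name in a if name in b}

first = ET.fromstring(
    '<doc><figure id="figureC-2" name="figtest" multilingual-rendering="name"/></doc>'
)
second = ET.fromstring(
    '<doc><figure id="figureA-1" name="figtest" multilingual-rendering="name"/></doc>'
)

pairs = pair_by_name(first, second)
print(pairs)
```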

Intelligent2013 commented 3 years ago

It seems laboured, perhaps, but I don't want to presuppose what @name is used for in documents, and @multilingual-rendering makes it explicit that there is alignment, and is consistent with the other instances of @multilingual-rendering.

I see. Agree.

Regarding these use cases (https://www.ctan.org/pkg/paracol):

https://tex.stackexchange.com/questions/308260/parallel-text-translation-including-the-same-double-parallel-heading-numbering as an FYI to myself...

maybe it would be better to rename some properties:

to make the purpose of the elements clearer.

opoudjis commented 3 years ago
Intelligent2013 commented 3 years ago

@manuel489 and @anermina could you encode https://www.bipm.org/utils/common/documents/jcgm/JCGM_200_2012.pdf into asciidoc? Thank you!

ronaldtse commented 3 years ago

We're moving JCGM documents into https://github.com/metanorma/mn-samples-jcgm. Thanks!

ronaldtse commented 3 years ago

@manuel489 @anermina we will wait for JCGM to provide further information on JCGM 200. We don't want to convert that document manually, as it's long. Thanks!