Handling bilingual documents with weaving (sequential vs parallel)

ronaldtse commented 3 years ago

The JCGM_200_2012 document uses a different layout from the bilingual SI Brochure, which is sequential at the document level.

The SI Brochure is composed of the English and French documents placed one after another (document-level weaving).

JCGM 200 is composed of multiple weaving methods, where two documents are stitched together at different points, sequential in some but parallel in others.

There are different cases that need encoding in this layout.

Sequential

A section of English text corresponds to a section of French text; show the English first and the French next.

English	French

Parallel

Side by side

English text corresponds to French text, placed side by side

Example 1	Example 2

Shared element between both languages

A document element applies to both texts, e.g. one table or image shared by both languages

Example 1	Example 2

@intelligent2013 could you spell out the requirements for "weaving/stitching" the bilingual documents?

Originally posted by @ronaldtse in https://github.com/metanorma/metanorma-bipm/issues/88#issuecomment-753738504

Intelligent2013 commented 3 years ago

Some additional cases for further decision making:

left and right blocks align on the start of a claim, not on paragraphs, notes or examples, so there is a vertical misalignment between blocks (NOTE 3 in this case) at the end of the claim:

It means that 'en' claim[1] can be displayed next to 'fr' claim[1], ... 'en' claim[n] next to 'fr' claim[n].

there are cells which contain only one value (m, kg, ...) and cells whose values are equal in both languages (kelvin, mole)

so we need to compare cell values when merging cells from different languages.

there are images with text in each language:

and a common image (without text) for both languages

A figure with a title is displayed as an independent image:

then for the other language:

need to check how to do two-column footnotes in Apache FOP:

Intelligent2013 commented 3 years ago

left and right blocks align on
- image,
- table,
- bibitem

Intelligent2013 commented 3 years ago

So there are 3 cases to display images:

display a separate image for each language, each image in its own column
display a common image for both languages
display a separate image for each language: the image for the first language, then below it the image for the second language.

Intelligent2013 commented 3 years ago

@ronaldtse The requirements for "weaving/stitching" the bilingual documents:

table/image shared by both languages examples:

should be marked with the attribute common="true" (in both documents) like this:

<table id="table1" common="true">...

and the table in the adoc of the first document should contain the text for both languages (the logic for merging table cells via XSLT is too complicated)

table/image which should be displayed one after another

examples: ....

should be marked with the attribute span="true" (in both documents) like this:

<figure id="figure2" span="true">...

Some other comments:

at this moment I can't figure out how to do two-column footnotes like this:

Currently, it is shown like this, with one footnote below another:

here is a draft JCGM PDF example (generated from manually encoded XML, built from the en and fr ISO Rice XMLs in a metanorma-collection structure), just to demonstrate the current result: document.col.presentation.pdf

Intelligent2013 commented 3 years ago

@ronaldtse An alternative to common="true" and span="true" could be:

class="common"
class="span"

ronaldtse commented 3 years ago

left and right blocks align on start of claim, not paragraph, note, example

The easy way is to align on every (significant) document element. I wonder if this should be configurable (i.e. which elements should align across languages).

so there is a vertical misalignment between block (NOTE 3 in this case) at the end of claim:

This should be an anomaly as the original document was done manually.

there are cells, which contain only one value (m, kg, ....) and equal values (kelvin, mole)

I consider this table a shared table that is encoded as bilingual and inseparable into two separate per-language tables.

there are images with text on each language:

Let's consider these part of the language-specific image.

So there are 3 cases to display images:

display a separate image for each language, each image in its own column

display a common image for both languages

display a separate image for each language: the image for the first language, then below it the image for the second language.

Correct.

Currently, it is shown like this, with one footnote below another:

This is acceptable for now.

Intelligent2013 commented 3 years ago

The easy way is to align on every (significant) document element. I wonder if this should be configurable (i.e. which elements should align across languages).

@ronaldtse would you like to configure it:

via XSLT, by specifying a list of element kinds that should align across languages, for example across_elements='clause note', i.e. all clauses and notes will be aligned across languages, or
via .adoc markup, by specifying some mark/class on each concrete element that should be aligned across languages?

ronaldtse commented 3 years ago

Both:

ADOC: specifying a list of element kinds that should align across languages, for example across_elements='clause note', i.e. all clauses and notes will be aligned across languages

This can be a configurable attribute in Adoc.

and

ADOC: specifying some mark/class for each concrete element that should be aligned across languages

We can have the 2 language pairs "match" some sort of anchor or ID, then we know they are parallel (e.g. [en=my_id] will match [fr=my_id]).
A shared element should have both the English and French IDs marked (e.g. [en=my_id,fr=my_id]).

Should work, right?

opoudjis commented 3 years ago

We can have the 2 language pairs "match" some sort of anchor or ID, then we know they are parallel (e.g. [en=my_id] will match [fr=my_id]).

A shared element should have both the English and French IDs marked (e.g. [en=my_id,fr=my_id]) Should work right?

Apart from the minor detail that you've just made this markup up, Asciidoc will not do anything useful with it, and in any case the collections processing will destroy it because they will insert the document suffix after the id, precisely in order to guarantee identifier uniqueness in the aggregated document.

You cannot just insert [en=my_id,fr=my_id] and have it automatically work. Whatever ends up being put in will be extra work and novel markup. It is likely to be a novel, inter-document bookmark, which will not have an id but a name attribute, so that it can be exempted from the global and entirely correct requirement that id attributes must be globally unique within a collection.
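To illustrate the point about suffixed ids, here is a minimal Python sketch. The suffix format (`_en`, `_fr`) and the helper names `suffix_ids` and `attribute_values` are assumptions for illustration only; the thread does not specify how the collection processing actually builds the suffix. It shows that once ids are suffixed they can no longer pair elements across documents, while a separate name attribute still can.

```python
import xml.etree.ElementTree as ET


def suffix_ids(root, suffix):
    """Append a per-document suffix to every id (hypothetical suffix format)."""
    for el in root.iter():
        if "id" in el.attrib:
            el.set("id", f"{el.get('id')}_{suffix}")
    return root


def attribute_values(root, attr):
    """Collect all values of one attribute in a document."""
    return {el.get(attr) for el in root.iter() if el.get(attr) is not None}


# Both source documents use the same id and the same name for the figure.
source = '<doc><figure id="fig1" name="figtest"/></doc>'
en = suffix_ids(ET.fromstring(source), "en")
fr = suffix_ids(ET.fromstring(source), "fr")

# After suffixing, ids are unique across the collection and no longer match.
ids_shared = attribute_values(en, "id") & attribute_values(fr, "id")
# The name attribute is untouched, so it can still link the two elements.
names_shared = attribute_values(en, "name") & attribute_values(fr, "name")
```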

Intelligent2013 commented 3 years ago

I agree that id should be unique in the collection xml.

Some thoughts on how XSLT will process the XML in this manner (a preliminary solution; I have to do some experiments):

1st document is lead
2nd document is slave
XSLT processes each element that should be aligned from the 1st document, and processes the matching element from the 2nd document.
if the 1st document has an additional element (absent from the 2nd), then it is shown as is, and the right column is empty.
if the 2nd document has an additional element (absent from the 1st), then it is shown as is, and the left column is empty.
To match elements between the two languages we can have a few methods:
- match by element number (examples: the 1st clause from the en doc matches the 1st clause from the fr doc; the 1st note in the 2nd clause from the en doc matches the 1st note in the 2nd clause from the fr doc)
- if the documents are mismatched in the elements that should be aligned, then the user has to add some name/bookmark attribute in adoc. Example: one document has an additional clause or note. In that case the next element common to both documents should be marked with an additional unique name/bookmark attribute.
- if both documents have matching notes, but we want to align only some of them rather than all notes in the document, then we have to add an attribute to the element: it may be a unique name attribute, or just an attribute such as cross-align (for example).
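The "match by element number" method can be sketched in Python as follows. This is not the XSLT under discussion, only an illustration under assumptions: the function names `keyed` and `pair` are hypothetical, and elements are keyed by their position among same-named siblings along the path from the root, so that "1st note in 2nd clause" in one document pairs with the same position in the other. An unmatched element gets `None` for the empty column.

```python
import xml.etree.ElementTree as ET


def keyed(root, tags):
    """Map a positional path such as (('clause', 2), ('note', 1)) to its element."""
    found = {}

    def walk(el, path):
        counts = {}
        for child in el:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            child_path = path + ((child.tag, counts[child.tag]),)
            if child.tag in tags:
                found[child_path] = child
            walk(child, child_path)

    walk(root, ())
    return found


def pair(lead, follower, tags):
    """Return (key, left, right) rows; None marks an empty column."""
    left, right = keyed(lead, tags), keyed(follower, tags)
    rows = [(key, el, right.get(key)) for key, el in left.items()]
    # Simplification: follower-only elements are appended at the end,
    # not interleaved in document order.
    rows += [(key, None, el) for key, el in right.items() if key not in left]
    return rows


en = ET.fromstring(
    "<doc><clause><note>a</note></clause>"
    "<clause><note>b</note><note>c</note></clause></doc>"
)
fr = ET.fromstring(
    "<doc><clause><note>A</note></clause>"
    "<clause><note>B</note></clause></doc>"
)

rows = pair(en, fr, {"clause", "note"})
note_rows = [
    (
        left_el.text if left_el is not None else None,
        right_el.text if right_el is not None else None,
    )
    for key, left_el, right_el in rows
    if key[-1][0] == "note"
]
```

Positional keys break as soon as one document has an extra clause or note before the element of interest, which is the case the name/bookmark attribute is meant to cover.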

Intelligent2013 commented 3 years ago

JCGM XSLT updated to produce a bilingual document with these rules/properties:

table/figure attributes common='true' and span='true' - see above
to align a concrete kind of element (note, term, p) across languages - specify it in the property (XSLT variable) align-cross-elements
to align a concrete element (same place in both documents' hierarchy) across languages - specify the attribute cross-align="true" on the element in both documents
to align a concrete element (possibly at a different place in the document hierarchy) across languages - specify the attribute name="unique name within the document (not across documents)" on the elements in both documents

note: attribute names can be changed, or changed to a class, etc. This can be changed in the prototype XSLT.

There are a few restrictions:

alignment for a list item works only for the first paragraph in the list item
alignment for tables works only for the whole table (alignment on the table's title)
if the second document has some additional element that is set to be aligned across languages (i.e. the element's name is in the align-cross-elements property, or the element is marked with the attribute @cross-align), then it can't be shown. Finding such an 'unknown/non-linked' element is very complicated logic for XSLT, and I can't figure out how to do it.

So, if this solution is acceptable, then these properties should be added in adoc:

in bibliography - property align-cross-elements (XML block element names delimited by spaces)
in document body:
- properties common='true' and span='true' for table and figure
- property cross-align="true"
- property name (unique name in document)

opoudjis commented 3 years ago

Wait. These are updates to the information model of Metanorma, and I need to understand them before I can approve them, and make sure they are clear within the context of Metanorma as well. So:

I still do not understand what align-cross-elements is doing, and why it belongs in the bibliography. Is this a specification that, say, the elements "p, note, term" shall always be aligned in bilingual text? If it is, it does not belong in the semantic XML, and I'm not sure that it even belongs in Presentation XML; if I do inject it there, it will be in //bibdata/ext, and it will be as separate tags, not space-delimited. So presumably, a repeating //bibdata/ext/bilingual-align-element tag.
I'm not going to use @common and @span, which are much too open-ended in interpretation. They are mutually exclusive anyway, so instead of @common, @span, @cross-align, I suggest using @multilingual-rendering = common (or shared), @multilingual-rendering = full-width, @multilingual-rendering = cross-align respectively.
I'm not enthusiastic about @name, but I can't come up with anything better. But @name is distinct from @multilingual-rendering = name; the latter means "align with any element in the other document which has the same name attribute".

If you're ok with these, I'll realise them in https://github.com/metanorma/metanorma-standoc/issues/420
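As an illustration of the proposed consolidation, the following Python sketch folds the three boolean attributes into a single @multilingual-rendering value and rejects elements that set more than one of them, since they are mutually exclusive. The function name `migrate` and the `OLD_TO_NEW` table are hypothetical; the attribute names and values are the ones proposed in this comment, before any later renaming.

```python
import xml.etree.ElementTree as ET

# Old boolean attribute -> proposed @multilingual-rendering value.
OLD_TO_NEW = {
    "common": "common",
    "span": "full-width",
    "cross-align": "cross-align",
}


def migrate(el):
    """Replace common/span/cross-align="true" with one multilingual-rendering value."""
    flags = [attr for attr in OLD_TO_NEW if el.get(attr) == "true"]
    if len(flags) > 1:
        raise ValueError(f"mutually exclusive attributes set together: {flags}")
    for attr in OLD_TO_NEW:
        el.attrib.pop(attr, None)
    if flags:
        el.set("multilingual-rendering", OLD_TO_NEW[flags[0]])
    return el


table = migrate(ET.fromstring('<table id="table1" common="true"/>'))
figure = migrate(ET.fromstring('<figure id="figure2" span="true"/>'))
```

A single attribute makes the exclusivity structural: an element cannot be both shared and full-width once there is only one slot to fill.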

opoudjis commented 3 years ago

https://tex.stackexchange.com/questions/308260/parallel-text-translation-including-the-same-double-parallel-heading-numbering as an FYI to myself...

Intelligent2013 commented 3 years ago

I still do not understand what align-cross-elements is doing,

Example 1: align-cross-elements="clause", i.e. clauses (the beginning of each clause) will always be aligned in bilingual text.

Example 2: align-cross-elements="clause li", i.e. clauses and list items (the beginning of each clause and list item) will always be aligned in bilingual text.

Example 3: align-cross-elements="clause li p", i.e. clauses, list items and paragraphs (the beginning of each clause, list item and paragraph) will always be aligned in bilingual text.

Note that in align-cross-elements we set the names of XML elements (<clause>, <p>, <li>), not the @name attribute.
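A small Python sketch of how such a space-delimited value could select elements, assuming it is compared against XML element names only. The function name `aligned_elements` is hypothetical, and the real selection happens in XSLT.

```python
import xml.etree.ElementTree as ET


def aligned_elements(root, align_cross_elements):
    """Return, in document order, the elements whose tag is listed in the property."""
    tags = set(align_cross_elements.split())
    return [el for el in root.iter() if el.tag in tags]


doc = ET.fromstring(
    '<doc><clause name="p"><p>x</p><ul><li><p>y</p></li></ul></clause></doc>'
)

only_clauses = [el.tag for el in aligned_elements(doc, "clause")]
clauses_and_items = [el.tag for el in aligned_elements(doc, "clause li")]
all_three = [el.tag for el in aligned_elements(doc, "clause li p")]
# The clause carries name="p", but only element names are compared,
# so asking for "p" does not select it.
only_paragraphs = [el.tag for el in aligned_elements(doc, "p")]
```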

and why it belongs in the bibliography.

Actually align-cross-elements can be in any place; maybe it would be better in metanorma-collection/align-cross-elements or metanorma-collection/align-cross-elements/manifest.

Is this a specification that, say, the elements "p, note, term" shall always be aligned in bilingual text? If it is, it does not belong in the semantic XML, and I'm not sure that it even belongs in Presentation XML; if I do inject it there, it will be in //bibdata/ext, and it will be as separate tags, not space-delimited.

No problem, the element's structure does not matter.

So presumably, a repeating //bibdata/ext/bilingual-align-element tag.

Maybe. Is there a use case for this tag in real documents?

I'm not going to use @common and @span, which are much too open-ended in interpretation. They are mutually exclusive anyway, so instead of @common, @span, @cross-align, I suggest using @multilingual-rendering = common (or shared), @multilingual-rendering = full-width, @multilingual-rendering = cross-align respectively.

Agree.

But @name is distinct from @multilingual-rendering = name; the latter means "align with any element in the other document which has the same name attribute".

Sorry, I don't understand it. In my proposal @name means, for example:

in the first document we set

<figure id="figureC-2" name="figtest">
      <name>Figure C.2 — Stages of gelatinization</name>

in the second document we set

<figure id="figureA-1" name="figtest">
          <name>Figure A.1test — Diviseur d’échantillon de type «Bon diviseur» (@common=true)</name>

in the resulting PDF:

I.e. the element with @name="figtest" from the 2nd document should be aligned next to the element with @name="figtest" from the 1st document.

But what does @multilingual-rendering = name mean in your proposal? I don't see a difference.

ronaldtse commented 3 years ago

Also a note that we will need validation on compilation for the semantic XML, i.e. so that the user won't miss alignments (e.g., "the element foo [en] does not have a corresponding element in [fr]").
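A minimal Python sketch of such a check, assuming alignment is declared through a shared name attribute as in the @name proposal above. The function name `unmatched` is hypothetical and the message wording is copied from the example in this comment; this is not an existing Metanorma validation.

```python
import xml.etree.ElementTree as ET


def unmatched(doc, other, lang, other_lang):
    """List warnings for named elements in doc that have no counterpart in other."""
    other_names = {el.get("name") for el in other.iter() if el.get("name")}
    return [
        f"the element {el.get('name')} [{lang}] does not have "
        f"a corresponding element in [{other_lang}]"
        for el in doc.iter()
        if el.get("name") and el.get("name") not in other_names
    ]


en = ET.fromstring('<doc><figure name="figtest"/><note name="foo"/></doc>')
fr = ET.fromstring('<doc><figure name="figtest"/></doc>')

# Check both directions, since either document may hold the extra element.
warnings = unmatched(en, fr, "en", "fr") + unmatched(fr, en, "fr", "en")
```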

opoudjis commented 3 years ago

@Intelligent2013 I'm proposing:

<figure id="figureC-2" name="figtest" multilingual-rendering = "name">
          <name>Figure C.2 — Stages of gelatinization</name>

<figure id="figureA-1" name="figtest" multilingual-rendering = "name">
              <name>Figure A.1test — Diviseur d’échantillon de type «Bon diviseur» (@common=true)</name>

It seems laboured, perhaps, but I don't want to presuppose what @name is used for in documents, and @multilingual-rendering makes it explicit that there is alignment, and is consistent with the other instances of @multilingual-rendering.

Intelligent2013 commented 3 years ago

It seems laboured, perhaps, but I don't want to presuppose what @name is used for in documents, and @multilingual-rendering makes it explicit that there is alignment, and is consistent with the other instances of @multilingual-rendering.

I see. Agree.

Regarding these use cases (https://www.ctan.org/pkg/paracol):

https://tex.stackexchange.com/questions/308260/parallel-text-translation-including-the-same-double-parallel-heading-numbering as an FYI to myself...

maybe it would be better to rename some properties:

align-cross-elements -> parallel-elements, or use bilingual-align-element if it exists already.
@multilingual-rendering = full-width -> @multilingual-rendering = double-column
@multilingual-rendering = cross-align -> @multilingual-rendering = parallel

for a clearer understanding of the purpose of these elements.

opoudjis commented 3 years ago

align-cross-elements -> //bibdata/ext/parallel-align-element
@multilingual-rendering = full-width -> @multilingual-rendering = all-columns
@multilingual-rendering = cross-align -> @multilingual-rendering = parallel
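To make the first line concrete, here is a hedged Python sketch of the structure it implies: one repeating parallel-align-element tag per element name under //bibdata/ext, instead of a single space-delimited value. The function name `parallel_align_ext` is hypothetical, and the rest of the real bibdata content is omitted.

```python
import xml.etree.ElementTree as ET


def parallel_align_ext(element_names):
    """Build bibdata/ext with one parallel-align-element per listed element name."""
    bibdata = ET.Element("bibdata")
    ext = ET.SubElement(bibdata, "ext")
    for name in element_names.split():
        ET.SubElement(ext, "parallel-align-element").text = name
    return bibdata


bibdata = parallel_align_ext("clause li p")
listed = [el.text for el in bibdata.findall("ext/parallel-align-element")]
serialized = ET.tostring(bibdata, encoding="unicode")
```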

Intelligent2013 commented 3 years ago

@manuel489 and @anermina could you encode https://www.bipm.org/utils/common/documents/jcgm/JCGM_200_2012.pdf into asciidoc? Thank you!

ronaldtse commented 3 years ago

We're moving JCGM documents into https://github.com/metanorma/mn-samples-jcgm. Thanks!

ronaldtse commented 3 years ago

@manuel489 @anermina we will wait for JCGM to provide further information on JCGM 200. We don't want to convert that document manually; it's long. Thanks!
