what is ruby (was emphasis markers)

duncdrum commented 3 years ago

see Images/26478854.jpg

Emphasis markers when interspersed with ruby are another special case. Special because of the markup headaches with frequent problems of overlap, not because this is a rare occurrence. Technically the markers belong to the rb text, but they appear as part of the ruby stream/line

The Guidelines should feature one of these complex examples to clarify how we suggest encoders deal with this:

Should emphasis markers be encoded as ruby, use css via @style/@rend, … what about texts with both emphasis markers and ruby?
Demonstrate that <gaiji> can appear both in <rb> and <rt>

see #2

Not too happy with that example, comments and suggestions welcome.

knagasaki commented 3 years ago

Import from main repo

see Images/26478854.jpg

Please write the absolute URL.

duncdrum commented 3 years ago

https://github.com/martindholmes/rubyForTEI/blob/main/Images/26478854.jpg

they're all in the images folder in this repo

knagasaki commented 3 years ago

https://github.com/martindholmes/rubyForTEI/blob/main/Images/26478854.jpg

Thank you very much!

they're all in the images folder in this repo

I hope each URL will be given as a full path.

knagasaki commented 3 years ago

Import from main repo

see Images/26478854.jpg

Emphasis markers when interspersed with ruby are another special case. Special because of the markup headaches with frequent problems of overlap, not because this is a rare occurrence. Technically the markers belong to the rb text, but they appear as part of the ruby stream/line

The Guidelines should feature one of these complex examples to clarify how we suggest encoders deal with this:

Should emphasis markers be encoded as ruby, use css via @style/@rend, … what about texts with both emphasis markers and ruby?

Ruby in HTML5 can probably serve as a layout mechanism for many kinds of small characters and marks attached to a base text. Semantically, however, ruby is something different, as we wrote in the original proposal. So emphasis marks should be encoded not with <ruby> but with <emph> or another appropriate TEI element, although they might be rendered with <ruby> in HTML5.

Demonstrate that <gaiji> can appear both in <rb> and <rt>

<gaiji> must be able to appear in both <rb> and <rt>. Should I give one or more examples of the markup?

see #2

Not too happy with that example, comments and suggestions welcome.

duncdrum commented 3 years ago

I already picked an example that includes <g> with this pull request, so I think that should be enough for the Guidelines. I have thought about <emph> as an alternative, but I don't see how that makes overlap any less likely to occur. Do you then agree that an example showing interspersed emphasis markers should be included in the Guidelines?

https://github.com/martindholmes/rubyForTEI/blob/main/ruby_samples.xml

martindholmes commented 3 years ago

Before we make a decision on what should be included, we need to answer the basic questions about how we define/identify ruby (as opposed to other annotation types), and what the intention of this current sprint is. I don't think we have any chance of getting complex and exhaustive documentation into the Guidelines for this release. I would really like to get the basic use-cases covered and released before we go on to the complexities, but I think that's for Council to decide.

knagasaki commented 3 years ago

I already picked an example that includes <g> with this pull request, so I think that should be enough for the Guidelines. I have thought about <emph> as an alternative, but I don't see how that makes overlap any less likely to occur. Do you then agree that an example showing interspersed emphasis markers should be included in the Guidelines? https://github.com/martindholmes/rubyForTEI/blob/main/ruby_samples.xml

Thank you for consideration of the example.

Aside from inclusion of the example in the Guidelines, I wonder whether the IDS should be included directly in <g>. Anyway, the character can be represented as a character in CJK Unified ideograph with IVS like U+8A55 with U+E0101. If you would like to use <g>, you would declare a <char> containing the code points and refer to it. But I think that is too complex and too long to be included in the Guidelines' explanation of ruby. Moreover, the relationship between the <rt>s and <rb>s in the example seems to be neither pronunciation nor parallel text. Then, it should be encoded by note with appropriate attributes, not by <ruby> in TEI (but may be encoded in HTML5 for rendering). If you want to see a complex example, I know other materials of Japanese classics.

martindholmes commented 3 years ago

I like the CSS approach to encoding emphasis; that aligns with our current recommendations on writing modes too.
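A minimal sketch of what the CSS approach could look like, assuming the emphasis is declared once in the header with <rendition scheme="css"> and referenced from the text via @rendition. The sample sentence, the xml:id value, and the choice of the CSS text-emphasis properties are illustrative assumptions, not taken from the sample file. The two fragments belong in different places of a TEI document and are shown together only for brevity.

```xml
<!-- Header fragment (inside <tagsDecl>): a reusable CSS rule.
     The id "emph-sesame" is a hypothetical name. -->
<rendition xml:id="emph-sesame" scheme="css">
  text-emphasis-style: filled sesame;
  text-emphasis-position: over right;
</rendition>

<!-- Text fragment: the emphasized span is marked up once;
     the individual marks are not transcribed as characters. -->
<p>これは<emph rendition="#emph-sesame">重要</emph>です。</p>
```

Encoded this way, the markers are described as a property of the emphasized span, so they do not have to be transcribed as ruby text.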

knagasaki commented 3 years ago

Before we make a decision on what should be included, we need to answer the basic questions about how we define/identify ruby (as opposed to other annotation types), and what the intention of this current sprint is. I don't think we have any chance of getting complex and exhaustive documentation into the Guidelines for this release. I would really like to get the basic use-cases covered and released before we go on to the complexities, but I think that's for Council to decide.

I think the examples described in the original proposal are enough for the basic usage found in tons of modern Japanese books, comics, movie subtitles, and other types of modern Japanese texts. While considering complex examples is fun and important for some users, I would like to polish the proposed ones so they can go into the Guidelines, if possible. I would be glad if you would list what we should do to get there.

knagasaki commented 3 years ago

Import from main repo

see Images/26478854.jpg

Emphasis markers when interspersed with ruby are another special case. Special because of the markup headaches with frequent problems of overlap, not because this is a rare occurrence. Technically the markers belong to the rb text, but they appear as part of the ruby stream/line

The Guidelines should feature one of these complex examples to clarify how we suggest encoders deal with this:

Should emphasis markers be encoded as ruby, use css via @style/@rend, … what about texts with both emphasis markers and ruby?

Demonstrate that <gaiji> can appear both in <rb> and <rt>

see #2

Not too happy with that example, comments and suggestions welcome.

I would suggest that we write specialized guidelines for the complex usages outside of the TEI P5 Guidelines, as some other TEI-related communities have done. This was the advice of some experienced TEIers. Following that advice, we are now working as volunteers on Japanese guidelines for East Asian materials: https://github.com/TEI-EAJ/jp_guidelines/wiki . As we don't have any budget for it, progress is very slow. But we would like to extend it to other languages such as Chinese, Korean, Vietnamese, English, and so on, if you and others would collaborate with us. Are you interested in that?

sydb commented 3 years ago

For the East Asian ignorant among us, could someone explain what an “emphasis marker” is?

747 commented 3 years ago

@sydb What are referred to here as "emphasis markers" are the highlighted punctuation marks in the given image.

[Image: screenshot 2021-01-27 124533]

The English Wikipedia also has a brief introduction about it: https://en.wikipedia.org/wiki/Emphasis_point

In form, it is a series of separate marks, each typically placed at the center-right side of a character; conceptually, it is a highlighting of an important span of text that merely happens to be realized as discrete marks. In this sense it functionally resembles Sperrsatz or similar practices.

Though emphasis markers are sometimes added by the author, traditionally they are more often added by the reader (for their own use) or by a reviewer before publication. There is no standardized shape of emphasis markers, so they often take different appearances in the same text, and look the same as other punctuation marks, sometimes even becoming ambiguous in the layout (which is the case in the image, where a clause delimiter interrupts a continuous emphasis span).

duncdrum commented 3 years ago

Following that advice, we are now working as volunteers on Japanese guidelines for East Asian materials: https://github.com/TEI-EAJ/jp_guidelines/wiki . As we don't have any budget for it, progress is very slow. But we would like to extend it to other languages such as Chinese, Korean, Vietnamese, English, and so on, if you and others would collaborate with us. Are you interested in that?

@knagasaki I have offered my help repeatedly, that is still the case. I thought, however, that this is what we are doing here already? @martindholmes the current sprint started November 02/2020, these are all still my original issues, as for budget and timetable s.a., I was also unaware of the release schedule timing.

Then, it should be encoded by note with appropriate attributes, not by <ruby> in TEI (but may be encoded in HTML5 for rendering). If you want to see a complex example, I know other materials of Japanese classics.

This is crucial, if ruby in TEI are to be used only in some places and not in others where they would be expected to be used in html5 or by encoders of other East Asian documents, then that needs to be documented in the Guidelines with an example and reason. If you have a Japanese example, of where html5 would use ruby but TEI should use note that would be great, alternatively the example I provided should do as well. I happen to disagree that the example above should not be encoded via ruby.

Anyway, the character can be represented as a character in CJK Unified ideograph with IVS like U+8A55 with U+E0101.

Yes for demonstration purposes I picked a sequence that contains such a character, the use of IDS inside <g> is mentioned in the Guidelines (I wrote that section) so for demo purposes it should suffice. We can of course add a cross-reference to the section in the prose.

@sydb there is a link to the CSS definitions in the xml, but @747 has covered most of it thank you.

There is no standardized shape of emphasis markers, so they often take different appearances in the same text, and look the same as other punctuation marks,

I wasn't going to point out the mix-n-match of regular punctuation with emphasis markers quite yet, but this is why documents containing both ruby and emphasis markers are important to cover imv; they are by no means a rare occurrence. I would contest the non-standardization of forms: these are very limited and consistent within a document. But there are no universal rules governing the use of emphasis markers for East Asian documents across languages and periods.

knagasaki commented 3 years ago

Following that advice, we are now working as volunteers on Japanese guidelines for East Asian materials: https://github.com/TEI-EAJ/jp_guidelines/wiki . As we don't have any budget for it, progress is very slow. But we would like to extend it to other languages such as Chinese, Korean, Vietnamese, English, and so on, if you and others would collaborate with us. Are you interested in that?

@knagasaki I have offered my help repeatedly, that is still the case. I thought, however, that this is what we are doing here already? @martindholmes the current sprint started November 02/2020, these are all still my original issues, as for budget and timetable s.a., I was also unaware of the release schedule timing.

Thank you for your repeated offers. You are welcome to take part in the activities on our GitHub repositories. And I'm very interested in the budget, if there is any. Our activities have been conducted by free...

knagasaki commented 3 years ago

Then, it should be encoded by note with appropriate attributes, not by <ruby> in TEI (but may be encoded in HTML5 for rendering). If you want to see a complex example, I know other materials of Japanese classics.

This is crucial, if ruby in TEI are to be used only in some places and not in others where they would be expected to be used in html5 or by encoders of other East Asian documents, then that needs to be documented in the Guidelines with an example and reason. If you have a Japanese example, of where html5 would use ruby but TEI should use note that would be great, alternatively the example I provided should do as well.

The definition of ruby in the original proposal can be summarized as a type of interlinear gloss that represents the pronunciation of the targeted text and can be regarded as a parallel text.

I would like to quote an example from the image below: https://www.iiif.ku-orcas.kansai-u.ac.jp/iiif/2/210464810%252F0057.tif/1050,100,250,1100/488,/0/default.jpg

in a part of the following image: https://www.iiif.ku-orcas.kansai-u.ac.jp/iiif/2/210464810%252F0057.tif/full/2000,/0/default.jpg

The encoded text is below: (Unfortunately, I couldn't find an example including <g> so far.)

         <p type="長歌" xml:id="manyo0131">
            <lb/><ruby xml:id="seg00007">
               <rb xml:id="seg00008">
                  <note place="right" resp="#定家" subtype="摘句" targetEnd="#note0175e" type="合点" xml:id="note00175">〽石見乃海つのヽうらわ</note>
                  石見乃海角乃浦廻乎浦無等人社見良目瀉<note place="right" resp="#定家" type="異文" targetEnd="#note00176e">無</note>无
                  <anchor type="noteEnd" xml:id="note00176e"/>等 <anchor type="noteEnd" xml:id="note0175e"/><note resp="#万葉集" type="割書" xml:id="note00177">一云礒
                     <milestone unit="wbr" />无登</note>
               </rb>
               <rt rend="left" xml:id="seg00009">イハミノウミツノヽウラハ
                  <note place="right" resp="#定家" type="異訓" xml:id="note00178">
                     ヲ
                  </note>
                  ニ
                  <anchor type="noteEnd"/>
                  ウラナシトヒトコソミラメカタナシト
               </rt>
            </ruby>
            <lb/>
            <ruby xml:id="seg00010">
               <rb xml:id="seg00011">
                  人社見良目能咲八師浦者無友縦畫屋師滷者
                  <note resp="#万葉集" type="割書" xml:id="note00179">一云礒者</note>
                  無勒
               </rb>
               <rt rend="left" xml:id="seg00012">
                  ヒトコソミラメヨシヱヤシウラハナクトモヨシヱヤシシカタハナクトモ
               </rt>
            </ruby>
         </p>

Most of the annotations are marked up with <note> because in the Japanese tradition they cannot be regarded as ruby but as other types of annotation. While there are many types of annotation in ancient East Asian documents, I think it is better to encode most of them via <note>. As <ruby> often occurs in tons of modern Japanese texts with apparent functions, I think it is better to make it an independent element.

knagasaki commented 3 years ago

Then, it should be encoded by note with appropriate attributes, not by <ruby> in TEI (but may be encoded in HTML5 for rendering). If you want to see a complex example, I know other materials of Japanese classics.

This is crucial, if ruby in TEI are to be used only in some places and not in others where they would be expected to be used in html5 or by encoders of other East Asian documents, then that needs to be documented in the Guidelines with an example and reason. If you have a Japanese example, of where html5 would use ruby but TEI should use note that would be great, alternatively the example I provided should do as well. I happen to disagree that the example above should not be encoded via ruby.

If the example you provided fits the definition of <ruby> that we gave, I would agree to encode it with <ruby>. Otherwise, it might be possible to extend the definition of <ruby> so that it can encode any type of annotation attached to a part of a base text. In that case, might it also be used for non-East Asian documents?

knagasaki commented 3 years ago

Anyway, the character can be represented as a character in CJK Unified ideograph with IVS like U+8A55 with U+E0101.

Yes for demonstration purposes I picked a sequence that contains such a character, the use of IDS inside <g> is mentioned in the Guidelines (I wrote that section) so for demo purposes it should suffice. We can of course add a cross-reference to the section in the prose.

Thank you very much for your significant contribution to improving the section! As far as I can see, the relevant passage is the following:

Encoders are strongly encouraged to provide IDS for each variant ideograph in the header component of the gaiji module to facilitate greater human and machine readability of rare or unencoded characters

and an example:

<glyph xml:id="U507D-var">
<!-- more properties here -->
 <mapping type="IDS">⿻人為</mapping>
 <mapping type="standard">偽</mapping>
</glyph>

It recommends using IDS in the header, but I couldn't find a recommendation to use IDS in the base text. Actually, it is a little difficult to process a <g> element that directly contains an IDS. (Of course it can be handled if the processor is told about it.) In such a case I think it is better to use an empty <g> element with a reference to an appropriate <char>, as suggested in the next example in the Guidelines (which you probably wrote).
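For illustration, a sketch of that arrangement, reusing the <glyph> declaration quoted above (a <char> declaration would be referenced in the same way). The word and its ruby reading are invented for the example, and the header and text fragments are shown together only for brevity.

```xml
<!-- Header fragment (inside <encodingDesc>): declaration carrying the IDS -->
<charDecl>
  <glyph xml:id="U507D-var">
    <mapping type="IDS">⿻人為</mapping>
    <mapping type="standard">偽</mapping>
  </glyph>
</charDecl>

<!-- Text fragment: <g> stays empty and only points at the declaration.
     The word and its reading are hypothetical sample content. -->
<ruby>
  <rb><g ref="#U507D-var"/>物</rb>
  <rt>にせもの</rt>
</ruby>
```

A processor then only needs to resolve @ref to find the IDS or the standard mapping, rather than parsing an IDS sequence out of the running text.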

duncdrum commented 3 years ago

Thank you very much for the nice example. I think it would be great to add it to this repo, as we can discuss a number of issues on it.

About the markup example

In principle, I see no advantages of:

<seg type="ruby">
  <seg type="rb">...
    <note type="rt" subtype="xyz">…</note>
  </seg>
</seg>

over

<ruby>
  <rb> … </rb>
  <rt type="xyz"> ..</rt>
</ruby>

<ruby> can nest and has @type, so anything that is possible in the first form should be possible in the second, whereas the second is more concise and, in my view, a better description of the textual feature.

The typology you use here would make for great suggested values for the use of @type in the Guidelines (although you didn't provide a @subtype for the left-side annotations, where an appropriate technical term probably exists).

Some more detailed comments

Note #note00179 is the commentarial half-width script. Its encoding lies outside the scope of ruby, but it adds another argument for differentiating more easily and clearly between 割書 type notes and rts.

Given your example I cannot see how the left-side annotations (#seg00012) do not fit the definition "a type of interlinear gloss that represents the pronunciation of the targeted text and can be regarded as a parallel text". This will be a major source of confusion for encoders who are not primarily working in Japanese, so if TEI is to adopt this practice / typology it needs to be made explicit.

Note #note00178 is an interesting case, I left it as note in my take on the markup, but maybe its markup could be further streamlined.

What is a ruby

However the real crux I believe stems from the desire to quickly address tons of modern Japanese texts with apparent functions. I have a similar desire to address modern Taiwanese texts; both are generally simpler than the examples we are discussing here. This is connected to the question of <ruby> in non-East Asian documents. There is a conflict between tailoring the solution to Japanese practices in particular and tailoring it to East Asian documents more broadly. In my mind two reasons are in favor of adopting a broader definition:

How are ruby treated and defined in other standards?
How does TEI as a global standard (all text, any time, anywhere) traditionally deal with questions of universality?

On both grounds I lean towards the broader definition, incorporating all East Asian documents, since their history of mutual influence is significant. Phonation is a spurious concept when 漢字 are involved. Think of the Yuan dynasty, which frequently sounded out Mongolian phonetics using 漢字 in writing. I would argue that both the Manchu example and the one from 古今長者錄 we are discussing here contain ruby. If editors for other textual traditions from outside East Asia think they have ruby, I'm all in favor, although I have never come across something like it.

knagasaki commented 3 years ago

Thank you very much for the nice example. I think it would be great to add it to this repo, as we can discuss a number of issues on it.

About the markup example

In principle, I see no advantages of:
<seg type="ruby">
  <seg type="rb">...
    <note type="rt" subtype="xyz">…</note>
  </seg>
</seg>
over
<ruby>
  <rb> … </rb>
  <rt type="xyz"> ..</rt>
</ruby>
<ruby> can nest and has @type, so anything that is possible in the first form should be possible in the second, whereas the second is more concise and, in my view, a better description of the textual feature.

The typology you use here would make for great suggested values for the use of @type in the Guidelines (although you didn't provide a @subtype for the left-side annotations, where an appropriate technical term probably exists).

Some more detailed comments

Note #note00179 is the commentarial half-width script. Its encoding lies outside the scope of ruby, but it adds another argument for differentiating more easily and clearly between 割書 type notes and rts.

Given your example I cannot see how the left-side annotations (#seg00012) do not fit the definition "a type of interlinear gloss that represents the pronunciation of the targeted text and can be regarded as a parallel text". This will be a major source of confusion for encoders who are not primarily working in Japanese, so if TEI is to adopt this practice / typology it needs to be made explicit.

Actually, ruby as we proposed it can be found only in Japanese and a very few Korean texts among ancient documents. In ancient Chinese texts, a similar function of representing pronunciation was served by the commentarial half-width script, as you know, as in the images below:

https://candra.dhii.jp/iipsrv/iipsrv.fcgi?IIIF=/ongi_pub/1240226/1240226_0102.tif/2219,1018,549,1330/100,/0/default.jpg (This document was written in Chang'an in Tang dynasty and published in Korea in 13th century)

https://candra.dhii.jp/iipsrv/iipsrv.fcgi?IIIF=/nincho/C40-4508-1_01/0037.tif/1566,766,259,909/100,/0/default.jpg (This document is the same text above and published in Japan in 17th century)

The Chinese tradition spread through copied texts across the East Asian world. However, areas on the margins of Chinese culture, such as the Korean Peninsula and the Japanese islands, also developed their own ways of reading Chinese texts historically, such as <ruby> and some other punctuation systems.

However, I've heard that in modern textbooks in China ruby is used for indicating pronunciation of each word with pinyin. It is the same usage as the Japanese modern ruby. So, modern usage is necessary and acceptable at least in East Asian countries and maybe outside of them.
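As a sketch of this basic modern use, assuming the proposed <ruby>, <rb>, and <rt> elements. The sample words and readings are illustrative and not drawn from any of the documents discussed here.

```xml
<!-- Chinese with per-character pinyin -->
<p>
  <ruby><rb>北</rb><rt>běi</rt></ruby><ruby><rb>京</rb><rt>jīng</rt></ruby>
</p>

<!-- Japanese with furigana over a whole word -->
<p>
  <ruby><rb>漢字</rb><rt>かんじ</rt></ruby>
</p>
```

In both cases the <rt> gives the pronunciation of exactly the text in the sibling <rb>, which is the relationship the narrow definition describes.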

Anyway, I think it is not necessary to show an example from an ancient document in the Guidelines. (But I will do so in the specialized guidelines.)

knagasaki commented 3 years ago

What is a ruby

However the real crux I believe stems from the desire to quickly address tons of modern Japanese texts with apparent functions. I have a similar desire to address modern Taiwanese texts; both are generally simpler than the examples we are discussing here. This is connected to the question of <ruby> in non-East Asian documents. There is a conflict between tailoring the solution to Japanese practices in particular and tailoring it to East Asian documents more broadly. In my mind two reasons are in favor of adopting a broader definition:

How are ruby treated and defined in other standards?

How does TEI as a global standard (all text, any time, anywhere) traditionally deal with questions of universality?

On both grounds I lean towards the broader definition, incorporating all East Asian documents, since their history of mutual influence is significant. Phonation is a spurious concept when 漢字 are involved. Think of the Yuan dynasty, which frequently sounded out Mongolian phonetics using 漢字 in writing. I would argue that both the Manchu example and the one from 古今長者錄 we are discussing here contain ruby. If editors for other textual traditions from outside East Asia think they have ruby, I'm all in favor, although I have never come across something like it.

While I can't read Manchu, it might be a case of <ruby>. However, the interlinear glosses in 古今長者錄 do not seem to fit the definition of ruby that we gave, and I think it is better to encode them with <note>, because ruby is most effective under the limited definition and the glosses can be handled by <note> with a variety of possible @type values. I would like to ask @duncdrum what the benefits are of using ruby in 古今長者錄 rather than <note> with various @type and @subtype values. It seems to me that most of the glosses are well served by <note type="原評"></note>. The second interlinear gloss seems to be attached to "一漁父". If so, it can be encoded as below:

<lb/>有
<note type="原評" targetEnd="#noteEnd01">豪傑也及于江上淂之</note>
一漁父<anchor xml:id="noteEnd01"/>
撑船知其意乃渡乃…

However, with <ruby>, because it depends on the layout, it would be encoded as below:

<lb/>
<ruby><rt>原評豪傑也及于江上淂之</rt>
<rb>有一漁父撑船知其意乃渡</rb></ruby>乃…

In these cases the relationships between the gloss and the targeted text are different: <ruby> can only represent the position of the gloss, whereas <note> can represent the semantic relationship between the targeted text and the gloss. And <note> can provide a typology characteristic of the glosses in 古今長者錄 through @type, within the existing framework of the Guidelines. If the position of the gloss must be described exactly, <sourceDoc> and other related elements can be used.

duncdrum commented 3 years ago

The benefits are:

greater consistency in the TEI markup of East Asian Documents
greater consistency between TEI and other markup standards
significantly more concise markup when working with whole documents

I do like your approach with note+seg in general. I'm actually curious how seg00012 does not fit your definition of ruby. If it fits, wouldn't a more targeted encoding be preferable? If you could help me understand why this falls outside of the definition, I might better understand your point about my examples.

The point of introducing a new element in my mind is to more accurately describe the textual features we find, and to make encoding easier. I dislike that the current definition of ruby prevents their use on the examples I've given here.

I also think that #5 could significantly lower the markup burden, especially of modern cases. So irrespective of the status of other East Asian documents, <layout> should be considered within the context of the current proposal as a central location to define attributes about ruby.

In ancient Chinese texts, a similar function of representing pronunciation was served by the commentarial half-width script

It took me 5 tries to find a page that combines commentarial half-width with what I'd call ruby. Why would the publishers use different styles on the same page if they both perform identical functions? I'm afraid I disagree with regard to ancient Chinese texts (古今長者錄 and the Manchu example included):

[Image: 15483537-2]

source: 董子春秋繁露 seq. 68

knagasaki commented 3 years ago

The benefits are:

greater consistency in the TEI markup of East Asian Documents

greater consistency between TEI and other markup standards

significantly more concise markup when working with whole documents

Thank you for listing the three. They are important, and I basically agree that we should aim for these benefits. On that understanding, ruby is an exception to consistency among ancient East Asian documents. As I wrote previously:

The Chinese tradition spread through copied texts across the East Asian world. However, areas on the margins of Chinese culture, such as the Korean Peninsula and the Japanese islands, also developed their own ways of reading Chinese texts historically, such as <ruby> and some other punctuation systems.

ruby was not used in China, but was used in the marginal areas to understand Chinese texts in the pre-modern age. As the Manchu were also a typical marginal area of the Chinese cultural sphere, I suppose the example you provided might be an appropriate one. It is one of the asymmetric relationships in East Asian culture.

Regarding consistency with other markup standards, ruby in HTML5 is defined as being used "in East Asian typography as a guide for pronunciation or to include other annotations. In Japanese, this form of typography is also known as furigana." But I don't know whether usage in ancient documents was considered, and most of the examples listed in the HTML5 documents concern pronunciation. Moreover, the usage of some TEI elements, such as <span>, is not the same as in HTML. Consistency is important but not necessary for sharing good encoding, I think.

I do like your approach with note+seg in general. I'm actually curious how seg00012 does not fit your definition of ruby. If it fits, wouldn't a more targeted encoding be preferable? If you could help me understand why this falls outside of the definition, I might better understand your point about my examples.

I'm sorry for the confusion. I just copied and pasted the encoded example from a current TEI-validated text. As seg00012 has @type=rt, it will be converted into <rt> once ruby is included in the Guidelines. I will revise the example after posting this comment.
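A sketch of how such a conversion could be automated with XSLT 1.0, assuming the interim encoding uses <seg type="ruby">, <seg type="rb">, and <seg type="rt"> in the TEI namespace. These interim names are an assumption based on this thread, not a documented convention.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <!-- Identity template: copy everything that is not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Rename <seg type="ruby|rb|rt"> to the element named by @type,
       dropping @type and keeping all other attributes (e.g. xml:id, rend) -->
  <xsl:template match="tei:seg[@type='ruby' or @type='rb' or @type='rt']">
    <xsl:element name="{@type}" namespace="http://www.tei-c.org/ns/1.0">
      <xsl:apply-templates select="@*[name() != 'type']|node()"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

Any <seg> with another @type, and all <note> elements, pass through unchanged, so the distinction between ruby and other annotation types in the source encoding is preserved.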

The point of introducing a new element in my mind is to more accurately describe the textual features we find, and to make encoding easier. I dislike that the current definition of ruby prevents their use on the examples I've given here.

While my previous comment compared the use of <ruby> and <note>, there may be another possibility, as below:


<lb/>有
<ruby><rt>原評豪傑也及于江上淂之</rt>
<rb>一漁父</rb></ruby>撑船知其意乃渡乃…

In this case, the relationship between the annotation and the targeted text is represented, but the rendering information, that is, the location of the rt characters, is lost, so the rendering function of <ruby> in HTML5 cannot be used. Which of the three types of encoding do you prefer?

Also, I would like to know your preferred (or acceptable) definition of ruby.

knagasaki commented 3 years ago

In ancient Chinese texts, a similar function of representing pronunciation was served by the commentarial half-width script

It took me 5 tries to find a page that combines commentarial half-width with what I'd call ruby. Why would the publishers use different styles on the same page if they both perform identical functions? I'm afraid I disagree with regard to ancient Chinese texts (古今長者錄 and the Manchu example included):

source: 董子春秋繁露 seq. 68

If we focus only on the layout, interlinear glosses and emphasis markers seem to be present in the following image of a manuscript. Do you think these glosses should be encoded via ruby or not?

https://dutchanglosaxonist.files.wordpress.com/2017/10/dotglosses1.jpg?w=730

duncdrum commented 3 years ago

Yes, both the function of providing (early) Dutch readings of the Latin words and the layout choice for presenting them fit the definition of ruby.

duncdrum commented 3 years ago

I actually prefer your final version,

<lb/>有
<ruby>
  <rt>原評豪傑也及于江上淂之</rt>
  <rb>一漁父</rb>
</ruby>
撑船知其意乃渡乃…

but this version seems similarly acceptable; it should be left to encoders to pick how they interpret the relationship between base and target,

<lb/>
<ruby>
  <rt>原評豪傑也及于江上淂之</rt>
  <rb>有一漁父撑船知其意乃渡</rb>
</ruby>
乃…

knagasaki commented 3 years ago

Yes, both the function of providing (early) Dutch readings of the Latin words and the layout choice for presenting them fit the definition of ruby.

Thank you for the reply. Then ruby is not limited to East Asia but is used in pre-modern documents in general. Can you find some examples in modern non-East Asian texts? Also, a case like the Latin one has probably been encoded in TEI before now. Such a use case would be useful for this discussion. Do you know of such an example?

knagasaki commented 3 years ago

I actually prefer your final version,
<lb/>有
<ruby>
  <rt>原評豪傑也及于江上淂之</rt>
  <rb>一漁父</rb>
</ruby>
撑船知其意乃渡乃…
but this version seems similarly acceptable; it should be left to encoders to pick how they interpret the relationship between base and target,
<lb/>
<ruby>
  <rt>原評豪傑也及于江上淂之</rt>
  <rb>有一漁父撑船知其意乃渡</rb>
</ruby>
乃…

Thank you for sharing your thoughts. If you accept both, I think it would be better for ruby to have some @type values to distinguish the relationship, such as type="rendering" and type="explanation".
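As a minimal sketch of what such a distinction might look like, assuming @type is permitted on ruby: the two values are only the ones proposed above, not settled vocabulary; the first example uses an illustrative furigana reading that is not taken from this thread; and placing rb before rt is likewise an assumption about element order.

```xml
<!-- Hypothetical: ruby text gives the reading of the base text -->
<ruby type="rendering">
  <rb>漢字</rb>
  <rt>かんじ</rt>
</ruby>

<!-- Hypothetical: ruby text comments on the base text -->
<lb/>有
<ruby type="explanation">
  <rb>一漁父</rb>
  <rt>原評豪傑也及于江上淂之</rt>
</ruby>
撑船知其意乃渡乃…
```

With a marker like this, a processor could tell phonetic ruby apart from commentarial ruby without having to inspect the content of rt.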

kzhr commented 3 years ago

@duncdrum

Yes, both the function of providing (early) Dutch readings of the Latin words, and the layout choice for presenting them, fit the definition of ruby.

Am I missing the image of this example? Just curious…

duncdrum commented 3 years ago

@knagasaki I'll try to sum up our very helpful discussion of various examples so far. I think that we should try to bring it to a conclusion before we can put a final shape on the Guidelines prose to be presented to Council.

We agree that ruby warrant their own element because of the complex relationships they can display with the base text. We also agree that while ruby can display highly complex behavior, there are many simple cases that we wish to address in a simple and straightforward manner.

What is still an open question is if ruby in TEI should adopt a narrow functional definition stressing phonation, or a broad definition stressing their layout / appearance.

A narrow definition closely matches e.g. bopomofo and furigana practices in modern documents. Consequently, the Guidelines should stress the close link to Japanese instead of East Asia, and give examples of where ruby is to be used in HTML5 but not in TEI. It excludes ancient Chinese documents, and leaves it to the encoder to decide whether 漢字 were used to (also) convey phonation. If they weren't, it's not ruby. It also prevents the use of ruby in cases where both commentarial and ruby-like annotations appear in a document.

A broad definition is inclusive of the narrow definition, but allows the use of ruby in a wider range of historical East Asian and other documents. This should be reflected in the prose of the Guidelines and the examples (we have quite a few by now to choose from). It also lends itself well to suggesting values for @type on ruby; your examples already use a typology that would make a great scholarly contribution here.

I'll try to focus our discussion on some of the specific remaining questions by posting in the other issues. In either case, I would argue that #5 should be considered for the draft of the submission regardless of broad or narrow definition. If we adopt a broad definition, #4 and #10 should also be considered.

duncdrum commented 3 years ago

@kzhr https://dutchanglosaxonist.files.wordpress.com/2017/10/dotglosses1.jpg?w=730 (at the end of the post). It's a very nice find. I don't have non-East-Asian examples readily available.

knagasaki commented 3 years ago

@knagasaki I'll try to sum up our very helpful discussion of various examples so far. I think that we should try to bring it to a conclusion before we can put a final shape on the Guidelines prose to be presented to Council.

We agree that ruby warrant their own element because of the complex relationships they can display with the base text. We also agree that while ruby can display highly complex behavior, there are many simple cases that we wish to address in a simple and straightforward manner.

What is still an open question is if ruby in TEI should adopt a narrow functional definition stressing phonation, or a broad definition stressing their layout / appearance.

A narrow definition closely matches e.g. bopomofo and furigana practices in modern documents. Consequently, the Guidelines should stress the close link to Japanese instead of East Asia, and give examples of where ruby is to be used in HTML5 but not in TEI. It excludes ancient Chinese documents, and leaves it to the encoder to decide whether 漢字 were used to (also) convey phonation. If they weren't, it's not ruby. It also prevents the use of ruby in cases where both commentarial and ruby-like annotations appear in a document.

A broad definition is inclusive of the narrow definition, but allows the use of ruby in a wider range of historical East Asian and other documents. This should be reflected in the prose of the Guidelines and the examples (we have quite a few by now to choose from). It also lends itself well to suggesting values for @type on ruby; your examples already use a typology that would make a great scholarly contribution here.

Thank you for summarizing both. My concern with the broad definition is that, if it is adopted, ruby should be encoded via the existing elements. The only difference between <note> and <ruby> would probably be that <ruby> indicates the position of the annotation, i.e. that the annotating text sits close to the text it targets. That is not specific to East Asia, but global. If it is broadly useful for TEI users, it might be valuable to add <ruby> and some related elements. But it might be redundant for the Guidelines to add a new element family whose functions are already provided.

knagasaki commented 3 years ago

So, I prefer the narrow definition. As we wrote in the original document, ruby was historically used in the peripheral areas of East Asia and is now used across the entire region. In the pre-modern Western world, the Latin example I found seems to fit the narrow definition, if it is regarded as both glosses and a parallel text. While the narrow definition imposes a somewhat stricter condition (it excludes annotation that serves only as additional explanation), it still seems applicable across a wider range of textual cultures.

By the way, in my view the Guidelines are implicitly oriented toward Western documents. To extend the TEI user community globally, it is necessary to include the conventions of various textual cultures around the world. Establishing the East Asian / Japanese SIG, followed by the Indic one, was a symbolic milestone in the long history of the TEI, I think. This proposal is the SIG's first, and we are preparing the next one, to encode textual features characteristic of most of East Asia.

duncdrum commented 3 years ago

fair enough let's see what council says about <note type="ruby"> vs <ruby>
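To make the two options concrete, here is a rough sketch using the example from earlier in this thread. The <seg>, @xml:id and @target wiring in option B is purely hypothetical, just one way the existing elements might be combined, and the rb-before-rt order in option A is an assumption.

```xml
<!-- Option A: dedicated ruby elements -->
<lb/>有<ruby><rb>一漁父</rb><rt>原評豪傑也及于江上淂之</rt></ruby>撑船知其意乃渡乃…

<!-- Option B (hypothetical): existing elements, with the note pointing back at its base -->
<lb/>有<seg xml:id="base1">一漁父</seg><note type="ruby" target="#base1">原評豪傑也及于江上淂之</note>撑船知其意乃渡乃…
```

In option A the base/annotation pairing is carried by the element structure itself; in option B it has to be reconstructed from the pointer.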

knagasaki commented 3 years ago

fair enough let's see what council says about <note type="ruby"> vs <ruby>

and narrow definition vs broad definition?

martindholmes commented 3 years ago

I don't believe this is ruby: https://dutchanglosaxonist.files.wordpress.com/2017/10/dotglosses1.jpg?w=730 It's an Anglo-Saxon interlinear gloss of a Latin text. We have plenty of ways to do this already, without invoking ruby, surely?

martindholmes commented 3 years ago

Just a clarification: our current schema prescribes one rb followed by one or more rts, rather than the other way around; some of the encodings above have the rt first.

knagasaki commented 3 years ago

Just a clarification: our current schema prescribes one rb followed by one or more rts, rather than the other way around; some of the encodings above have the rt first.

It would be better if the order of rt and rb within ruby could be left free, if possible.

martindholmes commented 3 years ago

@knagasaki It's easy to do that, but what's the reason for it? In general, I tend to prefer loose ordering for element content, but in this case it seems to me that the rb is the basis (by definition) and the rt is supplementary, and depends on it. The position of the rt is handled by @place, so the order of elements should have no significance with regard to layout.
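For illustration, a minimal sketch of the pattern described here: one rb first, followed by one or more rts, each positioned via @place. The sample text and the specific @place values are assumptions chosen for the sketch, not taken from the examples in this repo.

```xml
<ruby>
  <!-- the base text comes first -->
  <rb>漢字</rb>
  <!-- each rt carries its own position, so element order says nothing about layout -->
  <rt place="above">かんじ</rt>
  <rt place="below">kanji</rt>
</ruby>
```

Because the layout lives in @place, swapping the two rt elements would not change how either one is positioned relative to the base.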

knagasaki commented 3 years ago

@knagasaki It's easy to do that, but what's the reason for it? In general, I tend to prefer loose ordering for element content, but in this case it seems to me that the rb is the basis (by definition) and the rt is supplementary, and depends on it. The position of the rt is handled by @place, so the order of elements should have no significance with regard to layout.

For processing, it is better that rb is the first element within ruby. However, since in our convention it is entirely natural for the ruby text to sit on top of the base text, putting rt first is easier on the eyes for people who are familiar with ruby, I suppose.

martindholmes commented 3 years ago

@knagasaki Shall I leave the schema as it is (with rb always first), and then we can see if people need to use the alternative sequence in their work? For something like this, I feel that the order of elements, if it's variable, should have significance, but I don't see any significance here.

knagasaki commented 3 years ago

@knagasaki Shall I leave the schema as it is (with rb always first), and then we can see if people need to use the alternative sequence in their work? For something like this, I feel that the order of elements, if it's variable, should have significance, but I don't see any significance here.

I see. I cannot show evidence for it now. So, if enough people request it, I will suggest it again in the future.

knagasaki commented 3 years ago

My colleague gave me some examples of ruby in textbooks in China! https://mp.weixin.qq.com/s?__biz=Mzg4OTA5MDg5OQ==&mid=100000003&idx=1&sn=82285bbb1f87da0f681451b2c4db27fb&scene=19

duncdrum commented 3 years ago

I don't believe this is ruby: https://dutchanglosaxonist.files.wordpress.com/2017/10/dotglosses1.jpg?w=730 It's an Anglo-Saxon interlinear gloss of a Latin text. We have plenty of ways to do this already, without invoking ruby, surely?

Nobody is saying you need ruby because of the Anglo-Saxon example. The question is: if you have ruby, might it not be better suited to the task than the plenty of ways you already have? It's kind of the hallmark of a global standard that concepts are applicable across local contexts, no?

martindholmes commented 3 years ago

@duncdrum I think @knagasaki makes a good point about the fact that existing communities already have encoding norms. I see no reason why you shouldn't use ruby tags for this if you want, but I don't think we should recommend it or exemplify it, just to introduce confusion into communities completely unfamiliar with ruby.

duncdrum commented 3 years ago

Again, nobody said the Anglo-Saxon example should go into the Guidelines; it's not in this repo either. I'm all for addressing the needs of existing communities, like editors working on Chinese documents, where they already have to use <html:ruby> to render documents like the ones I added here.

But from the looks of it there will be a difference in the semantics of ruby in HTML and TEI.
