iljackb / Mixtepec_Mixtec

Mostly XML (TEI) markup of Mixtepec-Mixtec Language resources

Annotating tone #92

Open iljackb opened 4 years ago

iljackb commented 4 years ago

By default, I annotate the orthographic forms. Because Mixtec orthography does not explicitly mark many features, these annotations are underspecified as to which part of the form expresses a given feature. For example, in the snippet below the verb "sketa" is present tense and 1sg, neither of which shows up in the orthography, so the whole form is simply tagged for those features:

            <u who="#TS" xml:id="d1e112" n="2" start="1.48" end="2.98" xml:lang="mix">
               <seg xml:lang="mix" xml:id="d1e113" notation="orth" type="S">
                  <w xml:id="d1e114" synch="#T14">sketa</w>
                  <w xml:id="d1e116" synch="#T19">ntikii</w>
               </seg>
               ......
            </u>
            <spanGrp type="annotations">
                ....
               <span type="translation" target="#d1e114" xml:lang="en" ana="#INFL">I run</span>
               <span type="translation" target="#d1e114" xml:lang="es" ana="#INFL">corro</span>
               <span type="gram" target="#d1e114" ana="#V #INTRANS #INCOMPL #1PERS #SG"/>
               .........
            </spanGrp>

If, however, a phonetic transcription is included, I tag both the orthographic forms (as above) and, explicitly, the tone contours (encoded as <m> elements with @xml:id values), so that the annotation labels the specific unit that expresses the linguistic feature.

            <u who="#TS" xml:id="d1e112" n="2" start="1.48" end="2.98" xml:lang="mix">
               <seg xml:lang="mix" xml:id="d1e113" notation="orth" type="S">
                  <w xml:id="d1e114" synch="#T14">sketa</w>
                  <w xml:id="d1e116" synch="#T19">ntikii</w>
               </seg>
               <seg xml:lang="mix" xml:id="d1e118" notation="ipa" type="S" sameAs="#d1e113">
                  <w xml:id="d1e119" synch="#T14" sameAs="#d1e114">skɛ<m xml:id="d1e225">˥</m>t̪a<m xml:id="d1e120">↘</m></w>
                  <w xml:id="d1e132" synch="#T19" sameAs="#d1e116">nd̪i↘kiː↘↗ꜛ</w>
               </seg>
            </u>
            <spanGrp type="annotations">
                 ....
               <span type="translation" target="#d1e114" xml:lang="en" ana="#INFL">I run</span>
               <span type="translation" target="#d1e114" xml:lang="es" ana="#INFL">corro</span>
               <span type="gram" target="#d1e114" ana="#V #INTRANS #INCOMPL #1PERS #SG"/>
               <span type="gram" target="#d1e225" ana="#INCOMPL"/>
               <span type="gram" target="#d1e120" ana="#1PERS #SG"/>
                 ....
            </spanGrp>

However, I'm not sure what value of <span @type> to give these annotations. I'm currently labeling them "gram", the same as the general grammatical annotations, but I'm wondering whether I should call them "tone" or something similar, so that a retrieval script can simply check the <span @type> value rather than checking whether the target is an <m> that is a descendant of //seg[@notation='ipa'].
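For comparison, here is a minimal sketch, in Python with the standard library's xml.etree.ElementTree, of the structural check a retrieval script needs when @type alone does not distinguish the tone annotations. The sample is a trimmed, hypothetical copy of the snippet above with the TEI namespace omitted; the helper names (local, tone_spans_by_structure) are illustrative only, and the code assumes each @target holds a single pointer.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, trimmed sample based on the example above (TEI namespace omitted).
SAMPLE = """<body>
  <u xml:id="d1e112">
    <seg xml:id="d1e113" notation="orth"><w xml:id="d1e114">sketa</w></seg>
    <seg xml:id="d1e118" notation="ipa" sameAs="#d1e113">
      <w xml:id="d1e119" sameAs="#d1e114">skɛ<m xml:id="d1e225">˥</m>t̪a<m xml:id="d1e120">↘</m></w>
    </seg>
  </u>
  <spanGrp type="annotations">
    <span type="gram" target="#d1e114" ana="#V #INTRANS #INCOMPL #1PERS #SG"/>
    <span type="gram" target="#d1e225" ana="#INCOMPL"/>
    <span type="gram" target="#d1e120" ana="#1PERS #SG"/>
  </spanGrp>
</body>"""


def local(tag):
    """Drop a '{namespace}' prefix so the same code works on namespaced TEI files."""
    return tag.rsplit("}", 1)[-1]


def tone_spans_by_structure(root):
    """Return (target, ana) for 'gram' spans pointing at an <m> inside seg[@notation='ipa']."""
    # ElementTree has no ancestor axis, so map each child to its parent and walk up.
    parent = {child: p for p in root.iter() for child in p}
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}

    def inside_ipa_seg(el):
        while el in parent:
            el = parent[el]
            if local(el.tag) == "seg" and el.get("notation") == "ipa":
                return True
        return False

    hits = []
    for span in root.iter():
        if local(span.tag) != "span" or span.get("type") != "gram":
            continue
        # Assumes @target holds a single pointer such as "#d1e225".
        target = by_id.get(span.get("target", "").lstrip("#"))
        if target is not None and local(target.tag) == "m" and inside_ipa_seg(target):
            hits.append((span.get("target"), span.get("ana")))
    return hits


print(tone_spans_by_structure(ET.fromstring(SAMPLE)))
# -> [('#d1e225', '#INCOMPL'), ('#d1e120', '#1PERS #SG')]
```

Because ElementTree has no ancestor axis, the script has to build a parent map and walk upward; this is the kind of structural test a dedicated attribute value would make unnecessary.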

@Laurent, what do you think?

iljackb commented 4 years ago

The solution is to use <span type="gram"> with a @subtype. This requires a schema alteration: <span> has to be added to att.typed.

I think there should be at least two possible values of @subtype: "tone" (for the case discussed above in this issue) and probably also "morph", for pointing to a morphological unit within an inflected, or perhaps derived, form.

Here is an example showing both uses of @subtype, to tag:

  1. the presence of the future/potential prefix "kun-" (realized phonetically as "ũː↗↘") in front of the verb, which is tagged only in the phonetic transcription (annotated below as <span type="gram" subtype="morph" target="#d1e157" ana="#FUT"/>); and
  2. the presence of the tone inflection marking 1st person singular on the verb, which isn't marked in the orthography (annotated below as <span type="gram" subtype="tone" target="#d1e172" ana="#1PERS #SG"/>):

               <seg xml:lang="mix" xml:id="d1e41" notation="orth" type="phrase">
                  <w xml:id="d1e42" synch="#T2">kunkanta</w>
               </seg>
               <seg xml:lang="mix" xml:id="d1e46" notation="ipa" type="phrase" sameAs="#d1e41">
                  <w xml:id="d1e47" synch="#T1" sameAs="#d1e42"><m xml:id="d1e157">ũː↗↘</m>k̬a˩nd̪a<m xml:id="d1e172">˩</m></w>
               </seg>
            </u>
            <spanGrp type="annotations">
               <span type="translation" target="#d1e42" xml:lang="en" ana="#INFL">I will jump</span>
               <span type="translation" target="#d1e42" xml:lang="es" ana="#INFL">saltaré</span>
               <span type="translation" target="#d1e42" xml:lang="es" ana="#INFL">voy a saltar</span>
               <span type="gram" target="#d1e42" ana="#V #INTRANS #FUT #1PERS #SG">
                  <gloss type="igt">fut- jump\1s</gloss>
               </span>
               <span type="gram" subtype="morph" target="#d1e157" ana="#FUT"/>
               <span type="gram" subtype="tone" target="#d1e172" ana="#1PERS #SG"/>
            </spanGrp>
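With @subtype in place, retrieval reduces to attribute tests. Below is a minimal Python sketch, again assuming a trimmed, namespace-free copy of the example above, single-pointer @target values, and hypothetical helper names (local, sub_annotations); it groups the sub-word annotations by @subtype and resolves each target to the text of the <m> it points at.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, trimmed sample based on the example above (TEI namespace omitted).
SAMPLE = """<body>
  <u>
    <seg xml:id="d1e41" notation="orth"><w xml:id="d1e42">kunkanta</w></seg>
    <seg xml:id="d1e46" notation="ipa" sameAs="#d1e41">
      <w xml:id="d1e47" sameAs="#d1e42"><m xml:id="d1e157">ũː↗↘</m>k̬a˩nd̪a<m xml:id="d1e172">˩</m></w>
    </seg>
  </u>
  <spanGrp type="annotations">
    <span type="gram" target="#d1e42" ana="#V #INTRANS #FUT #1PERS #SG"/>
    <span type="gram" subtype="morph" target="#d1e157" ana="#FUT"/>
    <span type="gram" subtype="tone" target="#d1e172" ana="#1PERS #SG"/>
  </spanGrp>
</body>"""


def local(tag):
    """Drop a '{namespace}' prefix so the same code works on namespaced TEI files."""
    return tag.rsplit("}", 1)[-1]


def sub_annotations(root):
    """Map each @subtype to a list of (target text, ana values) for 'gram' spans."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    groups = defaultdict(list)
    for span in root.iter():
        if local(span.tag) != "span" or span.get("type") != "gram":
            continue
        subtype = span.get("subtype")
        if subtype is None:
            # Whole-word annotations on the orthographic form carry no @subtype.
            continue
        # Assumes @target holds a single pointer such as "#d1e172".
        target = by_id.get(span.get("target", "").lstrip("#"))
        form = "".join(target.itertext()) if target is not None else None
        groups[subtype].append((form, span.get("ana", "").split()))
    return dict(groups)


print(sub_annotations(ET.fromstring(SAMPLE)))
```

The result has one key per @subtype ("morph" and "tone" here), so a script interested only in tone can read that single entry without inspecting where the target sits in the document.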

Note that (in relation to issue #93), the <gloss type="igt"> will still be placed only in the <span>s annotating the orthographic content.
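That rule can also be checked mechanically. The following is a hypothetical lint in Python (not part of the project): it flags any <span> whose <gloss type="igt"> targets something outside a seg[@notation='orth']. The sample is a trimmed, namespace-free copy of the example above, and its second span is a deliberately invalid case invented for the demonstration; single-pointer @target values are assumed.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample; the second span is an invented violation (igt gloss on a tone span).
SAMPLE = r"""<body>
  <seg xml:id="d1e41" notation="orth"><w xml:id="d1e42">kunkanta</w></seg>
  <seg xml:id="d1e46" notation="ipa"><w xml:id="d1e47"><m xml:id="d1e157">ũː↗↘</m>k̬a˩nd̪a<m xml:id="d1e172">˩</m></w></seg>
  <spanGrp type="annotations">
    <span type="gram" target="#d1e42" ana="#V #INTRANS #FUT #1PERS #SG"><gloss type="igt">fut- jump\1s</gloss></span>
    <span type="gram" subtype="tone" target="#d1e172" ana="#1PERS #SG"><gloss type="igt">1s</gloss></span>
  </spanGrp>
</body>"""


def local(tag):
    """Drop a '{namespace}' prefix so the same code works on namespaced TEI files."""
    return tag.rsplit("}", 1)[-1]


def misplaced_igt_glosses(root):
    """Return @target values of spans whose igt gloss does not point into an orthographic seg."""
    parent = {child: p for p in root.iter() for child in p}
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}

    def notation_of(el):
        # Walk up to the nearest enclosing <seg> and report its @notation.
        while el in parent:
            el = parent[el]
            if local(el.tag) == "seg":
                return el.get("notation")
        return None

    bad = []
    for span in root.iter():
        if local(span.tag) != "span":
            continue
        if not any(local(g.tag) == "gloss" and g.get("type") == "igt" for g in span):
            continue
        target = by_id.get(span.get("target", "").lstrip("#"))
        if target is None or notation_of(target) != "orth":
            bad.append(span.get("target"))
    return bad


print(misplaced_igt_glosses(ET.fromstring(SAMPLE)))
# -> ['#d1e172']
```

The raw string keeps the backslash in "jump\1s" literal; in an ordinary Python string, "\1" would be read as an octal escape.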