iljackb / Mixtepec_Mixtec

Mostly XML (TEI) markup of Mixtepec-Mixtec Language resources

Alternative annotation options #91

Closed iljackb closed 4 years ago

iljackb commented 4 years ago

Up until now I've kept the annotations (grammatical, semantic, IGT) and the translations in separate <spanGrp>s. However, this is inefficient both in space and in the time needed to annotate: the more annotations I add, the further each <spanGrp> ends up from the language content it annotates.

As per a comment in the dissertation, I proposed to do away with the single-feature annotations, e.g.

            <spanGrp type="gram">
               <span type="sentence" target="#d1e113" ana="#DECL"/>
               <span type="pos" target="#d1e114" ana="#V"/>
               <span type="transitivity" target="#d1e114" ana="#INTRANS"/>
               <span type="tense" target="#d1e114" ana="#INCOMPL"/>
               <span type="person" target="#d1e120" ana="#1PERS"/>
               <span type="number" target="#d1e120" ana="#SG"/>
               <span type="pos" target="#d1e116" ana="#ADV"/>
            </spanGrp>

....in favor of combined annotations for a given target, e.g. something like (note: the value and possibly the use of @type is still to be determined):

            <spanGrp type="gram">
               <span type="sentence" target="#d1e113" ana="#DECL"/>
               <span target="#d1e114" ana="#V #INTRANS #INCOMPL"/>
               <span target="#d1e120" ana="#1PERS #SG"/>
               <span target="#d1e116" ana="#ADV"/>
            </spanGrp>
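
The conversion from single-feature spans to combined spans is mechanical, so it could be scripted. Below is a minimal sketch, not part of the project's tooling: `combine_spans` is a hypothetical helper, the sample omits the TEI namespace, and the tense span is assumed to target `#d1e114`, as in the combined example.

```python
import xml.etree.ElementTree as ET

# Single-feature spans from the first example (TEI namespace omitted;
# the tense span is assumed to target #d1e114, as in the combined example).
SINGLE_FEATURE = """
<spanGrp type="gram">
   <span type="sentence" target="#d1e113" ana="#DECL"/>
   <span type="pos" target="#d1e114" ana="#V"/>
   <span type="transitivity" target="#d1e114" ana="#INTRANS"/>
   <span type="tense" target="#d1e114" ana="#INCOMPL"/>
   <span type="person" target="#d1e120" ana="#1PERS"/>
   <span type="number" target="#d1e120" ana="#SG"/>
   <span type="pos" target="#d1e116" ana="#ADV"/>
</spanGrp>
"""


def combine_spans(span_grp):
    """Merge spans that share a @target into one span with a multi-valued @ana."""
    merged = {}  # target -> {"type": ..., "ana": [...]}; dicts keep insertion order
    for span in span_grp.findall("span"):
        target = span.get("target")
        # Keep @type only on sentence-level spans, mirroring the combined example.
        span_type = "sentence" if span.get("type") == "sentence" else None
        entry = merged.setdefault(target, {"type": span_type, "ana": []})
        entry["ana"].extend(span.get("ana", "").split())

    combined = ET.Element("spanGrp", {"type": span_grp.get("type", "gram")})
    for target, entry in merged.items():
        attrs = {}
        if entry["type"]:
            attrs["type"] = entry["type"]
        attrs["target"] = target
        attrs["ana"] = " ".join(entry["ana"])
        ET.SubElement(combined, "span", attrs)
    return combined


combined = combine_spans(ET.fromstring(SINGLE_FEATURE))
for span in combined:
    print(ET.tostring(span, encoding="unicode"))
```

Grouping by the exact `@target` string keeps the sketch simple; spans that point at several tokens at once would be treated as their own group.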

However, even this is cumbersome, especially when the annotated sentence is long, because the <spanGrp type="translation"> precedes the one with the grammar and so pushes the grammatical annotations even further from the sentence, e.g.

              <seg xml:id="L104-01-03" type="S" xml:lang="mix">
                  <w xml:id="d1e200">Ku'un</w>
                  <w xml:id="d1e202">ti</w>
                  <w xml:id="d1e204">mancha</w>
                  <w xml:id="d1e206">nuu</w>
                  <w xml:id="d1e208">kantu'u</w>
                  <w xml:id="d1e210">staa</w>
                  <pc>.</pc>
               </seg>
               <spanGrp type="translation">
                  <span type="sentence" target="#L104-01-03" xml:lang="en">It (animal) goes to where the tortillas are.</span>
                  <span type="sentence" target="#L104-01-03" xml:lang="es" cert="medium">Va a dónde están las tortillas.</span>
                  <span target="#d1e200 #d1e202" xml:lang="en" type="inflected">it goes</span>
                  <span target="#d1e200 #d1e202" xml:lang="en" type="inflected">it is going</span>
                  <span target="#d1e200 #d1e202" xml:lang="es" type="inflected">va</span>
                  <span target="#d1e204" xml:lang="en">up to</span>
                  <span target="#d1e204" xml:lang="es">hasta</span>
                  <span target="#d1e206" xml:lang="en">where</span>
                  <span target="#d1e206" xml:lang="es">dónde</span>
                  <span target="#d1e208" xml:lang="en" type="inflected">are sitting</span>
                  <span target="#d1e208" xml:lang="es" type="inflected">están</span>
                  <span target="#d1e210" xml:lang="en">tortilla</span>
                  <span target="#d1e210" xml:lang="es">tortilla</span>
               </spanGrp>
               <spanGrp type="gram">
                  <span type="sentence" target="#L104-01-03" ana="#DECL"/>
                  <span target="#d1e200" ana="#V #INTRANS"/>
                  <span target="#d1e202" ana="#ENCLTC #3PERS #SG #ANML"/>
                  <span target="#d1e204" ana="#ADPOS"/>
                  <span target="#d1e206" ana="#ADPOS"/>
                  <span target="#d1e208" ana="#V #INTRANS"/>
                  <span target="#d1e210" ana="#N"/>
               </spanGrp>

Instead, I think it would be better to combine the two into a single <spanGrp>, which could be done in one of two ways:

(1) Place the grammatical <span>s in the same <spanGrp> as the translation spans, while keeping them as separate spans, e.g.

            <spanGrp type="translation"><!-- THIS @type WOULD NEED TO BE CHANGED!-->
               <span type="sentence" target="#L104-01-03" xml:lang="en">It (animal) goes to where the tortillas are.</span>
               <span type="sentence" target="#L104-01-03" xml:lang="es" cert="medium">Va a dónde están las tortillas.</span>
               <span target="#L104-01-03" ana="#DECL"/>
               <span target="#d1e200 #d1e202" xml:lang="en" type="inflected">it goes</span>
               <span target="#d1e200 #d1e202" xml:lang="en" type="inflected">it is going</span>
               <span target="#d1e200 #d1e202" xml:lang="es" type="inflected">va</span>
               <span target="#d1e200" ana="#V #INTRANS"/>
               <span target="#d1e202" ana="#ENCLTC #3PERS #SG #ANML"/>
               <span target="#d1e204" xml:lang="en">up to</span>
               <span target="#d1e204" xml:lang="es">hasta</span>
               <span target="#d1e204" ana="#ADPOS"/>
               <span target="#d1e206" xml:lang="en">where</span>
               <span target="#d1e206" xml:lang="es">dónde</span>
               <span target="#d1e206" ana="#ADPOS"/>
               <span target="#d1e208" xml:lang="en" type="inflected">are sitting</span>
               <span target="#d1e208" xml:lang="es" type="inflected">están</span>
               <span target="#d1e208" ana="#V #INTRANS"/>
               <span target="#d1e210" xml:lang="en">tortilla</span>
               <span target="#d1e210" xml:lang="es">tortilla</span>
               <span target="#d1e210" ana="#N"/>
            </spanGrp>

(2) ...or add @ana to the existing translation spans. However, this would pose two issues:

  1. spans carrying only @ana would still have to be added for lexical items, such as the topic marker, that have no translation but do have a grammatical function

  2. I'd need to decide whether to duplicate the @ana on both the Spanish and the English spans, which is redundant and takes twice the time. Since a given item may have a translation in only one of the two languages, a search would otherwise have to check both

e.g.

            <spanGrp type="translation">
               <span type="sentence" target="#L104-01-03" xml:lang="en" ana="#DECL">It (animal) goes to where the tortillas are.</span>
               <span type="sentence" target="#L104-01-03" xml:lang="es" cert="medium" ana="#DECL">Va a dónde están las tortillas.</span>
               <span target="#d1e200 #d1e202" xml:lang="en" type="inflected" ana="#V #INTRANS #INCOMPL">it goes</span>
               <span target="#d1e200 #d1e202" xml:lang="en" type="inflected" ana="#V #INTRANS #INCOMPL">it is going</span>
               <span target="#d1e200 #d1e202" xml:lang="es" type="inflected" ana="#V #INTRANS #INCOMPL">va</span>
               <span target="#d1e202" ana="#ENCLTC #3PERS #SG #ANML"/>
               <span target="#d1e204" xml:lang="en" ana="#ADPOS">up to</span>
               <span target="#d1e204" xml:lang="es" ana="#ADPOS">hasta</span>
               <span target="#d1e206" xml:lang="en" ana="#ADPOS">where</span>
               <span target="#d1e206" xml:lang="es" ana="#ADPOS">dónde</span>
               <span target="#d1e208" xml:lang="en" type="inflected" ana="#V #INTRANS #INCOMPL">are sitting</span>
               <span target="#d1e208" xml:lang="es" type="inflected" ana="#V #INTRANS #INCOMPL">están</span>
               <span target="#d1e210" xml:lang="en" ana="#N">tortilla</span>
               <span target="#d1e210" xml:lang="es" ana="#N">tortilla</span>
            </spanGrp>
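
To make the second issue concrete, here is a rough sketch of what a lookup might have to do under option (2). It assumes a hypothetical helper `gram_for` and a cut-down sample without the TEI namespace, in which `#d1e210` has (hypothetically) only a Spanish translation. The lookup has to visit every span for the token, whatever its language, and drop the duplicated values.

```python
import xml.etree.ElementTree as ET

# Cut-down option (2) sample; the TEI namespace is omitted and the case where
# only a Spanish span exists for #d1e210 is a hypothetical illustration.
OPTION_2 = """
<spanGrp type="translation">
   <span target="#d1e202" ana="#ENCLTC #3PERS #SG #ANML"/>
   <span target="#d1e204" xml:lang="en" ana="#ADPOS">up to</span>
   <span target="#d1e204" xml:lang="es" ana="#ADPOS">hasta</span>
   <span target="#d1e210" xml:lang="es" ana="#N">tortilla</span>
</spanGrp>
"""


def gram_for(span_grp, token_id):
    """Collect the @ana values of spans targeting exactly token_id, without duplicates."""
    values = []
    for span in span_grp.findall("span"):
        # Multi-token spans are ignored in this sketch.
        if span.get("target", "").split() != [token_id]:
            continue
        for value in span.get("ana", "").split():
            if value not in values:  # the same @ana may sit on both "en" and "es"
                values.append(value)
    return values


grp = ET.fromstring(OPTION_2)
for token in ("#d1e202", "#d1e204", "#d1e210"):
    print(token, gram_for(grp, token))
```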

In either case, the deciding consideration would be:

Because I would still have to add spans for the grammatical items that do not currently have a span in the translation section, option (2) would not save me much time. Given also that it is better practice to keep different kinds of content separate, option (1) is the better choice: even though they share a <spanGrp>, the grammatical and translation contents stay in separate spans. Thus I prefer option (1).

And, given the logic of this, it would also make sense to include semantic annotations and interlinear glossed text annotations in the same <spanGrp> as well... I can add examples of each in the comments...

@laurent, What do you think?

iljackb commented 4 years ago

Reviewing the whole idea, I think it makes a lot more sense to put all the annotations in a single <spanGrp>, which could be given @type="annotations" and possibly @resp="#JB". This would be much easier to annotate and would keep everything in a single package!

So I am for sure going to do this!! Here is a prototypical example with the grammar annotations alongside the translations:

               <seg xml:id="L146-04-01" type="S" xml:lang="mix">
                  <w xml:id="d1e344">Sara</w>
                  <w xml:id="d1e346">kunkuaa</w> <!-- darken, the sun is setting (tense)? -->
                  <w xml:id="d1e348">ra</w>
                  <w xml:id="d1e350">saa</w>
                  <w xml:id="d1e352">luu</w>
                  <w xml:id="d1e354">ka</w>
                  <w xml:id="d1e356">ntava</w>
                  <w xml:id="d1e358">ti</w>
                  <w xml:id="d1e360">kua'an</w>
                  <w xml:id="d1e362">ti</w>
                  <pc>.</pc>
               </seg>
               <spanGrp type="annotations">
                  <span type="sentence" target="#L146-04-01" xml:lang="en">And then it was getting dark and the little bird flew and went away.</span>
                  <span type="sentence" target="#L146-04-01" xml:lang="es">Y entonces oscurecía y el pájaro pequeño voló y se fue.</span>
                  <span target="#d1e344" xml:lang="en">then</span>
                  <span target="#d1e344" xml:lang="es">y entonces</span>
                  <span target="#d1e344" ana="#ADV"/>
                  <span target="#d1e346" xml:lang="en" type="inflected">get dark</span>
                  <span target="#d1e346" xml:lang="es" cert="medium" type="inflected">oscureció</span>
                  <span target="#d1e346" ana="#V #INTRANS #COMPL"/>
                  <span target="#d1e348" ana="#CONJ"/>
                  <span target="#d1e350" xml:lang="en">bird</span>
                  <span target="#d1e350" xml:lang="es">pájaro</span>
                  <span target="#d1e350" ana="#N"/>
                  <span target="#d1e352" xml:lang="en">small</span>
                  <span target="#d1e352" xml:lang="es">pequeño</span>
                  <span target="#d1e352" ana="#ADJ"/>
                  <span target="#d1e356 #d1e358" xml:lang="en" type="inflected">it (animal) flew</span>
                  <span target="#d1e356 #d1e358" xml:lang="es" type="inflected">él (animal) voló</span>
                  <span target="#d1e360 #d1e362" xml:lang="en" type="inflected">it (animal) went away</span>
                  <span target="#d1e360 #d1e362" xml:lang="es" type="inflected">él (animal) se fue</span>
                  <span target="#d1e360" ana="#V #INTRANS #COMPL"/>
                  <span target="#d1e362" ana="#CLTC #ANML"/>
               </spanGrp>
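
With everything in one <spanGrp>, a retrieval script can still tell the two kinds of span apart by attribute alone: grammar spans carry @ana, translation spans carry @xml:lang and text. The following is a minimal sketch under those assumptions, using a hypothetical helper `index_annotations` and a short sample (taken from the earlier tortilla sentence) without the TEI namespace.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Short sample in the combined layout (TEI namespace omitted for brevity).
SAMPLE = """
<spanGrp type="annotations">
   <span target="#d1e200 #d1e202" xml:lang="en" type="inflected">it goes</span>
   <span target="#d1e200 #d1e202" xml:lang="es" type="inflected">va</span>
   <span target="#d1e200" ana="#V #INTRANS"/>
   <span target="#d1e202" ana="#ENCLTC #3PERS #SG #ANML"/>
   <span target="#d1e204" xml:lang="en">up to</span>
   <span target="#d1e204" xml:lang="es">hasta</span>
   <span target="#d1e204" ana="#ADPOS"/>
</spanGrp>
"""


def index_annotations(span_grp):
    """Map each targeted token id to its grammar tags and its translations by language."""
    index = defaultdict(lambda: {"gram": [], "translations": {}})
    for span in span_grp.findall("span"):
        lang = span.get(XML_LANG)
        for target in span.get("target", "").split():
            entry = index[target]
            if span.get("ana"):          # grammar span: only @ana, no text
                entry["gram"].extend(span.get("ana").split())
            elif lang and span.text:     # translation span: @xml:lang plus text
                entry["translations"].setdefault(lang, []).append(span.text)
    return dict(index)


index = index_annotations(ET.fromstring(SAMPLE))
for token_id, entry in index.items():
    print(token_id, entry)
```

A span that targets two tokens is recorded under both of them here; whether that is the desired behavior for inflected forms is a project decision, not something this sketch settles.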

The only things left to decide now are:

  1. Whether to give a label to the spans with the grammar annotations?

    Thus far I have not used any label on the spans annotating grammar, except when they annotate the sentence, in which case I use @type="sentence"

  2. Whether to combine the interlinear glossed texts with the grammar (e.g. in the element value)

    Thus far in my sample encodings, every span to which I apply a grammar annotation is also a span to which I apply an IGT annotation. So far I've been encoding them separately, e.g.

    
               <spanGrp type="annotations">
                  .....
                  <span target="#d1e200" ana="#V #INTRANS #INCOMPL"/>
                  <span type="igt" target="#d1e200">go</span>
    
                  <span target="#d1e202" ana="#ENCLTC #3PERS #SG #ANML"/>
                  <span type="igt" target="#d1e202">3sg.anml</span>
    
                  <span target="#d1e204" xml:lang="en">up to</span>
                  <span target="#d1e204" xml:lang="es">hasta</span>
                  <span target="#d1e204" ana="#ADPOS"/>
                  <span type="igt" target="#d1e204">up.to</span>
    
                  <span target="#d1e206" xml:lang="en">where</span>
                  <span target="#d1e206" xml:lang="es">dónde</span>
                  <span target="#d1e206" ana="#ADPOS"/>
                  <span type="igt" target="#d1e206">face</span>
    
                  <span target="#d1e208" xml:lang="en" type="inflected">are sitting</span>
                  <span target="#d1e208" xml:lang="es" type="inflected">están</span>
                  <span target="#d1e208" ana="#V #INTRANS #INCOMPL"/>
                  <span type="igt" target="#d1e208">sit</span>
    
                  <span target="#d1e210" xml:lang="en">tortilla</span>
                  <span target="#d1e210" xml:lang="es">tortilla</span>
                  <span target="#d1e210" ana="#N"/>
                  <span type="igt" target="#d1e210">tortilla</span>
               </spanGrp>

However, given that (at least at this point in the test runs) the correspondence between the two is one-to-one, it may be worth combining them. The questions would then be:

1. What to label the @type of the `<span>`? 

   a) type="gram" (_this is my preference, except for the question it raises about annotating semantic-syntactic interface features: labeling it "gram" puts it in the syntactic category..._)

...

   b) nothing

...

2. How to identify the value as IGT?

   Here are two options:

   a) Don't explicitly label it anywhere; just state in the project documentation, and assume in the data retrieval scripts, that the IGT value for a given form is the text content of the span that carries its grammatical annotations

3sg.anml


   b) Use `<gloss type="igt">` inside the value of `<span>`:

3sg.anml
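
As a rough illustration of how a retrieval script might read the IGT value under either option, here is a sketch. The two encodings of the enclitic `ti` (`#d1e202`) are hypothetical reconstructions based on the descriptions of (a) and (b), not settled markup, and `igt_value` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical encodings of the enclitic "ti" under each option
# (TEI namespace omitted; @type on <span> left out since it is undecided).
OPTION_A = '<span target="#d1e202" ana="#ENCLTC #3PERS #SG #ANML">3sg.anml</span>'
OPTION_B = (
    '<span target="#d1e202" ana="#ENCLTC #3PERS #SG #ANML">'
    '<gloss type="igt">3sg.anml</gloss></span>'
)


def igt_value(span):
    """Return the IGT gloss carried by a grammar span, or None if there is none."""
    gloss = span.find("gloss[@type='igt']")
    if gloss is not None:
        # Option (b): the gloss is explicitly wrapped in <gloss type="igt">.
        return (gloss.text or "").strip() or None
    if span.get("ana"):
        # Option (a): by convention, the text of a span bearing @ana is the gloss.
        return (span.text or "").strip() or None
    return None  # e.g. a translation span, which has no @ana


for label, xml in (("a", OPTION_A), ("b", OPTION_B)):
    print(label, igt_value(ET.fromstring(xml)))
```

Under (a) the script depends entirely on the documented convention, while under (b) the gloss identifies itself at the cost of one extra element per span.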



@laurent, do you have any thoughts?

iljackb commented 4 years ago

Closing the issue, see issue #93 for remaining question...