iljackb / Mixtepec_Mixtec

Mostly XML (TEI) markup of Mixtepec-Mixtec Language resources

Add @type for other <span> types also (e.g. "inflected", "compound"?) #70

Open iljackb opened 5 years ago

iljackb commented 5 years ago

Given that issue #69 (which has been agreed upon) will add the annotation convention that the <span>s which translate a full sentence are given the label @type='sentence'...

               <seg xml:id="d1e105" n="2" xml:lang="mix" resp="#TS" type="S">
                  <w xml:id="d1e106">Sara</w>
                  <w xml:id="d1e108">nuu</w>    <!-- nuu kue - we got down/off [núu] or [núú] -->
                  <w xml:id="d1e111">kue</w>
                  <pc>,</pc>
......
                  <pc>.</pc>
               </seg>
               <spanGrp type="translation">
....

                  <span target="#d1e108 #d1e111" xml:lang="en" type="inflected">we got out</span>
                  <span target="#d1e108 #d1e111" xml:lang="es" type="inflected">salimos</span>
....
               </spanGrp>

Would it also be beneficial to create further type distinctions for such content, e.g. "inflected", "multiword-expression", "compound", "phrase" (the last for cases where I translated a sequence of items for semantic reasons)?

For inflected forms, and multi-word-expressions especially, this would be beneficial since I have not yet annotated for grammar. It would be unnecessary if I had grammar annotations...

The major downside would be the significant extra work needed to go over all annotated documents and add this.

iljackb commented 5 years ago

This would allow separate XSL templates that output inflected forms into TEI dictionary output matching my established practice for collecting paradigms. The translations can go into <gloss> within the paradigm rather than in the translation (which is only desirable for lemmata), e.g.:

               <form type="inflected">
                  <orth xml:lang="mix">kuácha</orth>
                  <pron xml:lang="mix" notation="ipa">kwátʃáà</pron>
                  <gramGrp>
                     <per>1</per>
                     <number>sg</number>
                  </gramGrp>
                  <gloss xml:lang="en">I am happy</gloss>
                  <gloss xml:lang="es">estoy feliz</gloss>
               </form>
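
A minimal sketch of what such a template could look like, assuming XSLT 2.0 and that the annotated files are in the TEI namespace. Grouping the spans by @target and resolving the pointers to <w> tokens are assumptions modelled on the examples in this thread, not an existing stylesheet; <pron> and <gramGrp> are left out because nothing in the span annotation supplies them.

```xml
<xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:tei="http://www.tei-c.org/ns/1.0"
   xmlns="http://www.tei-c.org/ns/1.0"
   exclude-result-prefixes="tei">

   <!-- Assumption: spans sharing the same @target are translations of one form -->
   <xsl:template match="tei:spanGrp[@type='translation']">
      <xsl:variable name="doc" select="/"/>
      <xsl:for-each-group select="tei:span[@type='inflected']" group-by="@target">
         <!-- Turn "#d1e108 #d1e111" into the bare ids d1e108, d1e111 -->
         <xsl:variable name="ids"
            select="for $t in tokenize(normalize-space(current-grouping-key()), ' ')
                    return substring-after($t, '#')"/>
         <form type="inflected">
            <!-- Join the referenced <w> tokens in pointer order -->
            <orth xml:lang="mix">
               <xsl:value-of select="for $i in $ids return $doc//tei:w[@xml:id = $i]"
                  separator=" "/>
            </orth>
            <!-- One <gloss> per translation language in the group -->
            <xsl:for-each select="current-group()">
               <gloss xml:lang="{@xml:lang}"><xsl:value-of select="."/></gloss>
            </xsl:for-each>
         </form>
      </xsl:for-each-group>
   </xsl:template>

   <!-- Suppress all other text so only the <form> elements are output -->
   <xsl:template match="text()"/>
</xsl:stylesheet>
```

Applied to the "nuu kue" example above, this should yield one <form> with <orth>nuu kue</orth> and an English and a Spanish <gloss>.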

iljackb commented 5 years ago

I just thought of a potential complication: cases where the form is inflected but doesn't have two pointers. Some forms (such as the example given above, "kuácha") are inflected but are marked by a vowel change and don't have an enclitic.

For these, it would be linguistically inaccurate not to mark them as inflections, and leaving them unmarked would undermine automatic extraction and insertion into a paradigm dictionary document.

So given this, I would first need a condition that checks whether the inflected form has more than one token.
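
A rough sketch of that check, assuming XSLT 2.0 and that the number of tokens can be read off the number of pointers in @target. The mode names "multi-token" and "single-token" are hypothetical placeholders for whatever the real output templates would be.

```xml
<xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:tei="http://www.tei-c.org/ns/1.0">

   <xsl:template match="tei:span[@type='inflected']">
      <!-- Count the whitespace-separated pointers in @target -->
      <xsl:variable name="tokenCount"
         select="count(tokenize(normalize-space(@target), ' '))"/>
      <xsl:choose>
         <xsl:when test="$tokenCount gt 1">
            <!-- e.g. nuu + kue: stem followed by an enclitic -->
            <xsl:apply-templates select="." mode="multi-token"/>
         </xsl:when>
         <xsl:otherwise>
            <!-- e.g. kuácha: inflection marked on the single token itself -->
            <xsl:apply-templates select="." mode="single-token"/>
         </xsl:otherwise>
      </xsl:choose>
   </xsl:template>

   <!-- Hypothetical stubs; the real templates would build the <form> output -->
   <xsl:template match="tei:span" mode="multi-token">
      <xsl:message>multi-token inflected form: <xsl:value-of select="@target"/></xsl:message>
   </xsl:template>
   <xsl:template match="tei:span" mode="single-token">
      <xsl:message>single-token inflected form: <xsl:value-of select="@target"/></xsl:message>
   </xsl:template>

   <!-- Suppress all other text -->
   <xsl:template match="text()"/>
</xsl:stylesheet>
```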

iljackb commented 5 years ago

Add all the allowed values to the ODD so that they pop up as suggested values when annotating
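
A possible shape for this in the ODD, as a sketch only: it assumes the customization already has a <schemaSpec> that includes the analysis module, and the <desc> wording is mine. The value names are the ones mentioned in this thread; which element each value finally belongs on is not settled here.

```xml
<elementSpec ident="span" module="analysis" mode="change"
   xmlns="http://www.tei-c.org/ns/1.0">
   <attList>
      <attDef ident="type" mode="change">
         <!-- type="semi": values are suggested but others are still allowed;
              type="closed" would reject anything not listed -->
         <valList type="semi" mode="replace">
            <valItem ident="sentence">
               <desc>translation of a full sentence (see #69)</desc>
            </valItem>
            <valItem ident="inflected">
               <desc>translation of an inflected form</desc>
            </valItem>
            <valItem ident="compound">
               <desc>translation of a compound</desc>
            </valItem>
            <valItem ident="phrase">
               <desc>translation of a sequence of items grouped for semantic reasons</desc>
            </valItem>
            <valItem ident="MWE">
               <desc>translation of a multi-word expression</desc>
            </valItem>
         </valList>
      </attDef>
   </attList>
</elementSpec>
```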

iljackb commented 5 years ago

Move @type="phrase | compound | MWE" to ; keep @type="inflected" on

iljackb commented 5 years ago

What is the difference between a phrase and a multi-word expression!? e.g. 'to hit the road'

Mixtec: ki'in ichi, lit. "grab the road".

This is a phrase (it appears inflected in the vocab), and it is also a multi-word expression...

Mixtec: kuun ta savi
Gloss: fall deg rain
Translation: "a storm / to storm"