iljackb / Mixtepec_Mixtec

Mostly XML (TEI) markup of Mixtepec-Mixtec Language resources

Add/use @lemma to <w> tokens in corpus #114

Open iljackb opened 2 years ago

iljackb commented 2 years ago

This would greatly enhance the content of the corpus; however, major decisions have to be made about which form to reference as the lemma. Tone is not represented in the adopted orthography, so many forms are homographs. Making every @lemma value unique would therefore probably require tone diacritics as minimal distinguishing markers.
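To make the homograph problem concrete, here is a minimal Python sketch of how candidate @lemma values could be checked for forms that differ only in their diacritics. The function names and the sample strings are hypothetical: the strings are invented placeholders, not real Mixtepec-Mixtec words. The sketch also assumes that every combining mark is a tone mark, which may not hold for the orthography actually chosen.

```python
import unicodedata
from collections import defaultdict


def strip_tone_marks(form: str) -> str:
    """Return the form with all combining marks removed.

    Assumption: every combining mark is a tone diacritic. If the
    orthography uses other diacritics, this filter must be narrowed.
    """
    decomposed = unicodedata.normalize("NFD", form)
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", bare)


def find_homograph_groups(lemmas):
    """Group lemmas that collapse to the same string once tone is removed."""
    groups = defaultdict(set)
    for lemma in lemmas:
        # Normalize to NFC so precomposed and decomposed spellings compare equal.
        groups[strip_tone_marks(lemma)].add(unicodedata.normalize("NFC", lemma))
    # Keep only bare forms shared by more than one tone-marked lemma.
    return {bare: sorted(forms) for bare, forms in groups.items() if len(forms) > 1}


# Invented placeholder forms, not real Mixtepec-Mixtec data.
candidate_lemmas = ["abá", "abà", "cd"]
collisions = find_homograph_groups(candidate_lemmas)
print(collisions)
```

Any group reported here is a set of lemmas that would be indistinguishable in the current orthography, which is where a tone-marked @lemma would be doing the disambiguating work.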

More study and planning needed.