iljackb / Mixtepec_Mixtec

Mostly XML (TEI) markup of Mixtepec-Mixtec Language resources

Removing <c>'s from early Praat transcription output? #89

Closed: iljackb closed this issue 4 years ago

iljackb commented 4 years ago

In the early Praat transcriptions, the annotations contained a separate tier for each of vowels, semi-vowels/glides/nasals, consonants, and tones (as well as the orthographic form and a gloss). In the TEI phonetic transcriptions, these segments were output as <c> elements, and the tones as <c function="tone">↗</c>:

            <u xml:id="d1e36" n="1" start="0" end="0.69" xml:lang="mix">
               <seg xml:lang="mix" xml:id="d1e37" function="utterance" notation="orth">
                  <w xml:id="d1e38" synch="#T1">kuilu</w>
               </seg>
               <seg xml:lang="mix"  xml:id="d1e40" function="utterance" notation="ipa">
                  <w xml:id="d1e41" synch="#T1">
                     <c>k</c>
                     <c>w</c>
                     <c>ɪ</c>
                     <c function="tone">↗</c>
                     <c>l</c>
                     <c>i</c>
                     <c function="tone">˧</c>
                  </w>
               </seg>
            </u>

The newer transcriptions, in which I transcribe only the full phonetic word, are output as follows:

            <u n="1" xml:id="d23e0" start="2.04" end="3.77">
               <seg xml:lang="mix" notation="orth" xml:id="T-seg-orth-2.04">
                  <w synch="#T2.56" xml:id="T-orth2.56">naá</w>
               </seg>
               <seg xml:lang="mix" notation="ipa" xml:id="T-seg-pron-2.04" sameAs="#T-orth2.56">
                  <w synch="#T2.56" xml:id="T-pron2.56" sameAs="#T-orth2.56">na˩a↗</w>
               </seg>
            </u>

The <c>'s make the phonetic content nearly unsearchable, and the inconsistent encodings make for a less harmonious and less usable data set.
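A minimal sketch of the proposed cleanup, assuming Python's standard-library `xml.etree.ElementTree` and assuming that, in the early files, each <w> in the IPA <seg> contains only <c> children. The function `flatten_c` is hypothetical and not part of the project's tooling. It concatenates the characters so that the old encoding ends up in the same shape as the newer one (e.g. `na˩a↗`). Note that it discards the `function="tone"` attribute, so tones would need separate handling if they are to be tagged as discussed in #88.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix so matching works with or without the TEI namespace."""
    return tag.rsplit("}", 1)[-1]


def flatten_c(root):
    """Collapse the <c> children of every <w> into one text string.

    Assumes (hypothetically) that <w> holds only <c> children, as in the early
    Praat-derived files; tone marks are kept as plain characters, but their
    function="tone" attribute is dropped.
    """
    # Materialise the element list first: modifying the tree while iterating is undefined.
    for w in list(root.iter()):
        if local_name(w.tag) != "w":
            continue
        chars = [c for c in w if local_name(c.tag) == "c"]
        if not chars:
            continue
        text = "".join((c.text or "").strip() for c in chars)
        for c in chars:
            w.remove(c)
        w.text = text
    return root


sample = """<seg xml:lang="mix" notation="ipa"><w xml:id="d1e41" synch="#T1">
  <c>k</c><c>w</c><c>ɪ</c><c function="tone">↗</c>
  <c>l</c><c>i</c><c function="tone">˧</c>
</w></seg>"""

seg = flatten_c(ET.fromstring(sample))
print(seg.find("w").text)  # kwɪ↗li˧
```

After flattening, a plain substring search such as `"kwɪ" in w.text` works, which is not possible while each character sits in its own element.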

Apart from marking and annotating tones where they mark a specific linguistic feature (issue #88 discusses the tagging of the phonetic components and proposes changing <c> to <m> for tones), there is no functional benefit at all to keeping the <c>'s unless I add timestamps to them directly.

If I want to study these contents systematically, that is best done in Praat, without the TEI.

So, @Laurent, do you approve of this change?

laurentromary commented 4 years ago

Yes, sure!