iljackb / Mixtepec_Mixtec

Mostly XML (TEI) markup of Mixtepec-Mixtec Language resources

corpus2TeiDict-link-span-test-full-docs.xsl (extract vocab from corpus to TEI dictionary) development #75

Open iljackb opened 5 years ago

iljackb commented 5 years ago

Currently working.

Remaining enhancements to be made include:

iljackb commented 5 years ago

Current error: "phrase" and "inflected" forms are being merged into the same entry:

<entry>
            <form type="inflected">
               <orth xml:lang="mix">tata</orth>
               <gramGrp>
                  <pos/>
                  <per/>
                  <number/>
               </gramGrp>
               <gloss xml:lang="en">seed</gloss>
               <gloss xml:lang="en">my seeds</gloss>
               <gloss xml:lang="en">let's insert my seeds</gloss>
               <gloss xml:lang="es">semilla</gloss>
               <gloss xml:lang="es">mis semillas</gloss>
               <gloss xml:lang="es">insertamos mis semillas</gloss>
            </form>
            <form type="phrase">
               <orth xml:lang="mix">tata</orth>
            </form>
iljackb commented 5 years ago

Also, why is this:

                  <w xml:id="d1e609">tata</w>
                  <w xml:id="d1e611">yu</w>
                  <span target="#d1e609 #d1e611" xml:lang="en" type="inflected">my seeds</span>
                  <span target="#d1e609 #d1e611" xml:lang="es" type="inflected">mis semillas</span>

producing:

 <entry>
            <form type="phrase">
               <orth xml:lang="mix">tata yu</orth>
            </form>

it should be:

 <entry>
            <form type="inflected">
               <orth xml:lang="mix">tata yu</orth>
            </form>
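
One way to rule out a type mix-up is to copy the form type directly from the span's `@type` instead of deriving it from the number of targeted tokens. The following is only a minimal XSLT 2.0 sketch, not the actual stylesheet: it assumes the corpus is in the TEI namespace, that every annotation is a `span` with `@target` and `@type`, and that one entry per distinct target/type pair is wanted. The grouping key and the `body` wrapper are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.tei-c.org/ns/1.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">

    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/">
        <!-- Hypothetical wrapper element for the generated entries -->
        <body>
            <!-- Assumption: spans sharing the same @target and @type
                 (e.g. the "en" and "es" glosses) describe one item -->
            <xsl:for-each-group select="//span[@target][@type]"
                group-by="concat(@target, '|', @type)">
                <!-- Resolve "#d1e609 #d1e611" to the ids d1e609, d1e611 -->
                <xsl:variable name="ids"
                    select="for $t in tokenize(normalize-space(@target), ' ')
                            return substring-after($t, '#')"/>
                <entry>
                    <!-- The type is taken from the span, never inferred -->
                    <form type="{@type}">
                        <orth xml:lang="mix">
                            <!-- Join the targeted <w> tokens in target order -->
                            <xsl:value-of
                                select="for $i in $ids return //w[@xml:id = $i]"
                                separator=" "/>
                        </orth>
                        <!-- One gloss per span in the group -->
                        <xsl:for-each select="current-group()">
                            <gloss xml:lang="{@xml:lang}">
                                <xsl:value-of select="normalize-space(.)"/>
                            </gloss>
                        </xsl:for-each>
                    </form>
                </entry>
            </xsl:for-each-group>
        </body>
    </xsl:template>
</xsl:stylesheet>
```

With the two spans shown above as input, this sketch should yield a single entry whose form has type "inflected" and orth "tata yu", because the loop runs over spans rather than over individual `w` elements.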
iljackb commented 5 years ago

Currently, where an annotated item has more than one component (n components), the stylesheet creates n entries for the same item (e.g. two entries for "nchi'chi nchu'a").

I think this is due to the entry point of:

                    `<xsl:for-each select="$readDoc/descendant::w[. = current()]">`

One option to deal with this is to have a supplementary stylesheet associated with the scenario that removes duplicate entries.
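
A supplementary clean-up pass of that kind could look roughly like the sketch below. It is an illustration under assumptions, not a tested part of this repository: it assumes XSLT 2.0, that the generated dictionary is in the TEI namespace, and that duplicates are sibling `entry` elements that are identical in every respect (same attributes, same whitespace, same children).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">

    <!-- Identity transform: copy everything unchanged by default -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Drop an entry when an earlier sibling entry is deep-equal to it,
         so only the first copy of each duplicate survives -->
    <xsl:template
        match="entry[some $e in preceding-sibling::entry
                     satisfies deep-equal(., $e)]"/>
</xsl:stylesheet>
```

Because `deep-equal()` compares attributes and text nodes as well, entries that differ only in a generated `@xml:id` or in indentation would not be treated as duplicates; in that case a narrower comparison (for example on `form/@type` plus the normalized `orth` text) would be needed. The comparison is also quadratic in the number of sibling entries, which is likely acceptable for a dictionary of modest size.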

laurentromary commented 5 years ago

Have you checked all annotated information (e.g. that @xml:lang is tested and not present)?