Open iljackb opened 5 years ago
Another erroneous byproduct of this is that the merged TEI dictionary contains 10 entries for "kañuu", when it should have only 2, as per the source document. However, unlike the false-match problem, the $target values for the two occurrences of "kañuu" ("d1e545" and "d1e555") do not occur inside any longer ID string, so I'm not sure why this is happening.
<item>
<graphic url="Aves-35.png"/>
<w xml:id="d1e545" xml:lang="mix">
<w xml:id="d1e546">kañuu</w>
</w>
<w xml:id="d1e548" xml:lang="es">
<w xml:id="d1e549">codorniz</w>
<w xml:id="d1e551">arlequín</w>
</w>
<linkGrp type="translation">
<link target="#1e545 #d1e548"/>
</linkGrp>
<spanGrp type="translation">
<span target="#1e545" xml:lang="en">Montezuma Quail</span>
</spanGrp>
<spanGrp type="semantics">
<span type="sense"
target="#1e545"
corresp="https://www.wikidata.org/wiki/Q1093509"/>
<span type="sense"
target="#1e545"
corresp="http://dbpedia.org/resource/Montezuma_quail"/>
.....
</spanGrp>
</item>
<item>
<graphic url="Aves-36.png"/>
<w xml:id="d1e555" xml:lang="mix">
<w xml:id="d1e556">kañuu</w>
</w>
<w xml:id="d1e558" xml:lang="es">
<w xml:id="d1e559">codorniz</w>
<w xml:id="d1e561">cotuí</w>
</w>
<linkGrp type="translation">
<link target="#d1e555 #d1e558"/>
</linkGrp>
<spanGrp type="translation">
<span target="#d1e555" xml:lang="en">Northern Bobwhite</span>
</spanGrp>
<spanGrp type="semantics">
<span type="sense"
target="#d1e555"
corresp="https://www.wikidata.org/wiki/Q142651"/>
<span type="sense"
target="#d1e555"
corresp="http://dbpedia.org/resource/Northern_bobwhite"/>
.....
</spanGrp>
</item>
What does work correctly is that in 2 of the 10 entries the distinction between the two different birds is maintained (though I would eventually merge them into a single entry).
<entry xml:id="Northern_bobwhite">
<form type="lemma">
<orth xml:lang="mix">kañuu</orth>
<pron xml:lang="mix" notation="ipa"/>
</form>
<gramGrp>
<pos>noun</pos>
</gramGrp>
<sense corresp="https://www.wikidata.org/wiki/Q142651 http://dbpedia.org/resource/Northern_bobwhite">
<usg type="domain" corresp="http://dbpedia.org/resource/Animal">Animal</usg>
<usg type="domain" corresp="http://dbpedia.org/resource/Bird">Bird</usg>
<xr type="hyponymOf">
<ref corresp="#bird" xml:lang="mix">saa</ref>
<ref type="sense" corresp="http://dbpedia.org/resource/Bird"/>
</xr>
<cit type="translation">
<form>
<orth xml:lang="en">Northern Bobwhite</orth>
</form>
</cit>
<cit type="translation">
<form>
<orth xml:lang="es">codorniz cotuí</orth>
</form>
</cit>
</sense>
</entry>
Another problem is a discord in the:
This is strange because the merge script should just merge the single files...
In this stylesheet, I take the's in the document (testing on /Aves.xml) and create TEI dictionary entries. The test document is bird names. The annotations being transferred to the dictionary are:
<spanGrp type="translation">;
<linkGrp type="translation">, pointing to a <w xml:lang="es">;
<spanGrp type="translation">.
Some birds have multiple names in Mixtec, so the <span> and <link> pointers may need to contain more than one pointer. Because of this, I have to use @contains in defining the key data categories. I define the key variable $target as follows:
<xsl:variable name="target" as="xs:string" select="concat('#',$wID)"/>
However, this leads to the problem of false matches. Many items end up with incorrectly merged entries because the @xml:id ("d1e145") of the target for a <w> (i.e. "litsi") is also matched when the script later sees IDs in the document that merely contain "d1e145" as part of a longer ID string. This produces incorrect TEI dictionary entries (for this one, the only correct bird name should be "American Kestrel").
So I understand the problem, but I can't find the right way to write the rule so that the script treats $target only as a complete string (I couldn't figure out how to specify that the end of $target must also be the end of the matched ID).
I of course know regex '$' is probably what I need but I don't know where to put it and how to combine it with what I have...
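One possible way to get whole-ID matching, sketched here under assumptions rather than taken from the actual stylesheet: split each @target on whitespace with tokenize() and compare the resulting tokens to $target by equality, instead of testing with contains(). The driver template, the `spans` variable and the un-namespaced element names below are hypothetical (real TEI files normally use the TEI namespace, which would need a prefix or xpath-default-namespace); only `$wID` and `$target` follow the definition quoted above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- Hypothetical driver: for each outer <w> in an <item>, list the
       English <span> translations whose @target points at exactly that ID. -->
  <xsl:template match="/">
    <xsl:for-each select="//item/w[@xml:id]">
      <xsl:variable name="wID" as="xs:string" select="@xml:id"/>
      <xsl:variable name="target" as="xs:string" select="concat('#',$wID)"/>

      <!-- tokenize() splits a multi-pointer @target into single pointers;
           the general comparison "=" is true only if one token equals
           $target in full, so a longer ID that merely starts with the
           same characters is not matched. -->
      <xsl:variable name="spans"
          select="//span[@xml:lang = 'en']
                        [tokenize(normalize-space(@target), '\s+') = $target]"/>

      <xsl:value-of select="concat($wID, ': ', string-join($spans, ', '), '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

If a regex with '$' is preferred, a test along the lines of `matches(@target, concat('(^|\s)', $target, '(\s|$)'))` should behave the same way, assuming the IDs contain no regex metacharacters. The tokenize() form avoids that caveat because it compares plain strings.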