zumult-org / zumultapi


Customize ISO/TEI scheme for ZuMult #118

Closed by EleFri 1 year ago

EleFri commented 1 year ago

An empty w-element occurs in the transcript FOLK_E_00442_SE_01_T_01. That shouldn't be allowed. @Thomas: is that correct?

berndmoos commented 1 year ago

It is right that it shouldn't be allowed. However, there is no way of controlling it through the XML schema. The w-element is defined as mixed content (so words can have time anchors at any place):

  <xs:element name="w">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element minOccurs="0" maxOccurs="unbounded" ref="tei:anchor"/>
      </xs:sequence>
      <xs:attribute ref="xml:id" use="required"/>
      <xs:attribute name="type" type="xs:string"/>
      <xs:attribute name="norm" type="xs:string"/>
      <xs:attribute name="lemma" type="xs:string"/>
      <xs:attribute name="pos" type="xs:string"/>
      <xs:attribute name="phon" type="xs:string"/>
    </xs:complexType>
  </xs:element>

Unfortunately, the text part of mixed content cannot be constrained (https://docstore.mik.ua/orelly/xml/schema/ch07_05.htm), so there is no way to tell the schema that <w> must not be empty. The same is true for the FLN schema, so this cannot be controlled in the source FLNs either. It would have to be checked in another way (e.g. with RelaxNG or with a script).
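As a rough illustration of the script-based check, here is a minimal Python sketch, not the check that was actually implemented. It assumes the transcripts use the standard TEI namespace `http://www.tei-c.org/ns/1.0` and treats a <w> as empty when it contains no non-whitespace text, even if it contains anchors. The function name `find_empty_words` and the sample document are made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: the ISO/TEI transcripts use the standard TEI namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"
# The xml:id attribute lives in the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def find_empty_words(xml_text):
    """Return the xml:id of every <w> that has no non-whitespace text.

    Hypothetical helper: text may appear before, between, or after
    <anchor> children, so all text inside <w> is collected first.
    """
    root = ET.fromstring(xml_text)
    empty_ids = []
    for w in root.iter(f"{{{TEI_NS}}}w"):
        text = "".join(w.itertext())
        if not text.strip():
            empty_ids.append(w.get(XML_ID))
    return empty_ids


# Made-up sample, not taken from FOLK_E_00442_SE_01_T_01.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <w xml:id="w1">ja</w>
  <w xml:id="w2"/>
  <w xml:id="w3"><anchor synch="T1"/></w>
  <w xml:id="w4">ge<anchor synch="T2"/>nau</w>
</TEI>"""

print(find_empty_words(SAMPLE))  # ['w2', 'w3']
```

A <w> that holds only an anchor (`w3`) is reported as well, since the schema permits it but it still carries no word text.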

EleFri commented 1 year ago

OK, then we should do this using a Java script. A corresponding issue was created in the DGD code repository:

https://gitlab.ids-mannheim.de/pb-muendliche-korpora/dgd-code/-/issues/54