Closed EleFri closed 1 year ago
You are right that this shouldn't be allowed. However, there is no way to enforce it through the XML schema. The w-element is defined as mixed content (so that words can contain time anchors at any position):
<xs:element name="w">
  <xs:complexType mixed="true">
    <xs:sequence>
      <xs:element minOccurs="0" maxOccurs="unbounded" ref="tei:anchor"/>
    </xs:sequence>
    <xs:attribute ref="xml:id" use="required"/>
    <xs:attribute name="type" type="xs:string"/>
    <xs:attribute name="norm" type="xs:string"/>
    <xs:attribute name="lemma" type="xs:string"/>
    <xs:attribute name="pos" type="xs:string"/>
    <xs:attribute name="phon" type="xs:string"/>
  </xs:complexType>
</xs:element>
Unfortunately, the text part of mixed content cannot be constrained (https://docstore.mik.ua/orelly/xml/schema/ch07_05.htm), so there is no way to tell the schema that <w> must not be empty. The same is true for the FLN schema, so this cannot be controlled in the source FLNs either. It would have to be checked in another way (e.g. through RelaxNG or through a script).
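To illustrate what such a script-based check could look like, here is a minimal sketch in Python. It is not the script tracked in the DGD code repository. The function name `find_empty_w` and the sample data are made up for illustration, and the sketch assumes that a <w> counts as empty when it contains no non-whitespace text, even if it contains anchors. Elements are matched by local name, since the namespace of the real transcripts is not shown in this thread.

```python
import xml.etree.ElementTree as ET

# Attribute name of xml:id as ElementTree reports it.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def find_empty_w(xml_text):
    """Return the xml:id of every <w> element without non-whitespace text.

    Assumption: a <w> that holds only <anchor> children is also 'empty'.
    """
    root = ET.fromstring(xml_text)
    empty = []
    for el in root.iter():
        if local_name(el.tag) != "w":
            continue
        # itertext() yields the element's own text plus the text and tails
        # of its descendants, i.e. everything written inside <w>...</w>.
        if not "".join(el.itertext()).strip():
            empty.append(el.get(XML_ID, "(no xml:id)"))
    return empty


# Hypothetical sample, not taken from a real transcript.
sample = """<u xmlns="http://www.tei-c.org/ns/1.0">
  <w xml:id="w1">hallo</w>
  <w xml:id="w2"/>
  <w xml:id="w3"><anchor synch="T1"/></w>
  <w xml:id="w4">gu<anchor synch="T2"/>ten</w>
</u>"""

print(find_empty_w(sample))
```

In this sample, `w2` and `w3` are reported, while `w4` passes because the text around its anchor is non-empty. If a <w> with only an anchor should be allowed, the condition would need to be adjusted accordingly.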
OK, then we should do this using a Java script. A corresponding issue was created in the DGD code repository:
https://gitlab.ids-mannheim.de/pb-muendliche-korpora/dgd-code/-/issues/54
An empty w-element occurs in the transcript FOLK_E_00442_SE_01_T_01. That shouldn't be allowed. @Thomas: is that correct?