Open kosloot opened 4 years ago
Agreed, type conversions should probably be checked and banned. Especially if it's also a category conversion (like inline annotation to structural as in your example)
Seems solved for libfolia: I added a check on type consistency.
@proycon your remark, "Especially if it's also a category conversion (like inline annotation to structural as in your example)", got me thinking.
The solution that I implemented in libfolia is probably too harsh. It disallows changing 2 sentences into 1 paragraph with 2 embedded sentences, like in the example below:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="folia.xsl"?>
<FoLiA xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://ilk.uvt.nl/folia" xml:id="Walter" generator="libfolia-v2.12" version="2.5.3">
<metadata type="native">
<annotations>
<token-annotation/>
<paragraph-annotation/>
<sentence-annotation/>
<text-annotation set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl"/>
<correction-annotation/>
</annotations>
</metadata>
<text xml:id="Walter.text">
<correction xml:id="Walter.correction.1">
<new>
<p xml:id="par">
<s xml:id="Walter.corr.s.1">
<t>Dit is een zin.</t>
</s>
<s xml:id="Walter.corr.s.2">
<t>Dit is nog een zin.</t>
</s>
</p>
</new>
<original auth="no">
<s xml:id="Walter.s.1">
<t>Dit is een zin.</t>
</s>
<s xml:id="Walter.s.2">
<t>Dit is nog een zin</t>
</s>
</original>
</correction>
</text>
</FoLiA>
Correcting structure should be possible. And maybe correcting the annotation type too? This will get rather complicated then.
BUT! Bug alert! The following file is invalid FoLiA (as it should be):
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="folia.xsl"?>
<FoLiA xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://ilk.uvt.nl/folia" xml:id="Walter" generator="libfolia-v2.12" version="2.5.3">
<metadata type="native">
<annotations>
<token-annotation/>
<paragraph-annotation/>
<sentence-annotation/>
<text-annotation set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl"/>
<correction-annotation/>
</annotations>
</metadata>
<text xml:id="Walter.text">
<row xml:id="par">
<cell>
<w>
<t>Dit is een zin.
</t>
</w>
</cell>
</row>
</text>
</FoLiA>
But we can create this abomination using a correction:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="folia.xsl"?>
<FoLiA xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://ilk.uvt.nl/folia" xml:id="Walter" generator="libfolia-v2.12" version="2.5.3">
<metadata type="native">
<annotations>
<token-annotation/>
<paragraph-annotation/>
<sentence-annotation/>
<text-annotation set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl"/>
<correction-annotation/>
</annotations>
</metadata>
<text xml:id="Walter.text">
<correction xml:id="Walter.correction.1">
<new>
<row xml:id="par">
<cell>
<w>
<t>Dit is een zin.
</t>
</w>
</cell>
</row>
</new>
<original auth="no">
<s xml:id="Walter.s.1">
<t>Dit is een zin.</t>
</s>
</original>
</correction>
</text>
</FoLiA>
This is horrible! I assume that the functions that check whether a tag is appendable should look INTO the correction. A lot of work and thinking is needed! @proycon please comment.
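To make the "look INTO the correction" idea concrete, here is a minimal sketch in Python using only the standard library. It is not how libfolia or any FoLiA tool does it. The `ACCEPTS` table is a hypothetical, heavily truncated stand-in for the real acceptance rules, and the function name `rejected_in_corrections` is made up for this illustration. The idea is to test each child of `<new>` and `<original>` against what the parent of the `<correction>` would accept.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "{http://ilk.uvt.nl/folia}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
BRANCHES = ("new", "original")

# HYPOTHETICAL and incomplete: which child tags a parent accepts.
# The real rules live in the FoLiA specification and its libraries.
ACCEPTS = {"text": {"p", "s", "correction"}}


def local(tag):
    """Strip the '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def rejected_in_corrections(xml_text):
    """List (correction id, branch, tag) for corrected elements that the
    parent of the correction would not accept as direct children."""
    root = ET.fromstring(xml_text)
    problems = []
    for parent in root.iter():
        allowed = ACCEPTS.get(local(parent.tag))
        if allowed is None:
            continue  # this sketch has no rule for that parent
        for corr in parent.findall(FOLIA_NS + "correction"):
            for branch in corr:
                if local(branch.tag) not in BRANCHES:
                    continue
                for child in branch:
                    if local(child.tag) not in allowed:
                        problems.append(
                            (corr.get(XML_ID), local(branch.tag), local(child.tag))
                        )
    return problems


# Reduced version of the row-inside-correction example from this thread.
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia">
  <text xml:id="Walter.text">
    <correction xml:id="Walter.correction.1">
      <new><row xml:id="par"><cell><w><t>Dit is een zin.</t></w></cell></row></new>
      <original auth="no"><s xml:id="Walter.s.1"><t>Dit is een zin.</t></s></original>
    </correction>
  </text>
</FoLiA>"""

print(rejected_in_corrections(SAMPLE))
```

Under the assumed table, this reports the `<row>` in `<new>` and leaves the `<s>` in `<original>` alone. A correction that wraps two sentences in a `<p>` would pass, since `<p>` is assumed to be acceptable under `<text>`.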
Additional questions, about WHICH corrections are acceptable.
Consider this very strange FoLiA file:
Both foliavalidator and folialint accept this, but I assume this is abusing the correction node. My impression is that we don't want a correction to modify the "type" of the subnode. So I suggest adding some limitation here, preferably that all arguments are of the same type, like all \<w> or all \<t>.
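The suggested limitation could be sketched roughly as follows, in Python with only the standard library. This is an illustration, not the check that was added to libfolia. The function name `mixed_type_corrections` is made up, and only the `<new>` and `<original>` branches are inspected; any other branches of a correction would need to be added to `BRANCHES`.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "{http://ilk.uvt.nl/folia}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# Branches of a correction that are compared in this sketch.
BRANCHES = ("new", "original")


def local(tag):
    """Strip the '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def mixed_type_corrections(xml_text):
    """List (correction id, sorted tags) for every correction whose
    branches do not all contain elements of one and the same type."""
    root = ET.fromstring(xml_text)
    problems = []
    for corr in root.iter(FOLIA_NS + "correction"):
        tags = set()
        for branch in corr:
            if local(branch.tag) in BRANCHES:
                tags.update(local(child.tag) for child in branch)
        if len(tags) > 1:
            problems.append((corr.get(XML_ID), sorted(tags)))
    return problems


# A correction that turns a sentence into a table row.
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia">
  <text xml:id="Walter.text">
    <correction xml:id="Walter.correction.1">
      <new><row xml:id="par"><cell><w><t>Dit is een zin.</t></w></cell></row></new>
      <original auth="no"><s xml:id="Walter.s.1"><t>Dit is een zin.</t></s></original>
    </correction>
  </text>
</FoLiA>"""

print(mixed_type_corrections(SAMPLE))
```

A rule this strict would also flag a correction that wraps two `<s>` elements in one `<p>`, which is the case described in this thread as probably too harsh.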