Closed by kosloot 5 years ago
Good idea. I'd suggest calling them aliases rather than labels, though, since "label" already means something in set definitions (the human-readable label).
I added an 'alias' mechanism to libfolia. It lives in the 'alias' branch for now, as it breaks the ABI.
Still to be implemented for pynlpl (proycon/pynlpl#33)
Well.... Given this document:
<?xml version="1.0" encoding="UTF-8"?>
<FoLiA xmlns="http://ilk.uvt.nl/folia" xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="doc" version="0.8" generator="libfolia-v0.4">
<metadata>
<annotations>
<division-annotation set="a-set" alias="a"/>
<division-annotation set="b-set" alias="b"/>
<token-annotation set="a-set" alias="b"/>
<token-annotation set="b-set" alias="a"/>
</annotations>
</metadata>
<text xml:id="text">
<div set="a-set">
<s id="s.1">
<w id="w.1" class="WORD" set="b">
<t>test</t>
</w>
</s>
</div>
<div set="b">
<s id="s.2">
<w id="w.2" class="WORD" set="b-set">
<t>test</t>
</w>
</s>
</div>
</text>
</FoLiA>
libfolia's folialint accepts it, but pynlpl's foliavalidator says:
Error on line 5: Invalid attribute alias for element division-annotation
Error on line 5: Element annotations has extra content: division-annotation
Error on line 3: Element metadata failed to validate content
Error on line 2: Element FoLiA failed to validate content
VALIDATION ERROR against RelaxNG schema (stage 1/2), in tests/aliases.xml
Invalid attribute alias for element division-annotation, line 5
Which of the two is right here?
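For reference, this is how the test document above appears meant to be read: the same alias points at a different set depending on the annotation type, so a lookup has to be keyed on both. The following is only a minimal sketch under that assumption. It is not how libfolia or pynlpl implement it, and the `DECLARATION_FOR` table and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = "{http://ilk.uvt.nl/folia}"

# Declarations copied from the test document above.
DECLARATIONS = """
<annotations xmlns="http://ilk.uvt.nl/folia">
  <division-annotation set="a-set" alias="a"/>
  <division-annotation set="b-set" alias="b"/>
  <token-annotation set="a-set" alias="b"/>
  <token-annotation set="b-set" alias="a"/>
</annotations>
"""

# Assumption: which declaration governs which element
# (only the two elements used in the test document).
DECLARATION_FOR = {"div": "division-annotation", "w": "token-annotation"}


def build_alias_table(xml_text):
    """Map (declaration name, alias) -> full set name."""
    table = {}
    for decl in ET.fromstring(xml_text):
        kind = decl.tag.replace(NS, "")
        alias = decl.get("alias")
        if alias is not None:
            table[(kind, alias)] = decl.get("set")
    return table


def resolve(table, element, value):
    """Return the full set name for a set attribute that may hold an alias."""
    return table.get((DECLARATION_FOR[element], value), value)


table = build_alias_table(DECLARATIONS)
print(resolve(table, "w", "b"))      # a-set
print(resolve(table, "div", "b"))    # b-set
print(resolve(table, "w", "b-set"))  # b-set (already a full set name)
```

Under this reading, w.1 (set="b") belongs to a-set while the second div (set="b") belongs to b-set.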
Right, that is addressed and solved in #65 (still to be released), so I think we can close this one.
At the moment, having more than one annotation set in scope leads to a lot of bloat. For example,
set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/frog-mbpos-cgn"/>
and especially
set="http://ilk.uvt.nl/folia/sets/frog-mbpos-clex"/>
are repeated a lot. Maybe it would be an idea to introduce short-hand labels, like
cgg-set
and
celex-set
to avoid all the bloat. Something like this:
Everywhere a set is used, you may use the label instead. When serializing, the label, if provided, is preferred. Labels must be unique, of course.
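The three rules above (a label may stand in for a set, the label is preferred on output, labels are unique) could be sketched roughly as follows. This is only an illustration of the proposal, not existing libfolia or pynlpl code: the `SetLabels` class and its method names are hypothetical, and which label goes with which set URL is assumed.

```python
class SetLabels:
    """Hypothetical registry of short-hand labels for annotation sets."""

    def __init__(self):
        self._set_by_label = {}
        self._label_by_set = {}

    def declare(self, set_name, label=None):
        """Register a set, optionally with a label; labels must be unique."""
        if label is None:
            return
        known = self._set_by_label.get(label)
        if known is not None and known != set_name:
            raise ValueError(f"label {label!r} is already used for {known!r}")
        self._set_by_label[label] = set_name
        self._label_by_set[set_name] = label

    def resolve(self, value):
        """Accept either a label or a full set name; return the full set name."""
        return self._set_by_label.get(value, value)

    def serialize(self, set_name):
        """Prefer the label when writing out, if one was provided."""
        return self._label_by_set.get(set_name, set_name)


CGN = "https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/frog-mbpos-cgn"
CLEX = "http://ilk.uvt.nl/folia/sets/frog-mbpos-clex"

labels = SetLabels()
labels.declare(CGN, "cgg-set")     # assumed pairing of label and set
labels.declare(CLEX, "celex-set")  # assumed pairing of label and set

print(labels.resolve("cgg-set") == CGN)  # True
print(labels.serialize(CLEX))            # celex-set
```

Declaring "cgg-set" a second time for a different set raises a ValueError, which is one way to enforce the uniqueness requirement.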