proycon / foliatools

A number of command-line tools for working with FoLiA (Format for Linguistic Annotation). Includes validators, converters, visualisers, and more.

foliavalidator handles feature nodes incorrectly when using EXPLICIT mode #15

Closed kosloot closed 3 years ago

kosloot commented 3 years ago

Given the file provenance.2.0.0.folia.xml from folia-repo/examples:

When using foliavalidator without EXPLICIT mode, the output file contains this fragment:

        <w xml:id="untitled.p.1.s.1.w.1" class="WORD">
          <t>De</t>
          <pos class="LID(bep,stan,rest)" processor="p1.1" confidence="0.999701" head="LID">
            <feat subset="lwtype" class="bep"/>
            <feat subset="naamval" class="stan"/>
            <feat subset="npagr" class="rest"/>
          </pos>
          <lemma class="de"/>
        </w>

When using `-x` as well:

        <w xml:id="untitled.p.1.s.1.w.1" typegroup="structure" set="tokconfig-nl" class="WORD" processor="p0" textclass="current">
          <t typegroup="content" set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl" class="current">De</t>
          <pos typegroup="inline" set="http://ilk.uvt.nl/folia/sets/frog-mbpos-cgn" class="LID(bep,stan,rest)" processor="p1.1" confidence="0.999701" textclass="current">
            <feat subset="head" class="LID"/>
            <feat subset="lwtype" class="bep"/>
            <feat subset="naamval" class="stan"/>
            <feat subset="npagr" class="rest"/>
          </pos>
          <lemma typegroup="inline" set="http://ilk.uvt.nl/folia/sets/frog-mblem-nl" class="de" processor="p1.2" textclass="current"/>
        </w>

There are 2 problems here:

  1. The typegroup attribute is missing from the <feat> nodes.
  2. The 'head' attribute is not inlined as an attribute, but kept as a <feat>.

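Problem 1 can be checked mechanically by listing the elements that carry no typegroup attribute. The following is a minimal sketch using only Python's standard library; it is not foliavalidator or FoLiApy code. The helper name `missing_typegroup` is hypothetical, and the fragment is a trimmed copy of the `-x` output above with most attributes dropped and no namespace declaration (a real FoLiA file declares the FoLiA namespace, so tag names would need to be qualified there).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the -x output quoted in this issue (assumption: attributes
# irrelevant to the check are omitted, and no namespace is declared).
FRAGMENT = """
<w xml:id="untitled.p.1.s.1.w.1" typegroup="structure" class="WORD">
  <t typegroup="content" class="current">De</t>
  <pos typegroup="inline" class="LID(bep,stan,rest)">
    <feat subset="head" class="LID"/>
    <feat subset="lwtype" class="bep"/>
  </pos>
  <lemma typegroup="inline" class="de"/>
</w>
"""


def missing_typegroup(xml_text):
    """Return the tag names of all elements that lack a typegroup attribute."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the root and all descendants in document order.
    return [el.tag for el in root.iter() if "typegroup" not in el.attrib]


print(missing_typegroup(FRAGMENT))
```

On this fragment only the two feat elements are reported, which matches the observation in problem 1.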
proycon commented 3 years ago

> The 'head' attribute is not inlined as an attribute, but kept as a <feat>

That is deliberate. In explicit mode, these attribute 'shortcuts' for features are not used; they are made explicit as features.
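To illustrate what "made explicit as features" means for the fragment in this issue, here is a minimal sketch of the transformation using only Python's standard library. It is an assumption about the effect, not the actual FoLiApy implementation; the function name `expand_head_shortcut` is hypothetical.

```python
import xml.etree.ElementTree as ET


def expand_head_shortcut(pos):
    """Move a 'head' attribute on <pos> into a leading <feat subset="head"> child."""
    head = pos.attrib.pop("head", None)
    if head is not None:
        # The shortcut attribute becomes an ordinary feature element.
        feat = ET.Element("feat", {"subset": "head", "class": head})
        pos.insert(0, feat)
    return pos


# Non-explicit form, as in the first fragment of this issue (trimmed).
pos = ET.fromstring(
    '<pos class="LID(bep,stan,rest)" head="LID">'
    '<feat subset="lwtype" class="bep"/>'
    "</pos>"
)
expand_head_shortcut(pos)
print(ET.tostring(pos, encoding="unicode"))
```

After the call, the pos element no longer has a head attribute, and its first child is the feat with subset "head" and class "LID", as in the `-x` output.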

> The typegroup attribute is missing from the <feat> nodes

I guess we consider it higher-order annotation, and it could get an appropriate typegroup attribute, yes.

kosloot commented 3 years ago

> That is deliberate. In explicit mode, these attribute 'shortcuts' for features are not used, they are made explicit as features.

Aha, a bit surprising, but OK.

> I guess we consider it higher-order annotation and it could get an appropriate typegroup attribute yes.

The code in FoLiApy seems to suggest typegroup="feature" there, but it somehow fails to emit it for features.
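For reference, the expected result can be sketched as a post-processing step with Python's standard library. This only illustrates the desired output; it is not a fix in FoLiApy, and the function name `add_feature_typegroup` is hypothetical. The value "feature" is the one the FoLiApy code seems to suggest, as noted above.

```python
import xml.etree.ElementTree as ET


def add_feature_typegroup(root, value="feature"):
    """Set typegroup on every <feat> that lacks one; return how many were changed."""
    changed = 0
    for feat in root.iter("feat"):
        if "typegroup" not in feat.attrib:
            feat.set("typegroup", value)
            changed += 1
    return changed


# Trimmed <pos> element from the -x output in this issue.
pos = ET.fromstring(
    '<pos typegroup="inline" class="LID(bep,stan,rest)">'
    '<feat subset="head" class="LID"/>'
    '<feat subset="lwtype" class="bep"/>'
    "</pos>"
)
n = add_feature_typegroup(pos)
print(n, ET.tostring(pos, encoding="unicode"))
```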