proycon / folia

FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library, as well as the full documentation, validation schemas, and set definitions.
http://proycon.github.io/folia/
GNU General Public License v3.0

correcting a correction. What is wrong here? #97

kosloot closed this issue 3 years ago

kosloot commented 3 years ago

I attempted (with Ucto) to correct a correction, which yielded the FoLiA document below. foliavalidator rejects it, and I wonder whether the document is wrong (and if so, why) or the validator is. It says:

< foliavalidator bug.xml
VALIDATION ERROR on full parse by library (stage 2/3), in bug.xml
ParseError: FoLiA exception in handling of <new> @ line 34 (in parent <correction> @ parent line 33) : [DeclarationError] Encountered an instance without proper declaration: New <new>!
<?xml version="1.0" encoding="UTF-8"?>
<FoLiA xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://ilk.uvt.nl/folia" xml:id="bug" generator="libfolia-v2.4" version="2.5">
  <metadata type="native">
    <annotations>
      <text-annotation set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl"/>
      <sentence-annotation/>
      <paragraph-annotation/>
      <correction-annotation set="folia-correct">
        <annotator processor="FoLiA-correct.1"/>
      </correction-annotation>
      <token-annotation alias="tokconfig-nld" set="https://raw.githubusercontent.com/LanguageMachines/uctodata/master/setdefinitions/tokconfig-nld.foliaset.ttl">
        <annotator processor="ucto.1"/>
      </token-annotation>
      <correction-annotation set="tokconfig-nld">
        <annotator processor="ucto.1"/>
      </correction-annotation>
    </annotations>
    <provenance>
      <processor xml:id="FoLiA-correct.1" begindatetime="2020-01-06T12:08:30" command="FoLiA-correct --punct=punct.punct --unk=unk.unk --rank=rank.ranked --clear --inputclass=Test --ngram=3 -v  -v " folia_version="2.2.1" host="bonus" name="FoLiA-correct" user="sloot" version="0.14">
        <processor xml:id="FoLiA-correct.1.generator" folia_version="2.2.1" name="libfolia" type="generator" version="2.4"/>
      </processor>
      <processor xml:id="ucto.1" begindatetime="2020-01-15T11:42:45" command="ucto -L nld --correctwords folia-correct-corrected.xml folia-corrected-3.xml" folia_version="2.2.1" host="bonus" name="ucto" user="sloot" version="0.21">
        <processor xml:id="ucto.1.generator" folia_version="2.2.1" name="libfolia" type="generator" version="2.4"/>
        <processor xml:id="uctodata.1" name="uctodata" type="datasource" version="0.8">
          <processor xml:id="uctodata.1.1" name="tokconfig-nld" type="datasource" version="0.2"/>
        </processor>
      </processor>
    </provenance>
  </metadata>
  <text xml:id="text">
    <p xml:id="p1">
      <s xml:id="s1">
        <correction xml:id="s1.correction.1" set="tokconfig-nld">
          <new>
            <w xml:id="w3.cor.tokenized.1" class="WORD" set="tokconfig-nld" space="no">
              <t>één</t>
            </w>
            <w xml:id="w3.cor.tokenized.2" class="PUNCTUATION" set="tokconfig-nld">
              <t>.</t>
            </w>
          </new>
          <original auth="no">
            <correction xml:id="cor.1" set="folia-correct">
              <new>
                <w xml:id="w3.cor">
                  <t>één.</t>
                </w>
              </new>
              <original auth="no">
                <w xml:id="w3">
                  <t>een.</t>
                </w>
              </original>
            </correction>
          </original>
        </correction>
      </s>
    </p>
  </text>
</FoLiA>
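The DeclarationError above claims that something in the body lacks a declaration. One rough way to rule out a genuinely missing declaration is to cross-check the `set` attributes used in the body against the `<annotations>` block. The sketch below uses only Python's standard `xml.etree.ElementTree`, not the FoLiA/PyNLPl library, and is not a validator. The names `check_declarations` and `DECLARATION_FOR` are hypothetical, only `<correction>` and `<w>` are covered, and the "no `set` attribute is fine only if exactly one declaration of that type exists" rule is a simplifying assumption about FoLiA's default-set behaviour.

```python
import xml.etree.ElementTree as ET

# FoLiA namespace, as used in the document above.
FOLIA_NS = "{http://ilk.uvt.nl/folia}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical mapping: body element -> declaration element in <annotations>.
# Only the two annotation types relevant to this issue are covered.
DECLARATION_FOR = {
    "correction": "correction-annotation",
    "w": "token-annotation",
}


def check_declarations(xml_text):
    """Return (xml:id, tag, set) for body elements whose set looks undeclared.

    Simplified assumptions (not the official FoLiA rules):
    - an explicit set matches if it equals the set or alias of a
      declaration of the right type;
    - an element without a set is accepted only if exactly one
      declaration of that type exists (the default-set case).
    """
    root = ET.fromstring(xml_text)

    # Collect, per declaration tag, the names (set and alias) each declaration accepts.
    declared = {}
    annotations = root.find(f"{FOLIA_NS}metadata/{FOLIA_NS}annotations")
    for decl in (annotations if annotations is not None else []):
        tag = decl.tag.replace(FOLIA_NS, "")
        names = {decl.get("set"), decl.get("alias")} - {None}
        declared.setdefault(tag, []).append(names)

    problems = []
    for tag, decl_tag in DECLARATION_FOR.items():
        decls = declared.get(decl_tag, [])
        for elem in root.iter(f"{FOLIA_NS}{tag}"):
            used = elem.get("set")
            if used is None:
                ok = len(decls) == 1
            else:
                ok = any(used in names for names in decls)
            if not ok:
                problems.append((elem.get(XML_ID), tag, used))
    return problems


if __name__ == "__main__":
    with open("bug.xml", encoding="utf-8") as f:
        print(check_declarations(f.read()))
```

Run against the document above, this sketch returns an empty list: both correction sets (`folia-correct` and `tokconfig-nld`) are declared, the `tokconfig-nld` alias covers the tokenized `<w>` elements, and the unset `<w>` elements fall back to the single token declaration. That fits with the problem lying in the validator rather than in the declarations.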
proycon commented 3 years ago

This seems to be a duplicate of proycon/foliatools#38. I tested the fix for that issue, and it solves this instance too.