proycon / foliapy

An extensive Python library for dealing with FoLiA (Format for Linguistic Annotation) documents, a rich XML-based format for linguistic annotation finding application in Natural Language Processing (NLP). This library was formerly part of PyNLPl.
https://proycon.github.io/folia
GNU General Public License v3.0

foliavalidator somehow loses formatting #22

Open kosloot opened 4 years ago

kosloot commented 4 years ago

When running foliavalidator on examples/full-legacy.1.5.folia.xml, something strange happens:

foliavalidator ../FoLiApy/folia-repo/examples/full-legacy.1.5.folia.xml --keepversion -o

The <metadata> block is output with normal indentation, but everything from <text> onward ends up on a single line:

    <submetadata xml:id="sandbox.3.metadata" type="native">
      <meta id="author">proycon</meta>
    </submetadata>
  </metadata>
  <text xml:id="WR-P-E-J-0000000001.text"><lang class="nl"/><div xml:id="WR-P-E-J-0000000001.div0.1" class="chapter" metadata="wikipedia.stemma"><head xml:id="WR-P-E-J-0000000001.head.1"><s xml:id="WR-P-E-J-0000000001.head.1.s.1"><w xml:id="WR-P-E-J-0000000001.head.1.s.1.w.1"><t>Stemma</t><pos set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/frog-mbpos-cgn" class="N(soort,ev,basis,onz,stan)"/><lemma class="stemma"/></w><ref id="footnote.1" type="note" xml:id="WR-P-E-J-0000000001.ref.1"/></s></head><p xml:id="WR-P-E-J-0000000001.p.1" class="firstparagraph"><alignment format="image/pdf" class="book" xlink:href="http://archief.nl/artikel.pdf" xlink:type="simple" xlink:role="verwijzing"/><metric class="sentenceCount" value="8"/><s xml:id="WR-P-E-J-0000000001.p.1.s.1">

So, no formatting anymore?

proycon commented 4 years ago

Strange indeed; I'll have to look into it. As long as the resulting XML is otherwise identical and still valid, there is no big rush.
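
One way to check that the output differs only in layout is to compare the two files while ignoring whitespace-only text between elements. The following is a minimal sketch using only the Python standard library, not FoLiApy's own API; the helper names `_norm` and `same_ignoring_layout` and the tiny sample documents are made up for illustration. It also strips leading and trailing whitespace from real text content and ignores XML comments, which is a simplification that may not suit every FoLiA document.

```python
import xml.etree.ElementTree as ET


def _norm(text):
    """Treat missing and whitespace-only text as equivalent (hypothetical helper)."""
    return (text or "").strip()


def same_ignoring_layout(a, b):
    """Return True if two elements match in tag, attributes, stripped text and children."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if _norm(a.text) != _norm(b.text) or _norm(a.tail) != _norm(b.tail):
        return False
    if len(a) != len(b):
        return False
    # Recurse pairwise over the children, in document order.
    return all(same_ignoring_layout(x, y) for x, y in zip(a, b))


# Illustrative stand-ins for an indented input and an unindented output.
pretty = """<text>
  <w>
    <t>Stemma</t>
  </w>
</text>"""
flat = "<text><w><t>Stemma</t></w></text>"

print(same_ignoring_layout(ET.fromstring(pretty), ET.fromstring(flat)))
```

For real files, `ET.parse(path).getroot()` could be passed in place of the `ET.fromstring(...)` calls.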