proycon / folia

FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library, as well as the full documentation, validation schemas, and set definitions.
http://proycon.github.io/folia/
GNU General Public License v3.0

Apply space attribute more generically to multiple structure elements #61

Closed by proycon 4 years ago

proycon commented 5 years ago

As found by @kosloot, the text in the following document is currently inconsistent: the paragraph's own text is "Een test", but because TEXTDELIMITER for <part> is empty and explicit leading/trailing spaces are normalised away, the text composed from the two parts is "Eentest". We need a mechanism to override the default TEXTDELIMITER in some instances, perhaps by generically allowing the "space" attribute (as on <w>).

```xml
<FoLiA xmlns="http://ilk.uvt.nl/folia" xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="test" version="1.5">
  <metadata>
      <annotations>
      </annotations>
  </metadata>
  <text xml:id="t1">
    <p xml:id="p1">
      <t>Een test</t>
      <part xml:id="part1">
        <t>Een</t>
      </part>
      <part xml:id="part2">
        <t>test</t>
      </part>
    </p>
  </text>
</FoLiA>
```
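
To make the mismatch concrete, here is a minimal sketch in plain Python that reproduces it on a trimmed copy of the document above. It does not use the FoLiA library; `explicit_text`, `composed_text`, and the `delimiter` parameter are hypothetical names chosen for illustration, with `delimiter` standing in for TEXTDELIMITER (or for whatever a generic "space" attribute would end up controlling).

```python
import xml.etree.ElementTree as ET

# Namespace used by the FoLiA document in this issue.
NS = {"f": "http://ilk.uvt.nl/folia"}

# Trimmed copy of the example document (metadata omitted for brevity).
DOC = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="test" version="1.5">
  <text xml:id="t1">
    <p xml:id="p1">
      <t>Een test</t>
      <part xml:id="part1"><t>Een</t></part>
      <part xml:id="part2"><t>test</t></part>
    </p>
  </text>
</FoLiA>"""


def explicit_text(element):
    """Return the whitespace-normalised text of the element's own <t> child."""
    t = element.find("f:t", NS)
    return " ".join(t.text.split())


def composed_text(element, delimiter=""):
    """Join the text of the child <part> elements with the given delimiter.

    The empty default mimics the empty TEXTDELIMITER described in this issue;
    this is an illustration, not the library's actual implementation.
    """
    parts = element.findall("f:part", NS)
    return delimiter.join(explicit_text(part) for part in parts)


root = ET.fromstring(DOC)
p = root.find(".//f:p", NS)

print(explicit_text(p))       # Een test
print(composed_text(p))       # Eentest  (does not match the paragraph text)
print(composed_text(p, " "))  # Een test (matches once a space is allowed)
```

With the empty delimiter the composed text can never equal the paragraph's own text, which is why some per-element override is needed.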
proycon commented 5 years ago

Ok, this was not entirely correct yet. It is now fixed.