proycon / folia

FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library, as well as the full documentation, validation schemas, and set definitions.
http://proycon.github.io/folia/
GNU General Public License v3.0

[new annotation proposal] observations #15

Closed proycon closed 8 years ago

proycon commented 8 years ago

The observation element is a span annotation element that makes an observation pertaining to one or more word tokens. It is embedded in an observations layer. Observations offer an external qualification of part of a text. The qualification is expressed by the class, which is in turn defined by a set. The precise semantics of an observation depend on the user-defined set.

The element may, for example, act as a more generic replacement for the errordetection element, or encapsulate observations from teachers/proofreaders on a text, in which case it is often used with the desc element. The following example shows observations from two fictitious sets:

<s>
  <w xml:id="w1"><t>The</t></w>
  <w xml:id="w2"><t>Dalai</t></w>
  <w xml:id="w3"><t>Lama</t></w>
  <w xml:id="w4"><t>greets</t></w>
  <w xml:id="w5"><t>himm</t></w>
  <w xml:id="w6"><t>.</t></w>
  <observations>
    <observation class="typo" set="http://somewhere/errordetection.set.xml">
      <wref id="w5"/>
    </observation>
  </observations>
  <observations>
    <observation class="encouragement" set="http://somewhere/teacherobservations.set.xml" annotator="teacher234" annotatortype="manual">
      <wref id="w1"/>
      <wref id="w2"/>
      <wref id="w3"/>
      <wref id="w4"/>
      <wref id="w5"/>
      <wref id="w6"/>
      <desc>Almost a good sentence, only one mistake. Keep up the good work!</desc>
    </observation>
  </observations>
</s>
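
To illustrate how a consumer might read such a layer, here is a minimal Python sketch that resolves each observation's wref ids back to the word tokens they point at. It uses only the standard library's xml.etree.ElementTree, not the FoLiA library itself. The function name resolve_observations is hypothetical, and the sketch assumes a namespace-free fragment like the one above; a real FoLiA document declares the FoLiA namespace, so the tag names would need to be qualified accordingly.

```python
import xml.etree.ElementTree as ET

# The example sentence from the proposal, without the FoLiA namespace
# (an assumption made to keep this sketch short).
SENTENCE = """
<s>
  <w xml:id="w1"><t>The</t></w>
  <w xml:id="w2"><t>Dalai</t></w>
  <w xml:id="w3"><t>Lama</t></w>
  <w xml:id="w4"><t>greets</t></w>
  <w xml:id="w5"><t>himm</t></w>
  <w xml:id="w6"><t>.</t></w>
  <observations>
    <observation class="typo" set="http://somewhere/errordetection.set.xml">
      <wref id="w5"/>
    </observation>
  </observations>
  <observations>
    <observation class="encouragement" set="http://somewhere/teacherobservations.set.xml" annotator="teacher234" annotatortype="manual">
      <wref id="w1"/>
      <wref id="w2"/>
      <wref id="w3"/>
      <wref id="w4"/>
      <wref id="w5"/>
      <wref id="w6"/>
      <desc>Almost a good sentence, only one mistake. Keep up the good work!</desc>
    </observation>
  </observations>
</s>
"""

# ElementTree expands the predeclared xml: prefix to this qualified name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def resolve_observations(sentence_xml):
    """Return (class, referenced words, description) for each observation."""
    root = ET.fromstring(sentence_xml)
    # Map each token id to its text so wref ids can be looked up.
    tokens = {w.get(XML_ID): w.findtext("t") for w in root.iter("w")}
    results = []
    for obs in root.iter("observation"):
        words = [tokens[ref.get("id")] for ref in obs.iter("wref")]
        # findtext returns None when the observation has no desc element.
        results.append((obs.get("class"), words, obs.findtext("desc")))
    return results


observations = resolve_observations(SENTENCE)
for obs_class, words, desc in observations:
    print(obs_class, words, desc)
```

The first observation resolves to the single misspelled token, the second to the whole sentence together with the teacher's description.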

As always, further attributes can be associated with any observation using FoLiA's feature mechanism.
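
As a rough illustration of the feature mechanism on an observation, the Python sketch below parses a fragment carrying one feat element and collects its subset/class pairs. The "severity" subset and its "minor" class are made up for this example and are not part of the proposal; read_features is a hypothetical helper, and the fragment again omits the FoLiA namespace for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the "severity" subset and the "minor" class are
# invented for illustration; a real set definition would have to declare them.
OBSERVATION = """
<observation class="typo" set="http://somewhere/errordetection.set.xml">
  <wref id="w5"/>
  <feat subset="severity" class="minor"/>
</observation>
"""


def read_features(observation_xml):
    """Collect the features of one observation as a subset -> class mapping."""
    obs = ET.fromstring(observation_xml)
    return {feat.get("subset"): feat.get("class") for feat in obs.iter("feat")}


features = read_features(OBSERVATION)
print(features)
```

A dictionary keyed by subset assumes each subset occurs at most once per observation; a list of pairs would be the safer choice if repeated subsets are allowed.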

(proposal inspired by Revisely's solution)