proycon / folia

FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library, as well as the full documentation, validation schemas, and set definitions.
http://proycon.github.io/folia/
GNU General Public License v3.0

[new annotation proposal] Sentiment Analysis #16

Closed proycon closed 8 years ago

proycon commented 8 years ago

FoLiA currently has the token annotation subjectivity for limited sentiment analysis or other subjectivity annotation; it is used by the VU-DNC corpus, for instance. This, however, is not sufficient for more complex expressions of sentiment. A strong span annotation element is needed. The following proposal is inspired by NAF's opinion layer:

<s>
 <w xml:id="w1"><t>He</t></w>
 <w xml:id="w2"><t>is</t></w>
 <w xml:id="w3"><t>happy</t></w>
 <w xml:id="w4"><t>to</t></w>
 <w xml:id="w5"><t>see</t></w>
 <w xml:id="w6"><t>him</t></w>
 <w xml:id="w7"><t>.</t></w>
 <sentiments>
  <sentiment class="emotion.joy" polarity="positive" strength="moderate">
    <source>
      <wref id="w1" t="he" />
    </source>
    <target>
      <wref id="w6" t="him" />
    </target>
    <hd>
      <wref id="w3" t="happy" />
    </hd>
  </sentiment>
 </sentiments>
</s>
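
To illustrate how a consumer might read this structure, here is a minimal sketch that resolves the wref pointers in each span role back to the word tokens. It uses only Python's standard xml.etree.ElementTree, not the PyNLPl FoLiA library, and the helper name read_sentiments is hypothetical. It also assumes the un-namespaced fragment shown above; a real FoLiA document declares the FoLiA namespace, so the tag lookups would need to be qualified accordingly.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predefined, so xml:id expands to this qualified name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The example sentence from the proposal (no FoLiA namespace declared).
SAMPLE = """<s>
 <w xml:id="w1"><t>He</t></w>
 <w xml:id="w2"><t>is</t></w>
 <w xml:id="w3"><t>happy</t></w>
 <w xml:id="w4"><t>to</t></w>
 <w xml:id="w5"><t>see</t></w>
 <w xml:id="w6"><t>him</t></w>
 <w xml:id="w7"><t>.</t></w>
 <sentiments>
  <sentiment class="emotion.joy" polarity="positive" strength="moderate">
    <source><wref id="w1" t="he" /></source>
    <target><wref id="w6" t="him" /></target>
    <hd><wref id="w3" t="happy" /></hd>
  </sentiment>
 </sentiments>
</s>"""


def read_sentiments(sentence):
    """Hypothetical helper: return one dict per <sentiment> in the sentence."""
    # Index the token text by word id so wref pointers can be resolved.
    words = {w.get(XML_ID): w.findtext("t") for w in sentence.iter("w")}
    results = []
    for sentiment in sentence.iter("sentiment"):
        record = {
            "class": sentiment.get("class"),
            "polarity": sentiment.get("polarity"),
            "strength": sentiment.get("strength"),
        }
        # Each span role wraps zero or more wref elements.
        for role in ("source", "target", "hd"):
            node = sentiment.find(role)
            refs = node.findall("wref") if node is not None else []
            record[role] = [words[ref.get("id")] for ref in refs]
        results.append(record)
    return results


sentiments = read_sentiments(ET.fromstring(SAMPLE))
print(sentiments)
```

The sketch resolves each role through the id attribute of wref rather than its t attribute, since t is only a convenience copy of the token text and, as in the example, may differ in case from the token itself.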

The proposal predefines the following feature subsets; whether they are actually used, and which class values they take, is defined by the set.

The following span role elements are introduced and used (they will be reused in another upcoming proposal as well):