proycon / folia

FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library, as well as the full documentation, validation schemas, and set definitions.
http://proycon.github.io/folia/
GNU General Public License v3.0

Representing negation annotation with cues and scopes in FoLiA #86

Closed proycon closed 3 years ago

proycon commented 3 years ago

None of the existing annotation types in FoLiA seems to be a good fit for the annotation of negation cues and scopes. Forcing it into an existing category would be contrived and not entirely semantically correct, so I propose we extend FoLiA with explicit support for this. I'll have to consider whether there are some more generic use cases for this so we can find the right balance between introducing new specific elements and having them generic enough to cover multiple scenarios.

(proposal to follow shortly)

proycon commented 3 years ago

I propose "cue/scope annotation" (aka cue annotation), a span annotation type with a structure similar to that of dependency relations.

Take the sentence "[I did] {not} [know who you were], but I was determined to find out.", where [] denotes the scope and {} denotes the cue. We could annotate the negation cue and its scope as follows (pseudo-FoLiA excerpt):

  <cue class="negation">
      <hd>
          <wref id="w3" t="not" />
      </hd>
      <scope>
          <wref id="w1" t="I" />
          <wref id="w2" t="did" />
          <wref id="w4" t="know" />
          <wref id="w5" t="who" />
          <wref id="w6" t="you" />
          <wref id="w7" t="were" />
      </scope>
  </cue>

The class comes from a user-defined vocabulary, so this could be used for negation annotation but also for related phenomena such as speculation. It also allows a more fine-grained vocabulary for cue/scope relations or for the cues themselves.

The head (hd) of the cue is the actual cue word. Both the cue and the scope may be multiword and/or discontinuous. There should be only one head role and one scope role. Multiple cues in the same sentence are handled simply by using multiple cue elements.
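
The constraints described here (exactly one head and one scope, each possibly multiword or discontinuous) can be checked mechanically. The following is a minimal sketch that uses only Python's standard `xml.etree.ElementTree` on the pseudo-FoLiA excerpt. It is not the FoLiA library API: the function names `parse_cue` and `is_discontinuous`, and the assumption that word ids are `w` followed by the token position, are illustrative only.

```python
import xml.etree.ElementTree as ET

# Pseudo-FoLiA excerpt from the proposal (not a complete, valid FoLiA document).
CUE_XML = """
<cue class="negation">
    <hd>
        <wref id="w3" t="not" />
    </hd>
    <scope>
        <wref id="w1" t="I" />
        <wref id="w2" t="did" />
        <wref id="w4" t="know" />
        <wref id="w5" t="who" />
        <wref id="w6" t="you" />
        <wref id="w7" t="were" />
    </scope>
</cue>
"""


def parse_cue(xml_text):
    """Return (class, head ids, scope ids), enforcing one <hd> and one <scope>."""
    cue = ET.fromstring(xml_text)
    heads = cue.findall("hd")
    scopes = cue.findall("scope")
    if len(heads) != 1 or len(scopes) != 1:
        raise ValueError("expected exactly one <hd> and one <scope>")
    head_ids = [w.get("id") for w in heads[0].findall("wref")]
    scope_ids = [w.get("id") for w in scopes[0].findall("wref")]
    return cue.get("class"), head_ids, scope_ids


def is_discontinuous(word_ids):
    """Assumes hypothetical ids of the form 'w<position>'; True if there is a gap."""
    positions = sorted(int(word_id[1:]) for word_id in word_ids)
    return any(b - a != 1 for a, b in zip(positions, positions[1:]))


cls, head_ids, scope_ids = parse_cue(CUE_XML)
```

With this input, `is_discontinuous(scope_ids)` reports a gap, because the cue word `w3` sits between the two parts of the scope.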

Some studies also speak of "focus" in the context of negation; I'll have to investigate whether to add an extra optional span role for that.

kosloot commented 3 years ago

Couldn't this just be resolved using the existing dependency structure?

  <dependency class="negation">
      <hd>
          <wref id="w3" t="not" />
      </hd>
      <dep>
          <wref id="w1" t="I" />
          <wref id="w2" t="did" />
          <wref id="w4" t="know" />
          <wref id="w5" t="who" />
          <wref id="w6" t="you" />
          <wref id="w7" t="were" />
      </dep>
  </dependency>

Maybe add a "label" like scope to the <dep> for clarity.

proycon commented 3 years ago

Well, that's basically what I said in the initial post:

> Forcing it into an existing category would be contrived and not entirely semantically correct, so I propose we extend FoLiA with explicit support for this.

FoLiA is rather specific about annotation types, even though it uses a generic underlying mechanism. Although negation annotation is functionally comparable to dependency relations, talking about cues and scopes in terms of a dependency relation with heads and dependents will probably feel odd (correct me if I'm wrong; I'm not a linguist). Therefore I think this merits a new type.

proycon commented 3 years ago

I have an alternative proposal as well (which I'm starting to prefer over the first one):

  <modality class="negation">
      <cue>
          <wref id="w3" t="not" />
      </cue>
      <scope>
          <wref id="w1" t="I" />
          <wref id="w2" t="did" />
          <wref id="w4" t="know" />
          <wref id="w5" t="who" />
          <wref id="w6" t="you" />
          <wref id="w7" t="were" />
      </scope>
  </modality>

The question is whether this would be more or less generic and could be called 'Modality annotation', which sounds better than 'Cue/scope annotation'. But is that term sufficient to also cover scenarios where we annotate factuality, certainty, and truthfulness?
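
To make this alternative concrete, here is a minimal sketch that converts the bracket notation used earlier ([] for scope, {} for cue) into the proposed <modality> structure. It uses only Python's standard library. The function `brackets_to_modality`, the naive tokenization (whitespace-delimited chunks, with brackets split off), and the `w<n>` id scheme are assumptions for illustration, not part of FoLiA or its library.

```python
import re
import xml.etree.ElementTree as ET


def brackets_to_modality(text, cls):
    """Build a <modality> element from text marked with [scope] and {cue}."""
    modality = ET.Element("modality", {"class": cls})
    cue = ET.SubElement(modality, "cue")
    scope = ET.SubElement(modality, "scope")
    role = None  # element for the span that is currently open, if any
    index = 0    # running token counter for the hypothetical w<n> ids
    # Split into single bracket characters and runs of other non-space characters.
    for token in re.findall(r"[\[\]{}]|[^\s\[\]{}]+", text):
        if token == "[":
            role = scope
        elif token == "{":
            role = cue
        elif token in ("]", "}"):
            role = None
        else:
            index += 1
            if role is not None:
                ET.SubElement(role, "wref", {"id": f"w{index}", "t": token})
    return modality


sentence = "[I did] {not} [know who you were], but I was determined to find out."
element = brackets_to_modality(sentence, "negation")
print(ET.tostring(element, encoding="unicode"))
```

Tokens outside any bracket still advance the counter, so the ids of the annotated words keep their position in the sentence.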

Irishx commented 3 years ago

Negation cannot be considered a syntactic dependency marker: it is quite possible for the scope of a negation to cross into a previous sentence. Furthermore, a negation cue can be a part of a word: "[on]verdraagbaar".

For full modal annotation, cue and scope are not sufficient. In modality (and sentiment too) there are other elements to be annotated: the holder of the sentiment/certainty/belief and perhaps the speaker of the utterance. A positive or negative polarity should also be assigned.

We proposed an annotation scheme for modality here: https://repository.ubn.ru.nl/bitstream/handle/2066/145192/145192.pdf

and some examples here: https://www.researchgate.net/profile/Iris_Hendrickx/publication/258508153_Modality_in_Text_a_Proposal_for_Corpus_Annotation/links/5492c0d60cf2302e1d073f5d.pdf

and I'll actually work on this more in the next year, so good timing :-)

proycon commented 3 years ago

Thanks for the feedback! I see you're quite deep in the subject so I might want to pick your brain a bit further :)

> Furthermore, a negation cue can be a part of a word: "[on]verdraagbaar".

Good point, that's already accounted for as we can point back to morphemes instead of words.
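
As a rough illustration of pointing at morphemes rather than words, the sketch below resolves references against an index that holds both kinds of unit. Everything in it is hypothetical: the `Unit` class, the `resolve` helper, the id scheme (`w1`, `w1.m1`), the two-way segmentation of the example word, and the choice to treat the remainder of the word as the scope. It is a plain-Python model, not FoLiA syntax or the FoLiA library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A referable unit: a whole word or a morpheme inside a word."""
    id: str
    kind: str  # "word" or "morpheme"
    text: str


# Hypothetical ids and an illustrative segmentation of the Dutch example word.
UNITS = {
    unit.id: unit
    for unit in [
        Unit("w1", "word", "onverdraagbaar"),
        Unit("w1.m1", "morpheme", "on"),
        Unit("w1.m2", "morpheme", "verdraagbaar"),
    ]
}


def resolve(ref_ids):
    """Look up each reference, whether it targets a word or a morpheme."""
    return [UNITS[ref_id] for ref_id in ref_ids]


cue = resolve(["w1.m1"])    # the negation cue is the prefix only
scope = resolve(["w1.m2"])  # assumption: the scope is the rest of the word
```

Because references are resolved by id alone, a span role does not need to know whether it points at a word or at a part of one.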

> For full modal annotation, cue and scope are not sufficient. In modality (and sentiment too) there are other elements to be annotated: the holder of the sentiment/certainty/belief and perhaps the speaker of the utterance. A positive or negative polarity should also be assigned.

I see some overlap here with sentiment analysis (https://folia.readthedocs.io/en/latest/sentiment_annotation.html): we have <source> and <target> roles there to express the source/holder of the sentiment and the target/recipient of the sentiment, respectively. I'll study the papers you linked.

We need to find a balance between what we define in FoLiA (like cue, scope, source, target), and what is left up to the user-provided vocabulary (the FoLiA set definition).
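
One way to picture this split is as two separate checks: span roles fixed by the format, and classes drawn from a vocabulary the user supplies. The sketch below is a plain-Python illustration under that assumption. `FORMAT_ROLES` simply lists the four roles named above, `USER_CLASSES` is a made-up vocabulary standing in for a set definition, and `validate` is a hypothetical helper, not part of any FoLiA tooling.

```python
# Roles that would be fixed by the format itself (the four named in the discussion).
FORMAT_ROLES = frozenset({"cue", "scope", "source", "target"})

# Hypothetical user-provided vocabulary; stands in for a FoLiA set definition.
USER_CLASSES = frozenset({"negation", "speculation", "certainty"})


def validate(annotation_class, roles):
    """Return a list of problems; an empty list means the annotation is acceptable."""
    problems = []
    if annotation_class not in USER_CLASSES:
        problems.append(f"unknown class: {annotation_class}")
    for role in roles:
        if role not in FORMAT_ROLES:
            problems.append(f"unknown role: {role}")
    return problems


ok = validate("negation", ["cue", "scope"])
bad = validate("irony", ["cue", "focus"])
```

Moving a name such as "focus" from the rejected side to the accepted side then means a change to the format, whereas adding a class such as "irony" only means a change to the user's vocabulary.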

To be continued...