webanno / webanno

🆕 Work continues on INCEpTION 👉 https://github.com/inception-project/inception 👈 -- ⚠️ The official WebAnno repository has reached the end of the line. -- 🚀 To migrate, export your annotation projects from WebAnno, then import them into INCEpTION and just work on.
https://webanno.github.io/webanno
Apache License 2.0

Annotating discontinuous spans for sentiment analysis #1265

Open Tyriflis opened 5 years ago

Tyriflis commented 5 years ago

We are very pleased with WebAnno, but are having some difficulties with our current annotation project regarding discontinuous spans.

We are currently trying to do sentiment analysis at phrase level. So far we have defined our own fine-grained layer with features such as “entity” and “sentiment expression”, and a relation layer that lets us draw relations between a sentiment expression and its source/holder. See dummy example below.

[Screenshot: test_same_span]

However, the sentiment expressions and entities can be discontinuous. We tried to work around this by defining a separate “same span” relation, but we would have liked that relation to be independent of the other relations. If I have understood correctly, it is not possible to have a chain layer with several features and then have relations on those features. The workaround also raises the question of which part is the main part of the sentiment expression.

Is there a better way to do this? Are there better ways to annotate relations between discontinuous spans?

reckart commented 5 years ago

WebAnno (at least up to the current version 3.5.5) doesn't really support discontinuous spans.

If you run your texts through a dependency parser and get a good dependency parse out of it, then you might simply get away with annotating only the syntactic heads of the entities.
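To illustrate the head-only idea, here is a minimal Python sketch. It is not part of WebAnno: the function name `span_heads`, the `heads` encoding (one head index per token, `-1` for the root), and the hand-written parse are all assumptions made for the example. The head of a possibly discontinuous set of tokens is taken to be any token whose own head lies outside that set.

```python
def span_heads(heads, token_indices):
    """Return the tokens of a span whose syntactic head lies outside the span.

    heads[i] is the index of token i's head, or -1 if token i is the root.
    token_indices may be discontinuous, e.g. {2, 4}.
    (Hypothetical helper for illustration; not a WebAnno API.)
    """
    span = set(token_indices)
    return sorted(i for i in span if heads[i] not in span)


# Hand-written dependency parse (an assumption, not parser output):
tokens = ["She", "did", "not", "really", "like", "the", "movie"]
heads = [4, 4, 4, 4, -1, 6, 4]

# Discontinuous sentiment expression "not ... like" -> head is "like"
expression_heads = span_heads(heads, {2, 4})
# Contiguous entity "the movie" -> head is "movie"
entity_heads = span_heads(heads, {5, 6})

print([tokens[i] for i in expression_heads])
print([tokens[i] for i in entity_heads])
```

If more than one index comes back, the selected tokens do not form a single subtree, and the choice of head would need a manual decision.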

You could also create a second set of span/relation layers specifically for marking discontinuous segments.

Finally, you could try using a link feature. If your Entity layer is discontinuous, then add a feature of type Link: Entity to it. You can then use this feature to link other entities to any given entity. It is a bit tedious but works.

GillesJ commented 5 years ago

I personally encountered this issue and here are my notes on possible solutions (and which one I ended up picking in my situation):

Problem: Discontiguous token spans: Sometimes a mention of an annotation unit can be interrupted. WebAnno does not support discontiguous token span annotations directly. Here are potential solutions:

  1. Use 1 span + 1 relation layer - if a relation connects two spans, you consider them as one annotation; labels only on the first span.

❌ Impossible for our purposes because of one-time attachment of relations. Only one relation is allowed per layer. We already used that for coreference.

  2. Use 1 chain layer - every chain represents an annotation; labels only on the first span.

❌ Impossible for our purposes: chains do not allow features for attributes.

  3. Use 2 span layers and a slot feature - span layer 1 has a slot feature which accepts annotations of span layer 2; annotate the first or head part of the discontinuous annotation with layer 1, then add one slot for every additional part of the annotation.

✓ We ended up using this solution. The rationale of slot features is discussed in detail in main point 2. Make a Discontiguous span layer with no features. Make a singleton tagset "DiscontiguousTag" with "Discontiguous" as the only tag. Make a Link:Discontiguous feature with tagset: "DiscontiguousTag" on each annotation unit layer that can have discontiguous mentions.

This might not seem elegant, but it is the most user-friendly solution.
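For anyone post-processing annotations made this way, the two-layer setup can be pictured roughly as below. This is only a sketch of the data model in plain Python, not WebAnno's export format or API: the class names `Part`, `Link`, and `SentimentExpression` and the `polarity` attribute are hypothetical, chosen to mirror the featureless "Discontiguous" layer and the link feature described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Part:
    """Annotation on the featureless 'Discontiguous' span layer."""
    begin: int
    end: int


@dataclass
class Link:
    """One filled slot: a role (the tag) pointing at a Part."""
    role: str
    target: Part


@dataclass
class SentimentExpression:
    """Head annotation; carries the labels and the slots (hypothetical layer)."""
    begin: int
    end: int
    polarity: str
    parts: List[Link] = field(default_factory=list)


def covered_text(text: str, ann: SentimentExpression) -> str:
    """Join the head span and all linked parts in document order."""
    pieces = [(ann.begin, ann.end)]
    pieces += [(link.target.begin, link.target.end) for link in ann.parts]
    return " ... ".join(text[b:e] for b, e in sorted(pieces))


text = "She did not really like the movie"
not_at = text.index("not")
like_at = text.index("like")

# "like" is annotated as the head; "not" is attached through a slot.
expression = SentimentExpression(like_at, like_at + len("like"), "negative")
expression.parts.append(Link("Discontiguous", Part(not_at, not_at + len("not"))))

print(covered_text(text, expression))
```

Because the labels live only on the head annotation, relations to a source/holder can keep pointing at that one annotation while the linked parts simply extend its extent.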

reckart commented 5 years ago

@GillesJ thanks for the feedback!

Note that adding the "Discontiguous" tag to the Default slots field in the slot feature settings saves the annotators from having to create the slot manually when they want to link another annotation.

reckart commented 5 years ago

Here are a few screenshots to illustrate the setup:

[Screenshots: 2019-04-03_15-44-24, 2019-04-03_15-47-26, 2019-04-03_15-49-06]