Importance of missing annotation

Horsmann commented 5 years ago

@reckart @Rentier

I experimented with POS and NER predictions, and this issue of missing annotations came up again.

POS: If I just ignore unannotated words, I have no problems. If I include them and assign them a placeholder label for "not labeled yet", the unannotated tokens outnumber the actually annotated data, i.e. all predictions will be the dummy label because of its extremely high frequency.

NER: If I ignore words without annotation, each word will receive an NER tag, because the notion of "no-tag" is missing from the training data.

Thus, I think we need a flag to indicate what to do with unannotated data. Depending on the task, ignoring it makes the predictions useless.

reckart commented 5 years ago

IMHO the classifier needs to decide what to do.

In the OpenNLP POS recommender, we mark unannotated words with a special label during training. When predicting, we drop all predictions of that special label. This allows us to start training even on incompletely annotated sentences. Yes, we might get a class bias, but for the moment, we are willing to try it out. We might add a configuration option (trait) specifically for the OpenNLP POS recommender to configure whether to train only on fully annotated sentences or also on incomplete sentences, maybe even with a configurable threshold percentage of annotated tokens at which a sentence counts as sufficiently annotated.
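A minimal sketch of the idea described above, not the actual OpenNLP recommender code: gaps are padded with a placeholder label for training, predictions of that label are dropped afterwards, and an optional threshold decides whether a sentence is annotated well enough. The names `GAP`, `build_training_sentences`, `drop_gap_predictions` and `min_ratio` are made up for illustration.

```python
from typing import List, Optional, Tuple

GAP = "<GAP>"  # hypothetical placeholder label for tokens without annotation


def annotated_ratio(labels: List[Optional[str]]) -> float:
    """Fraction of tokens in a sentence that carry a label."""
    if not labels:
        return 0.0
    return sum(label is not None for label in labels) / len(labels)


def build_training_sentences(
    sentences: List[List[Tuple[str, Optional[str]]]],
    min_ratio: float = 0.0,
) -> List[List[Tuple[str, str]]]:
    """Keep sentences with at least one label and enough coverage; pad gaps."""
    result = []
    for sentence in sentences:
        ratio = annotated_ratio([label for _, label in sentence])
        if ratio == 0.0 or ratio < min_ratio:
            continue  # nothing annotated, or below the configured threshold
        result.append(
            [(token, label if label is not None else GAP) for token, label in sentence]
        )
    return result


def drop_gap_predictions(predictions: List[str]) -> List[Tuple[int, str]]:
    """Return (token index, label) pairs, discarding placeholder predictions."""
    return [(i, label) for i, label in enumerate(predictions) if label != GAP]
```

With `min_ratio=1.0` this reduces to training on fully annotated sentences only; with the default it accepts any sentence that has at least one annotated token.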

In the OpenNLP NER recommender, we do not do anything special - we leave the feature encoding to OpenNLP. OpenNLP NER internally uses a BIO or similar encoding to mark tokens that are (not) annotated. When predicting, it also internally reverses that encoding and provides us with (multi-token) spans and their corresponding labels (excluding any of the "out" labels generated by the BIO-like encoding).
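For illustration, a BIO encoding and its reversal could look roughly like the sketch below. This is not OpenNLP's internal implementation; the `Span` tuple layout and the function names are assumptions chosen for the example.

```python
from typing import List, Optional, Tuple

# (first token index, end token index exclusive, label); hypothetical layout
Span = Tuple[int, int, str]


def encode_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Turn labeled spans into one tag per token; unannotated tokens become 'O'."""
    tags = ["O"] * num_tokens
    for begin, end, label in spans:
        tags[begin] = f"B-{label}"
        for i in range(begin + 1, end):
            tags[i] = f"I-{label}"
    return tags


def decode_bio(tags: List[str]) -> List[Span]:
    """Turn per-token tags back into spans; 'O' tokens yield no span at all."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # the sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans
```

Because "O" is part of the tag set, the "no-tag" case is learned as a regular class and simply disappears again when decoding.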

Horsmann commented 5 years ago

> IMHO the classifier needs to decide what to do.

The nature of the problem decides what is more reasonable to do. You want helpful predictions very soon.

NER has only a few tags, and assuming that named entities repeat themselves in a text after some time, you will get some useful predictions. The majority being "no-class" is not a problem then. In particular, "no-class" is the right class for most words in a text. Here you want to train on a "no-class" class. Otherwise every word will get an NER-tag prediction that fights the user instead of helping them.

In POS you have between 12 and 5x tags in most cases. Every word has one of these predefined classes; "no-class" is not a valid class. Furthermore, it might take some time until words are frequent enough to beat the "no-class" bias learned from the unlabeled data.

What is right/better depends on the task. I can of course hard-code something to cover POS and NER as frequent tasks, but in general, no single solution fits all. It makes the most sense to give the recommender a hint whether "no-class" is actually a valid class in a task or not.

reckart commented 5 years ago

> It makes the most sense to give the recommender a hint if "no-class" is actually a valid class in a task or not.

Then we should IMHO add a trait to the external recommender where the user can configure this, and which gets transmitted to the TC side in some parameter section of the request. @Rentier WDYT?

jcklie commented 5 years ago

That sounds reasonable. We also need to consider whether we split up the external recommender here, e.g. add a DkProTcExternalRecommender, as not all external recommenders make use of these meta labels. We could maybe just add a factory for that. Then we dump certain traits in the request by default.

Horsmann commented 5 years ago

Sounds good. Just a piece of info in the request (a boolean saying whether no-class is a valid class or not) will do; then I can react on my side.

reckart commented 5 years ago

@Rentier so we will rename the current external recommender to "Generic external recommender" and then introduce a "DKPro TC external recommender" in addition?

Horsmann commented 5 years ago

I am wondering why this flag shouldn't be default information that is served to all recommenders. As explained above, using "no-class" incorrectly might start to fight the user by producing either no predictions at all or tons of wrong predictions.

I don't know what other recommenders you have at the moment, but shouldn't they have the same problems?

reckart commented 5 years ago

@Horsmann e.g. for NER-like recommenders, we don't really need it.

Side question: what about the case where there is an annotation (e.g. a POS) but it has no label? (cf. https://github.com/inception-project/inception/issues/325)

Horsmann commented 5 years ago

> @Horsmann e.g. for NER-like recommenders, we don't really need it.

Well, if the neural net(?) predicts NER, this is true. I was using NER so far and just switched to POS for testing, and noticed that we are actually strongly limiting the usefulness of the recommender by neglecting the meaning of no-label.

> Side question: what about the case where there is an annotation (e.g. a POS) but it has no label? (cf. inception-project/inception#325)

I am not overwriting existing user annotations. I skip cases without a value; otherwise I get null values in the backend. (That happens when the user has created an annotation but has not assigned a value yet, and training is triggered: the reference value is then null.)
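Skipping value-less annotations before training can be sketched as follows. This is an illustrative assumption about the data shape (pairs of covered text and value), not the actual backend code; `usable_examples` is a made-up name.

```python
from typing import List, Optional, Tuple


def usable_examples(
    annotations: List[Tuple[str, Optional[str]]]
) -> List[Tuple[str, str]]:
    """Drop annotations whose value is still None so no null reaches training."""
    return [(text, value) for text, value in annotations if value is not None]
```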

Could you prepare a dummy request that I could use for testing? Then I would have a look.

jcklie commented 5 years ago

I created an issue for per external recommender configuration in INCEpTION.

jcklie commented 5 years ago

What would you call that flag? I had some ideas, but I do not like them that much:

useFallbackLabel
useDefaultLabel
padIfNoLabel
useDummyLabel
needsDummyLabel

We could also just specify what kind of recommender TC should use for us in a field, e.g. add a classifierType to the request. Tbh, I like that more.

jcklie commented 5 years ago

Btw, can TC do sentence level recommendations? If yes, then I would open an issue here so I do not forget that.

Horsmann commented 5 years ago

@Rentier Yes, this should work in principle but will probably require more information in the request :) At the moment it is silently assumed that the classification target is a word/token; this should be communicated in the request accordingly. I will have to set things up differently in the backend for sentence classification. We can also do document or multi-sentence...

How about isNoLabelValidValue? It would be true for NER and false for POS.
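A rough sketch of how a recommender backend might react to such a flag, assuming simple per-token training examples. The constant `NO_LABEL` and the function `prepare_examples` are hypothetical, and a real sequence classifier would also need to keep sentence context intact.

```python
from typing import List, Optional, Tuple

NO_LABEL = "NO_LABEL"  # hypothetical name for the explicit "no-class" label


def prepare_examples(
    tokens: List[str],
    labels: List[Optional[str]],
    is_no_label_valid_value: bool,
) -> List[Tuple[str, str]]:
    """Build (token, label) training pairs according to the flag.

    True (e.g. NER): unannotated tokens become explicit NO_LABEL examples.
    False (e.g. POS): unannotated tokens are left out of training.
    """
    examples = []
    for token, label in zip(tokens, labels):
        if label is not None:
            examples.append((token, label))
        elif is_no_label_valid_value:
            examples.append((token, NO_LABEL))
    return examples
```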

jcklie commented 5 years ago

We already send you the granularity (e.g. token, span, sentence). Can it be said that, in general, no label is invalid for token recommendation?

Horsmann commented 5 years ago

Ah, ok, then I would just need a sample request and I can add sentence prediction :)

NER is token level, but isn't no label the right/valid answer for most words? So no, depending on the task it might be valid.

jcklie commented 5 years ago

NER for us is span, POS token.

jcklie commented 5 years ago

For now, I think we can use the granularity in the metadata to fix this bug. The values in the metadata->anchoringMode are:

characters
singleToken
tokens
sentences

With your DKPro TC recommender we support singleToken (isNoLabelValidValue = false) and tokens (isNoLabelValidValue = true).

@reckart Wdyt?
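Deriving the flag from the anchoring mode could be sketched like this. The request layout (`metadata` containing `anchoringMode`) follows the description above, but the exact JSON shape and the function name `is_no_label_valid` are assumptions.

```python
from typing import Optional

# Only the two modes discussed for the DKPro TC recommender are mapped.
NO_LABEL_VALID_BY_MODE = {
    "singleToken": False,  # POS-like: every token must have a real label
    "tokens": True,        # NER-like spans: "no label" is a valid outcome
}


def is_no_label_valid(request: dict) -> Optional[bool]:
    """Return the flag for the request's anchoring mode, or None if unsupported."""
    mode = request.get("metadata", {}).get("anchoringMode")
    return NO_LABEL_VALID_BY_MODE.get(mode)
```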

Horsmann commented 5 years ago

Sounds good. So for token, no label is invalid, and for span it is valid. I will have to refactor some code. At the moment, I think two adjacent "nouns" would be merged because the two notions of token/span are a bit intermixed.

reckart commented 5 years ago

IMHO the following makes sense:

characters - we have no character-level recommender so far
singleToken - We assume a POS/Lemma-like sequence classification task. Either use all sentences for training which contain at least 1 target annotation - or only use such sentences where there is a target annotation for every token. Previously, the OpenNLP POS recommender considered only fully annotated sentences, but we changed this now to consider also partially annotated sentences. For partially annotated sentences, we generate an artificial "GAP" label if a target annotation is missing for a token.
tokens - We assume a BIO(or similar)-encoded sequence classification task. Use all sentences for training. One might consider using only sentences that contain at least one target annotation.
sentences - We assume a multi-label classification task, not a binary classification task. So ignore sentences that do not carry a label; only train on sentences with labels.

WDYT?
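The per-mode sentence selection proposed above could be sketched as follows. The `Sentence` class and the `require_full` switch are made up for the example; this is not INCEpTION or DKPro TC code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Sentence:
    tokens: List[str]
    # singleToken mode: one optional label per token (None = not annotated)
    token_labels: List[Optional[str]] = field(default_factory=list)
    # tokens mode: (begin, end, label) spans; an empty list is allowed
    spans: List[Tuple[int, int, str]] = field(default_factory=list)
    # sentences mode: one optional label for the whole sentence
    sentence_label: Optional[str] = None


def use_for_training(
    sentence: Sentence, anchoring_mode: str, require_full: bool = False
) -> bool:
    """Decide whether a sentence is used for training under the given mode."""
    if anchoring_mode == "singleToken":
        labeled = [label is not None for label in sentence.token_labels]
        if require_full:
            return bool(labeled) and all(labeled)
        return any(labeled)  # partially annotated sentences are accepted
    if anchoring_mode == "tokens":
        return True  # BIO-like encoding covers unannotated tokens
    if anchoring_mode == "sentences":
        return sentence.sentence_label is not None
    raise ValueError(f"unsupported anchoring mode: {anchoring_mode}")
```

The `characters` mode falls through to the error because no character-level recommender exists so far.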

Horsmann commented 5 years ago

Sounds good, but why is sentence multi-label? Isn't that yet another sub-case of sentence classification? E.g. sentiment would be a normal single-label task that could be done on sentence level, with (pos|neg|neutral) as labels.

reckart commented 5 years ago

Sorry, I'm always mixing up multi-label and multi-class. Sentence classification would be multi-class, not binary.

reckart commented 5 years ago

In fact, the recommender framework in INCEpTION currently only supports single-label classifiers. If a layer has multiple features (like person, number, etc. on a morphology layer), then separate recommenders need to be configured for each of them, and each feature is predicted individually. Supporting multi-label classifiers could be a future extension.

inception-project / external-recommender-dkpro-tc

Importance of missing annotation #12