Bad self-evaluation of OpenNLP POS recommender

reckart commented 5 years ago

Describe the bug The evaluation of the POS recommender is overly optimistic. I guess that is because it learns on partially annotated data (i.e. incomplete sentences), so it mostly learns the majority class (GAP), and the evaluation then even counts a GAP prediction on a token without a POS tag as a true positive.

To Reproduce Steps to reproduce the behavior:

1. Add the OpenNLP POS tagger to a project.
2. Open the recommendation sidebar to see the learning curve (requires PR #708).
3. Annotate data.

Expected behavior It can be seen that the more data is annotated, the fewer recommendations we get from the OpenNLP POS recommender, in particular if we annotate wildly across sentences, leaving many partially annotated sentences around.

I would expect that a GAP prediction on a token without a POS does not count towards the score of the recommender.
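To illustrate the expectation above, here is a minimal sketch in Python (not the INCEpTION code, which is Java). The function names and the toy data are hypothetical; only the GAP label is taken from this thread. It compares a score that counts GAP-on-GAP matches with one that only scores tokens that actually carry a gold POS tag.

```python
GAP = "GAP"  # placeholder label for tokens without a POS annotation (name from this thread)


def naive_accuracy(gold, predicted):
    """Count every matching label, including GAP predicted on an unannotated token."""
    hits = sum(1 for g, p in zip(gold, predicted) if g == p)
    return hits / len(gold) if gold else 0.0


def annotated_only_accuracy(gold, predicted):
    """Score only the tokens that actually have a gold POS tag."""
    pairs = [(g, p) for g, p in zip(gold, predicted) if g != GAP]
    if not pairs:
        return 0.0
    return sum(1 for g, p in pairs if g == p) / len(pairs)


# Hypothetical partially annotated sentence: 2 of 8 tokens have a POS tag.
gold = ["DET", GAP, GAP, "NOUN", GAP, GAP, GAP, GAP]
# A recommender that has only learned the majority class.
predicted = [GAP] * len(gold)

print(naive_accuracy(gold, predicted))           # 0.75
print(annotated_only_accuracy(gold, predicted))  # 0.0
```

In this toy case the naive score looks respectable even though the recommender never proposes a single real POS tag, which matches the overly optimistic learning curve described above.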

Screenshots 2018-12-07_13-19-55

Please complete the following information:

Version and build ID: 57666340d08ea5a8379ab66bf8b1df791f6a2473

reckart commented 5 years ago

@Horsmann I suppose this is what you were also observing in the context of the DKPro TC external recommender on the POS layer.

Horsmann commented 5 years ago

Yes, this seems to have the same underlying cause. What I encountered was that I received mostly "GAP" predictions with the TC backend. In particular, for large documents it becomes a problem to assume a "GAP" label for tasks such as POS tagging. The majority of words will be GAP at the beginning (i.e. nothing annotated yet). During model training, this class becomes so frequent that you will probably never receive any prediction other than "GAP". I haven't really thought about incomplete sentence annotation yet. I am not sure what happens in the tc-recommender under this condition.
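The imbalance described in this comment can be made concrete with a small Python sketch. The token counts below are invented purely for illustration and are not measurements from INCEpTION or DKPro TC; `label_distribution` is a hypothetical helper.

```python
from collections import Counter

GAP = "GAP"  # label assumed for tokens that have not been annotated yet


def label_distribution(labels):
    """Return the relative frequency of each label in the training data."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical large document: 1000 tokens, only 20 annotated so far.
labels = ["NOUN"] * 10 + ["VERB"] * 6 + ["DET"] * 4 + [GAP] * 980

dist = label_distribution(labels)
majority_label = max(dist, key=dist.get)

print(majority_label)        # GAP
print(round(dist[GAP], 2))   # 0.98
```

With a distribution like this, a classifier that always outputs GAP already agrees with almost all of the training labels, so there is little pressure for it to learn any real POS tag.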

reckart commented 5 years ago

Well, for the OpenNLP POS tagger, we don't consider sentences that do not have any POS annotation. The sentence must have at least one POS annotation to be taken into account.
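A rough Python sketch of the filtering rule described here, assuming sentences are lists of (token, tag) pairs and unannotated tokens carry the GAP label. The helper names are hypothetical and this is not the actual recommender code.

```python
GAP = "GAP"  # label assumed for tokens without a POS annotation


def has_pos_annotation(sentence):
    """True if at least one token in the sentence has a real POS tag."""
    return any(tag != GAP for _token, tag in sentence)


def select_training_sentences(sentences):
    """Keep only sentences with at least one POS annotation."""
    return [s for s in sentences if has_pos_annotation(s)]


sentences = [
    [("The", "DET"), ("cat", GAP), ("sleeps", GAP)],  # partially annotated: kept
    [("It", GAP), ("rains", GAP)],                    # no annotation: dropped
    [("Dogs", "NOUN"), ("bark", "VERB")],             # fully annotated: kept
]

training = select_training_sentences(sentences)
print(len(training))  # 2
```

Note that the partially annotated first sentence passes this filter, so its GAP tokens still end up in the training data. That is consistent with the suspicion in the bug description that partial annotation drives the model toward GAP.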

reckart commented 5 years ago

@UWinch Do you know if this is covered by https://github.com/inception-project/inception/pull/1070?

UWinch commented 5 years ago

Yes, it should be covered.

UWinch commented 5 years ago

closed with #1050

inception-project / inception

Bad self-evaluation of OpenNLP POS recommender #751