inception-project / inception-external-recommender

Get annotation suggestions for the INCEpTION text annotation platform from spaCy, Sentence BERT, scikit-learn and more. Runs as a web-service compatible with the external recommender API of INCEpTION.
Apache License 2.0

SklearnMentionDetector error in BIO encoding #29

Open david-waterworth opened 3 years ago

david-waterworth commented 3 years ago

I think the line below

https://github.com/inception-project/inception-external-recommender/blob/41d894c05053720d2b37510568d11d36433c3cf9/ariadne/contrib/sklearn.py#L92

should be

if token.begin >= annotation.begin and token.end <= annotation.end:

david-waterworth commented 3 years ago

Also, I think the state machine is wrong: if an annotation spans more than two tokens, the result is BIBI rather than BIII. The code only generates an I-MENTION if the preceding token is B-MENTION. What it should do is generate I-MENTION if the previous token is B-MENTION or I-MENTION and we're still in the same annotation.

I replaced lines 88-103 with the following, though I'm not 100% sure it's correct or robust:

for token in tokens:
    tag = "O"
    for annotation in annotations:
        # Only tokens lying fully inside the annotation belong to the mention.
        if token.begin >= annotation.begin and token.end <= annotation.end:
            if token.begin == annotation.begin:
                tag = "B-MENTION"  # first token of the mention
            else:
                tag = "I-MENTION"  # any later token of the same mention
            break
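
To check the tagging logic in isolation, here is a minimal, self-contained sketch of the loop above. `Span` and `bio_tags` are hypothetical names introduced only for this illustration and are not part of ariadne; `Span` stands in for the token and annotation objects, which are assumed to expose `begin` and `end` character offsets.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    """Hypothetical stand-in for a token or annotation with character offsets."""
    begin: int
    end: int


def bio_tags(tokens: List[Span], annotations: List[Span]) -> List[str]:
    """Assign one BIO tag per token, based on containment in an annotation."""
    tags = []
    for token in tokens:
        tag = "O"
        for annotation in annotations:
            # The token must lie fully inside the annotation.
            if token.begin >= annotation.begin and token.end <= annotation.end:
                if token.begin == annotation.begin:
                    tag = "B-MENTION"  # first token of the mention
                else:
                    tag = "I-MENTION"  # any later token of the same mention
                break
        tags.append(tag)
    return tags


# "New York City is big", with one annotation covering "New York City".
tokens = [Span(0, 3), Span(4, 8), Span(9, 13), Span(14, 16), Span(17, 20)]
annotations = [Span(0, 13)]
print(bio_tags(tokens, annotations))
# ['B-MENTION', 'I-MENTION', 'I-MENTION', 'O', 'O']
```

With a three-token annotation this gives B, I, I rather than B, I, B. Two directly adjacent annotations still each start with their own B-MENTION, because the tag depends only on the token's offsets and not on the previous tag.
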
jcklie commented 3 years ago

I will have a look. I never really used this recommender so there certainly might be bugs in there.