lapps / org.lappsgrid.examples

LAPPS example code.

Found a bug in WhitespaceTokenize.java from tutorial #48

Status: Closed (yang1fan2 closed this 7 years ago)

yang1fan2 commented 8 years ago
    for (String word : words) {
        start = text.indexOf(word, start);
        if (start < 0) {
            return new Data<String>(Uri.ERROR, "Unable to match word: " + word).asJson();
        }
        int end = start + word.length();
        Annotation a = view.newAnnotation("tok" + (++id), Uri.TOKEN, start, end);
        a.addFeature(Features.Token.WORD, word);
    }

This loop is from WhitespaceTokenize.java. The bug is that the search position never moves past the word that was just matched: the statement start = end; is missing at the end of the loop body.

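For example, the buggy code fails on the input text "a a". It produces two annotations for the token "a", and both have the same start and end offsets, because the second indexOf call begins at the same position and finds the first "a" again.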
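Here is a minimal standalone sketch of the problem and the proposed fix. It does not use the LAPPS classes: plain int pairs stand in for the annotations, the class and method names are made up for illustration, and I am assuming the words come from splitting the text on whitespace, since that part of the method is not shown above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of org.lappsgrid.examples.
class TokenOffsets {
    /**
     * Returns {start, end} offsets for each whitespace-separated word.
     * With advance == false this mirrors the loop quoted above;
     * with advance == true it adds the missing "start = end;" step.
     */
    static List<int[]> offsets(String text, boolean advance) {
        List<int[]> spans = new ArrayList<>();
        int start = 0;
        // Assumption: the original method splits on whitespace like this.
        for (String word : text.trim().split("\\s+")) {
            start = text.indexOf(word, start);
            if (start < 0) {
                throw new IllegalStateException("Unable to match word: " + word);
            }
            int end = start + word.length();
            spans.add(new int[] {start, end});
            if (advance) {
                start = end; // continue searching after the word just matched
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        for (int[] span : offsets("a a", true)) {
            System.out.println(span[0] + "-" + span[1]);
        }
    }
}
```

For "a a", calling offsets with advance set to false gives the span 0-1 twice, while advance set to true gives 0-1 and then 2-3.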

keighrim commented 7 years ago

Fixed via #51 and commits above. Thanks for reporting!