for (String word : words) {
start = text.indexOf(word, start);
if (start < 0) {
return new Data<String>(Uri.ERROR, "Unable to match word: " + word).asJson();
}
int end = start + word.length();
Annotation a = view.newAnnotation("tok" + (++id), Uri.TOKEN, start, end);
a.addFeature(Features.Token.WORD, word);
}
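This loop is from WhitespaceTokenize.java. The bug is that it never advances the search position: the assignment begin=end; (start = end; with the variable names shown above) is missing at the end of the loop body.
For example, the code fails on the input text "a a".
It produces two annotations for the token a that share the same begin and end offsets, because the second indexOf call searches from the same position and matches the first "a" again.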
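
The behaviour can be reproduced without the annotation classes. The following is a minimal sketch in plain Java, not the actual WhitespaceTokenize.java code: the class name TokenOffsets, the method align, and the advance flag are hypothetical names introduced only for illustration, and the offsets are collected in a list instead of being stored in a view.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the tokenizer loop; only the offset logic is kept.
class TokenOffsets {

    // Returns one {start, end} pair per word.
    // advance == false mimics the loop as posted; advance == true adds the missing step.
    static List<int[]> align(String text, String[] words, boolean advance) {
        List<int[]> spans = new ArrayList<>();
        int start = 0;
        for (String word : words) {
            start = text.indexOf(word, start);
            if (start < 0) {
                throw new IllegalArgumentException("Unable to match word: " + word);
            }
            int end = start + word.length();
            spans.add(new int[] {start, end});
            if (advance) {
                start = end; // continue searching after the token just matched
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "a a";
        String[] words = text.split("\\s+");
        for (boolean advance : new boolean[] {false, true}) {
            StringBuilder line = new StringBuilder("advance=" + advance + ":");
            for (int[] span : align(text, words, advance)) {
                line.append(" [").append(span[0]).append(",").append(span[1]).append("]");
            }
            System.out.println(line);
        }
        // prints:
        // advance=false: [0,1] [0,1]
        // advance=true: [0,1] [2,3]
    }
}
```

Without the advance step both tokens map to offsets 0 to 1. With it, the second token is matched at offsets 2 to 3, as expected.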