Closed. villahp closed this issue 6 years ago.
Hi, I don't quite understand what the problem is. A "token" here is a syllable, not a "word". So the number of tokens (syllables) in a sentence can be different from the number of words.
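To make the token/word distinction concrete, here is a minimal sketch that counts both on an already-segmented sentence. It assumes that the syllables of a multi-syllable word are joined with underscores (e.g. `Hà_Nội`), which is how VnCoreNLP's segmented output is usually printed. The segmented sentence, the class name and the method names are hypothetical, and the code does not call VnCoreNLP itself.

```java
public class SyllableVsWordCount {
    // Assumption: words are separated by whitespace, and the syllables
    // inside a multi-syllable word are joined with "_".
    static int countWords(String segmented) {
        return segmented.trim().split("\\s+").length;
    }

    // Each word contributes one syllable (token) per "_"-separated part.
    static int countSyllables(String segmented) {
        int total = 0;
        for (String word : segmented.trim().split("\\s+")) {
            total += word.split("_").length;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical segmented output for "Bà Ngọc Lan đang đến thăm Hà Nội."
        String segmented = "Bà Ngọc_Lan đang đến thăm Hà_Nội .";
        System.out.println("words = " + countWords(segmented));
        System.out.println("syllables = " + countSyllables(segmented));
    }
}
```

Here `Ngọc_Lan` and `Hà_Nội` each count as one word but two syllables, so the sentence has 7 words and 9 syllables.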
How can I get the list of segmented words? Thanks in advance.
@villahp The `getWords()` method of `Annotation` returns the list of segmented words. Here is some sample code:
```java
import vn.pipeline.*;
import java.io.*;

public class VnCoreNLPExample {
    public static void main(String[] args) throws IOException {
        String str = "Bà Ngọc Lan đang đến thăm Hà Nội.";
        // Load only the word segmentation ("wseg") annotator
        String[] annotators = {"wseg"};
        VnCoreNLP pipeline = new VnCoreNLP(annotators);
        Annotation annotation = new Annotation(str);
        pipeline.annotate(annotation);
        // Each Word is one segmented word, which may span several syllables
        for (Word word : annotation.getWords()) {
            System.out.println(word.getForm());
        }
    }
}
```
If the string ends with a number, the size of the list returned by `getTokens()` is different from the size of the list returned by `getWords()`.
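One way to narrow down where the two lists disagree is to expand each segmented word back into its syllables and compare the result with the token list position by position. The sketch below does this on plain strings. It assumes that the word forms (for example from `word.getForm()`) join syllables with "_" and that the tokens have already been converted to plain strings. `WordTokenAlignment` and `firstMismatch` are hypothetical names, not part of VnCoreNLP, and the sample lists are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WordTokenAlignment {
    // Hypothetical helper: expands each word form (syllables joined by "_")
    // into its syllables, then returns the index of the first position where
    // the expanded sequence differs from the tokens, or -1 if they match.
    static int firstMismatch(List<String> wordForms, List<String> tokens) {
        List<String> expanded = new ArrayList<>();
        for (String form : wordForms) {
            expanded.addAll(Arrays.asList(form.split("_")));
        }
        int n = Math.min(expanded.size(), tokens.size());
        for (int i = 0; i < n; i++) {
            if (!expanded.get(i).equals(tokens.get(i))) {
                return i;
            }
        }
        // Shared prefix is identical: a length difference means one side
        // has extra items starting at index n.
        return expanded.size() == tokens.size() ? -1 : n;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Hà_Nội", "năm", "2018");
        List<String> tokens = Arrays.asList("Hà", "Nội", "năm", "2018");
        List<String> tokensWithExtra = Arrays.asList("Hà", "Nội", "năm", "2018", ".");
        System.out.println(firstMismatch(words, tokens));          // -1
        System.out.println(firstMismatch(words, tokensWithExtra)); // 4
    }
}
```

Printing the items around the returned index from both lists shows which syllable or word is missing or split differently.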