I ran the following command on Windows:
java -Xmx2g -jar VnCoreNLP-1.0.jar -fin input.txt -fout output.txt -annotators wseg,pos
The resulting output.txt is garbled: it shows Nguyá» instead of Nguyễn. The issue seems to come from both reading and writing the file.
Unlike on Linux or macOS, Java on Windows does not use UTF-8 as its default encoding. So your code should read and write files as UTF-8 explicitly, or the command needs this extra option:
-Dfile.encoding=UTF8
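To illustrate what "read and write as UTF-8 explicitly" could look like, here is a minimal sketch. It is not VnCoreNLP's actual code; the class name Utf8Copy and the method copyUtf8 are hypothetical, made up only for this example. It passes an explicit charset to the reader and writer so the result does not depend on the platform default.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of VnCoreNLP.
public class Utf8Copy {

    // Copies a text file line by line, decoding and encoding as UTF-8
    // regardless of the JVM's default charset.
    static void copyUtf8(Path in, Path out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Round-trip a Vietnamese name through temporary files.
        // "\u1EC5" is the letter ễ, escaped so the source file itself stays ASCII.
        String name = "Nguy\u1EC5n";
        Path in = Files.createTempFile("utf8-in", ".txt");
        Path out = Files.createTempFile("utf8-out", ".txt");
        Files.write(in, name.getBytes(StandardCharsets.UTF_8));

        copyUtf8(in, out);

        String result = new String(Files.readAllBytes(out), StandardCharsets.UTF_8).trim();
        System.out.println("Round trip preserved the text: " + result.equals(name));
    }
}
```

As a workaround without code changes, the option has to go before -jar, since anything after the jar name is passed to the program instead of the JVM: java -Dfile.encoding=UTF8 -Xmx2g -jar VnCoreNLP-1.0.jar -fin input.txt -fout output.txt -annotators wseg,pos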