vncorenlp / VnCoreNLP

A Vietnamese natural language processing toolkit (NAACL 2018)

VnCoreNLP-1.0.jar doesn't use Unicode by default #4

Closed luanvuhlu closed 6 years ago

luanvuhlu commented 6 years ago

I ran the command java -Xmx2g -jar VnCoreNLP-1.0.jar -fin input.txt -fout output.txt -annotators wseg,pos on Windows, and output.txt has an encoding problem: it shows Nguyá» instead of Nguyễn. The issue seems to come from both reading and writing the file. Unlike Linux or macOS, Windows does not make Java use UTF-8 by default, so your code should read and write files as Unicode by default, or users should add the option -Dfile.encoding=UTF8
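To illustrate the suggestion above, here is a minimal sketch of reading and writing a file with an explicit UTF-8 charset, so that the result does not depend on the platform default. The class name Utf8CopyExample and the method copyUtf8 are hypothetical names chosen for this example; they are not part of VnCoreNLP, and this is not necessarily how the toolkit handles its input and output.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example class, not part of VnCoreNLP.
public class Utf8CopyExample {

    // Copies a text file line by line, decoding and encoding as UTF-8
    // regardless of the JVM's default charset.
    static void copyUtf8(Path in, Path out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Round-trip a Vietnamese name through temporary files.
        // "\u1EC5" is the letter "e with circumflex and tilde", written as an
        // escape so the source file's own encoding does not matter.
        String name = "Nguy\u1EC5n";
        Path in = Files.createTempFile("input", ".txt");
        Path out = Files.createTempFile("output", ".txt");
        Files.write(in, name.getBytes(StandardCharsets.UTF_8));

        copyUtf8(in, out);

        String result = new String(Files.readAllBytes(out), StandardCharsets.UTF_8).trim();
        System.out.println("Round trip preserved the text: " + result.equals(name));

        Files.deleteIfExists(in);
        Files.deleteIfExists(out);
    }
}
```

Passing the charset explicitly removes the dependency on the platform default in the code itself. The alternative mentioned above is to set the default from the command line, placing the JVM option before -jar, for example java -Dfile.encoding=UTF-8 -Xmx2g -jar VnCoreNLP-1.0.jar -fin input.txt -fout output.txt -annotators wseg,pos.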

vncorenlp commented 6 years ago

Thanks for your comment. We just fixed it. Please let us know if there is any problem. Thanks (smile)