tdt98 closed this issue 3 years ago
Hi, this is not an abnormality, i.e. it is not an issue/problem to be fixed. We use a normalization step to handle the varying outputs of different typing methods on different OSs. See https://github.com/vncorenlp/VnCoreNLP/blob/687822d3b40dc9002d7205b9067c6817fa40ed34/src/main/java/vn/corenlp/wordsegmenter/Utils.java#L105 Regarding your question, you can simply write a short post-processing script to reverse that normalization step on VnCoreNLP's output.
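For anyone who needs this, here is a minimal Python sketch of such a post-processing step. It is not part of VnCoreNLP. The function name `restore_old_tone_style` is hypothetical, and the mapping it builds (tone mark moved between the two vowels of "oa", "oe" and "uy") is an assumption about what the linked `Utils.java` normalizes, so please check it against that file before relying on it. The sketch only rewrites a pair at the end of a syllable and skips "uy" after "q", so that words like "hoàn" and "quý" are left alone.

```python
import re
import unicodedata

# Combining tone marks: grave, acute, hook above, tilde, dot below.
TONES = ["\u0300", "\u0301", "\u0309", "\u0303", "\u0323"]
# Vowel pairs assumed to be affected by the normalization (an assumption,
# not copied from Utils.java).
PAIRS = [("o", "a"), ("o", "e"), ("u", "y")]


def _build_reverse_map():
    """Map new-style pairs (tone on 2nd vowel) to old-style (tone on 1st)."""
    mapping = {}
    for first, second in PAIRS:
        for tone in TONES:
            old = unicodedata.normalize("NFC", first + tone) + second
            new = first + unicodedata.normalize("NFC", second + tone)
            # Cover lowercase, Capitalized and UPPERCASE variants.
            for variant in (str.lower, str.capitalize, str.upper):
                mapping[variant(new)] = variant(old)
    return mapping


_REVERSE_MAP = _build_reverse_map()
# Match a new-style pair that is not preceded by "q" and not followed by
# another letter ("_" and spaces in segmented output do not count as letters).
_PATTERN = re.compile(
    r"(?<![qQ])(" + "|".join(map(re.escape, _REVERSE_MAP)) + r")(?![^\W\d_])"
)


def restore_old_tone_style(text):
    """Move tone marks back to the old-style position in segmented output."""
    text = unicodedata.normalize("NFC", text)
    return _PATTERN.sub(lambda m: _REVERSE_MAP[m.group(1)], text)


if __name__ == "__main__":
    print(restore_old_tone_style("Hoà_bình"))
```

Note that this is lossy: if the original input already used the new style ("Hoà"), the script will still rewrite it to "Hòa". If you need the exact original spelling, aligning the output against the input text is safer.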
Normalizing tone mark placement to the old typing style: https://gist.github.com/nguyenvanhieuvn/72ccf3ddf7d179b281fdae6c0b84942b
Dear @datquocnguyen, thank you for sharing your great work. I just observed an abnormality in VnCoreNLP's word segmentation. With the input "Hòa", I received "Hoà", which means the "`" tone mark is shifted by one character. Could you please fix this problem or provide a solution in the future? Thank you in advance.