issue with the output for simplified chinese language

feng-1985 commented 5 years ago

from stanfordnlp.server import CoreNLPClient

text = '这是个最好的时代，也是一个最坏的时代！'  

properties = {
    # segment
    "tokenize.language": "zh",
    "segment.model": "edu/stanford/nlp/models/segmenter/chinese/ctb.gz",
    # ...
}

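with CoreNLPClient(properties=properties, annotators=annotators, timeout=60000, threads=5, memory='4G', be_quiet=False) as client:
    # submit the text to the server and take the first sentence
    ann = client.annotate(text)
    sentence = ann.sentence[0]
    print('---')
    print('first token of first sentence')
    token = sentence.token[0]
    print(token)
    ...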

The output:

first token of first sentence
word: "\350\277\231"
pos: "PN"
value: "\350\277\231"
originalText: "\350\277\231"
ner: "O"
lemma: "\350\277\231"
beginChar: 0
endChar: 1

yuhaozhang commented 5 years ago

If you print out each value individually, the result should look right. Try the following:

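with CoreNLPClient(properties=properties, annotators=annotators, timeout=60000, threads=5, memory='4G', be_quiet=False) as client:
    # submit the text to the server and take the first sentence
    ann = client.annotate(text)
    sentence = ann.sentence[0]
    print('---')
    print('first token of first sentence')
    token = sentence.token[0]
    # print each string field on its own instead of the whole token message
    print(token.word)
    print(token.originalText)
    print(token.lemma)
    ...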
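
For context, the escaped values in the full-token printout appear to be the UTF-8 bytes of the word written as octal escapes, which is how protobuf's text format renders non-ASCII strings by default; the underlying field still holds the correct characters. The following is a minimal sketch using only the standard library. The helper name decode_octal_escapes is hypothetical (it is not part of stanfordnlp or CoreNLP), and it assumes the input contains only ASCII characters and three-digit octal escapes, as in the output above.

```python
import re


def decode_octal_escapes(escaped: str) -> str:
    """Decode three-digit octal escapes (e.g. \\350\\277\\231) as UTF-8 text.

    Hypothetical helper for illustration; assumes the input holds only
    ASCII characters plus \\NNN octal escapes.
    """
    # Map each \NNN escape to the single character with that byte value.
    as_latin1 = re.sub(
        r'\\([0-7]{3})',
        lambda match: chr(int(match.group(1), 8)),
        escaped,
    )
    # Reinterpret those byte values as a UTF-8 byte sequence.
    return as_latin1.encode('latin-1').decode('utf-8')


# The escaped value copied from the printed token in this thread.
escaped_word = r'\350\277\231'
decoded = decode_octal_escapes(escaped_word)
print(decoded)  # 这
```

The decoded character matches the first character of the input text, which is consistent with token.word printing correctly when accessed directly.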

feng-1985 commented 5 years ago

Thanks for the quick response! You are right!

stanfordnlp / stanza

issue with the output for simplified chinese language #55