abhisheknovoic closed this issue 4 years ago
Are you using the stanza models or the corenlp models to get this result?
Both stanza 1.0.1 and corenlp 4.0.0 annotate it with 这 是 split instead of together.
@abhisheknovoic by the way the website you linked is for Classical Chinese, not modern Chinese.
It does seem like demonstrative pronouns are not annotated yet in the Chinese data our models are trained on (even though, as @AngledLuffa pointed out, the segmentation should be 这 是 with our latest models). In the meantime you should be able to use XPOS (PRD) as a feature for that; we (I) will try to fix the data upstream. Thanks for reporting!
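The XPOS workaround mentioned above could look roughly like the sketch below. The helper names (find_demonstratives, demonstratives_with_stanza) are hypothetical, and the tags on the stand-in words are made up for illustration rather than taken from real model output. The only tag taken from this thread is PRD; whether a given model actually emits it should be checked against that model's output.

```python
from types import SimpleNamespace

# XPOS tag for demonstratives as named in this thread; treat it as an
# assumption for any model other than the ones discussed here.
DEMONSTRATIVE_XPOS = "PRD"


def find_demonstratives(words, xpos_tag=DEMONSTRATIVE_XPOS):
    """Return the text of every word whose XPOS equals xpos_tag.

    `words` is any iterable of objects with `.text` and `.xpos`,
    e.g. `doc.sentences[0].words` from a Stanza document.
    """
    return [w.text for w in words if getattr(w, "xpos", None) == xpos_tag]


def demonstratives_with_stanza(text, lang="zh"):
    """Run a Stanza pipeline and filter each sentence by XPOS.

    Needs stanza installed and the models for `lang` downloaded,
    so it is defined here but not called.
    """
    import stanza

    nlp = stanza.Pipeline(lang, processors="tokenize,pos")
    doc = nlp(text)
    return [find_demonstratives(s.words) for s in doc.sentences]


# Stand-in words so the filter can be tried without any models.
# These tags are illustrative only, not real model output.
sample = [
    SimpleNamespace(text="这", xpos="PRD"),
    SimpleNamespace(text="是", xpos="VC"),
    SimpleNamespace(text="戒指", xpos="NN"),
]
print(find_demonstratives(sample))
```

With a real pipeline, passing doc.sentences[0].words to find_demonstratives in place of the stand-in list should work the same way, since only the text and xpos attributes are read.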
Hi @qipeng, sorry, I wanted to clarify this for my own understanding. Firstly, I am using Stanza for this. Does this mean that Stanza does not annotate demonstratives yet and that this is planned for the future? Sorry, I am not clear on how I should use @AngledLuffa's response above.
Thanks much for your time! This is super helpful!
@abhisheknovoic If you're using stanza.Pipeline for this, then the properties (for stanza.server.CoreNLPClient) are actually irrelevant. And neither @AngledLuffa nor I could reproduce the segmentation you reported in the question with the latest CoreNLP or stanza models, so it's likely you're using something old (that's what @AngledLuffa was trying to point out, I think).
This is an upstream issue in the UD data we used to train these models, which I happen to maintain for Chinese. We'll fix the data and retrain the models in the future.
@abhisheknovoic also for future reference: a minimal code sample would greatly help us diagnose the problem in future issue reports! 😆
My response above is the discovery that somewhere in this process, stanza is not being used as we expect. Therefore, you're not getting the result we expect on the text you provided. (For one thing, the corenlp properties are irrelevant if you are just using stanza models to process the data.) As is quickly becoming tradition, there isn't enough information available to recreate your result.
If I do this:
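    import stanza

    # Download the default Chinese models and build the pipeline
    stanza.download('zh')
    nlp = stanza.Pipeline('zh')
    doc = nlp('这是我妈妈的戒指')
    # Print each word of the first sentence to show the segmentation
    for word in doc.sentences[0].words:
        print(word.text)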
I get the following:
这
是
我
妈妈
的
戒指
so the segmentation resulting from running stanza does not agree with the segmentation you have reported above.
@AngledLuffa @qipeng ah, I got it.
Yes, I am using Stanza, but with the language zh-hant and not zh. Sorry, I don't know Chinese and I wasn't sure whether I should use zh or zh-hant.
Ok, point noted. Hopefully in the next version or later we will get to see the demonstrative. :)
Thanks to both of you. You are amazing!
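For anyone hitting the same zh versus zh-hant confusion, a side-by-side comparison of the segmentation might help. The sketch below is only an illustration: segment, compare_segmentations, build_stanza_pipelines and fake_pipeline are hypothetical helper names, and the stand-in pipelines return fixed tokens chosen for the demo, not actual model output. The Stanza part assumes that both language codes can be downloaded and that the tokenize processor alone is enough for segmentation.

```python
from types import SimpleNamespace


def segment(nlp, text):
    """Return the word texts produced by a pipeline-like callable."""
    doc = nlp(text)
    return [w.text for s in doc.sentences for w in s.words]


def compare_segmentations(pipelines, text):
    """Map each pipeline name to its segmentation of `text`."""
    return {name: segment(nlp, text) for name, nlp in pipelines.items()}


def build_stanza_pipelines(langs=("zh", "zh-hant")):
    """Build one tokenize-only Stanza pipeline per language code.

    Needs stanza installed and network access for the downloads,
    so it is defined here but not called.
    """
    import stanza

    pipelines = {}
    for lang in langs:
        stanza.download(lang)
        pipelines[lang] = stanza.Pipeline(lang, processors="tokenize")
    return pipelines


def fake_pipeline(tokens):
    """Hypothetical stand-in that always returns the given tokens."""
    def nlp(text):
        words = [SimpleNamespace(text=t) for t in tokens]
        return SimpleNamespace(sentences=[SimpleNamespace(words=words)])
    return nlp


# Stand-ins with made-up output, used only to exercise the comparison.
fakes = {"a": fake_pipeline(["这", "是"]), "b": fake_pipeline(["这是"])}
result = compare_segmentations(fakes, "这是")
for name, words in result.items():
    print(name, words)
```

With real models, compare_segmentations(build_stanza_pipelines(), text) would show whether the two language codes segment a sentence differently, which is also the kind of minimal sample that makes an issue report easy to reproduce.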
Hello,
Firstly, thanks for the amazing support for other languages in Stanza. I am using Stanza to parse Chinese sentences.
The sentence is
which in English means,
I have a properties file which looks as follows:
I get the following parse tree entities
As we see here, we don't get any demonstratives. Do I need to make some changes to the properties file to be able to see the demonstratives? I expect the word "This" to be a demonstrative pronoun.
I see that PronType:Dem is supported in the UD dependencies for Chinese. I may be looking at the wrong link - please correct me if I am wrong here.
Again, thanks for your time and if I need to provide more information, please do let me know.