howl-anderson / Chinese_models_for_SpaCy

Chinese models for SpaCy | Models for SpaCy that support Chinese
MIT License
644 stars, 110 forks

Comparison with Stanza? #26

Open: dcsan opened this issue 4 years ago

dcsan commented 4 years ago

Not a bug report per se.

I'm wondering how the spaCy Chinese models compare with the Stanza project. Stanza already provides Chinese support with many features: https://stanfordnlp.github.io/stanza/models.html

It has a Chinese (Simplified) model and provides a dependency parser, lemmatization, and other basic NLP features.

I'm a bit confused because it can use spaCy for tokenization: https://stanfordnlp.github.io/stanza/tokenize.html#use-spacy-for-fast-tokenization-and-sentence-segmentation

You can only use spaCy to tokenize English text for now, since spaCy tokenizer does not handle multi-word token expansion for other languages.
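The quoted limitation hinges on multi-word token (MWT) expansion: in languages such as French, one surface token like "au" corresponds to two syntactic words ("à" + "le"). The sketch below only illustrates what "expansion" means, assuming a small hand-written contraction table (hypothetical, not taken from Stanza or spaCy); Stanza performs this step with a trained model, so this is not how either library implements it.

```python
# Toy illustration of multi-word token (MWT) expansion.
# ASSUMPTION: FRENCH_CONTRACTIONS is a hypothetical, hand-written sample;
# real pipelines (e.g. Stanza) learn this mapping with a model.
FRENCH_CONTRACTIONS = {
    "au": ["à", "le"],
    "aux": ["à", "les"],
    "du": ["de", "le"],
}


def expand_mwt(tokens, table):
    """Return (surface_token, words) pairs, splitting any token found in `table`."""
    result = []
    for tok in tokens:
        # Unknown tokens map to themselves as a single word.
        words = list(table.get(tok.lower(), [tok]))
        result.append((tok, words))
    return result


def words_only(expanded):
    """Flatten the expanded pairs into the sequence of syntactic words."""
    return [w for _, words in expanded for w in words]


tokens = "Je vais au marché".split()
expanded = expand_mwt(tokens, FRENCH_CONTRACTIONS)
print(words_only(expanded))  # ['Je', 'vais', 'à', 'le', 'marché']
```

A tokenizer that stops at surface tokens (four here) never produces the five syntactic words that a downstream parser for such a language expects, which is the gap the Stanza documentation points to.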

This would imply that spaCy is a lower-level library, and yet the two seem similar.

howl-anderson commented 4 years ago

Hi @dcsan, to me, Stanza probably uses spaCy for tokenization simply because spaCy's English tokenization is pretty good. I think Stanza and spaCy are both full-featured NLP frameworks.