stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0
9.66k stars 2.7k forks

Difference b/w Model Jar Language English and English(KBP) #621

Closed: MadhuKush closed this issue 6 years ago

MadhuKush commented 6 years ago

Hi Team,

We are trying to download the Stanford NLP models jar for English from the link below: https://stanfordnlp.github.io/CoreNLP/download.html

But we found two jars there: English and English (KBP).

What is the difference between these two, and which one is best? Please provide your suggestions and explanations.

Thank you in advance

gangeli commented 6 years ago

For most things, the regular English models should be sufficient. The KBP models are only for the new relation extractor (the kbp annotator), and are somewhat large, which is why they're broken out as a separate jar.
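A minimal sketch of what this split means when configuring a pipeline, assuming the usual `annotators` property. The class and method names are hypothetical, and the exact list of annotators that `kbp` depends on is an assumption to check against the CoreNLP documentation for your version. The code only builds the property sets, so it runs without CoreNLP on the classpath.

```java
import java.util.Properties;

public class KbpPipelineConfig {

    /** Properties for a pipeline that needs only the regular English models. */
    static Properties defaultEnglish() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
        return props;
    }

    /** Same pipeline plus the relation extractor, which needs the English (KBP) models jar. */
    static Properties withKbp() {
        Properties props = new Properties();
        // Assumption: kbp is listed after the annotators it depends on.
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,coref,kbp");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(defaultEnglish().getProperty("annotators"));
        System.out.println(withKbp().getProperty("annotators"));
        // With CoreNLP and the matching models jars on the classpath, either object
        // could be passed to: new edu.stanford.nlp.pipeline.StanfordCoreNLP(props)
    }
}
```

If `kbp` is not in the annotator list, the KBP jar should not be needed at all, which matches the advice above.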

MadhuKush commented 6 years ago

Hi, thanks for the explanation. A similar question, if you don't mind: could you please tell us the difference between the models jar that comes bundled with the CoreNLP 3.9.0 download (this jar is 426 MB) and the models-only jar that we download separately (this jar is 1.26 GB)?

Please suggest which one is better for extracting most entities without any issues. Thank you in advance.

gangeli commented 6 years ago

The English models have more variants of each of the models than the official distribution. For example, I believe the English models include the shift-reduce constituency parser, in addition to the PCFG models. Again, for most cases, you should be ok just using the official distribution.
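One way to see which parser variants a given jar provides is to check for the model resource on the classpath and fall back to the PCFG model if the shift-reduce one is absent. This is only a sketch: the class name is hypothetical, and the two resource paths are assumptions that should be verified against the contents of your jars. It uses only the Java standard library, so it runs with or without CoreNLP present.

```java
import java.util.Properties;

public class ParserModelChoice {

    // Assumed resource paths; verify them against the jars you actually have.
    static final String SR_MODEL = "edu/stanford/nlp/models/srparser/englishSR.ser.gz";
    static final String PCFG_MODEL = "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz";

    /** Returns true if the named resource can be found on the classpath. */
    static boolean onClasspath(String resource) {
        return ParserModelChoice.class.getClassLoader().getResource(resource) != null;
    }

    /** Prefers the shift-reduce model when present, otherwise the PCFG model. */
    static Properties parserProperties() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,parse");
        props.setProperty("parse.model", onClasspath(SR_MODEL) ? SR_MODEL : PCFG_MODEL);
        return props;
    }

    public static void main(String[] args) {
        System.out.println("shift-reduce model found: " + onClasspath(SR_MODEL));
        System.out.println("parse.model = " + parserProperties().getProperty("parse.model"));
    }
}
```

With only the bundled models jar on the classpath, this would be expected to select the PCFG path, consistent with the reply above.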

MadhuKush commented 6 years ago

Okay, thanks a lot.

LifeIsStrange commented 4 years ago

@gangeli It is unclear to me whether KBP is a strict superset of the default English models, and whether it has better accuracy on average (independently of being more expressive). Also, I am coming from spaCy, and I am surprised that your English model doesn't have a large version à la BERT-large. By the way, I wonder which model you're using and whether it's still state of the art (XLNet).