stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0

Disable NER Fine Grained in CoreNLP Server #781

Closed: bjascob closed this issue 6 years ago

bjascob commented 6 years ago

I'm accessing the CoreNLP server from a Python script, which works very well. However, I believe the new "fine-grained" NER is significantly slowing down parsing in v3.9.2. I've seen references saying it can be turned off from Java code by setting these properties:

ner.applyFineGrained=false
ner.buildEntityMentions=false 

How can I pass these commands to the startup script for the server? I'm currently using a startup command something like...

cmd = 'java -mx4g -cp %s/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer ' % (config.core_nlp)
cmd += '--port %d --preload tokenize,ssplit,pos,lemma,ner,parse ' % (config.port)

Alternatively, I could pass a parameter in with each request, but again, I'm not sure of the format. Currently I'm using...

reqdict = {'annotators': 'tokenize, ssplit, pos, lemma, ner', 'outputFormat': 'json'}
requests.post(self.server_url, params={'properties':reqdict}, data=text.encode(), headers={'Connection': 'close'})

Can fine-grained be disabled through the startup script or the client's requests? Any hints on how to do this would be appreciated.
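For the startup-script route, one possible approach is sketched below. It assumes the server's `-serverProperties` option, which points the server at a properties file used as defaults when a request supplies none. The helper names `write_server_properties` and `build_server_command` and the file name are hypothetical, and this thread does not confirm whether this actually restores the earlier parsing speed.

```python
import os

def write_server_properties(path):
    """Write a properties file that turns off fine-grained NER (hypothetical helper)."""
    lines = [
        'annotators = tokenize,ssplit,pos,lemma,ner,parse',
        'ner.applyFineGrained = false',
        'ner.buildEntityMentions = false',
    ]
    with open(path, 'w') as f:
        f.write('\n'.join(lines) + '\n')

def build_server_command(corenlp_dir, port, props_path):
    """Build the java command as an argument list (hypothetical helper).

    Passing a list to subprocess avoids shell expansion of the '*' in the
    classpath; java expands that wildcard itself.
    """
    return [
        'java', '-mx4g',
        '-cp', os.path.join(corenlp_dir, '*'),
        'edu.stanford.nlp.pipeline.StanfordCoreNLPServer',
        '-port', str(port),
        '-preload', 'tokenize,ssplit,pos,lemma,ner,parse',
        # Assumption: default properties are read from this file.
        '-serverProperties', props_path,
    ]
```

The resulting list could then be handed to `subprocess.Popen(cmd)` after calling `write_server_properties` with the same path.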

bjascob commented 6 years ago

I managed to get some help from another source. The Python requests can be modified by adding the two settings to the "properties", something like:

self.reqdict = {'ner.applyFineGrained': 'false', 'ner.buildEntityMentions': 'false',
                'annotators': 'tokenize, ssplit, pos, lemma, ner', 'outputFormat': 'json'}

and then use the post call as shown above.
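Put together, the per-request approach might look like the sketch below. The helper name `build_request_kwargs` and the URL `http://localhost:9000` are placeholders, not from the original post. One assumption worth flagging: the properties dictionary is serialized with `json.dumps`, since the server reads the `properties` URL parameter as a JSON string and `requests` does not serialize a nested dictionary passed in `params`.

```python
import json

def build_request_kwargs(server_url, text):
    """Assemble keyword arguments for requests.post (hypothetical helper)."""
    properties = {
        'annotators': 'tokenize,ssplit,pos,lemma,ner',
        'outputFormat': 'json',
        # The two settings from this thread that switch off fine-grained NER.
        'ner.applyFineGrained': 'false',
        'ner.buildEntityMentions': 'false',
    }
    return {
        'url': server_url,
        # Assumption: the server expects the properties as a JSON string.
        'params': {'properties': json.dumps(properties)},
        'data': text.encode('utf-8'),
        'headers': {'Connection': 'close'},
    }

kwargs = build_request_kwargs('http://localhost:9000', 'Stanford is in California.')
```

With a server running, `requests.post(**kwargs).json()` should return the annotations.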