stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0

CoreNLP Server NER Annotator Visualization Error #1065

Open sorinbanu opened 4 years ago

sorinbanu commented 4 years ago

I am running into some strange behavior with named entities when using the CoreNLP Server. The tests are in the 4 screenshots below. [Screenshot: Stanford CoreNLP NER errors]

Note that I tried three different JRE builds, all Java 8. Thank you in advance for any help.

AngledLuffa commented 4 years ago

It's not designed to work on one word at a time. I assume that's what you were looking for, although it's a little hard to tell since you left it up to us to interpret your screenshots.

sorinbanu commented 4 years ago

Thanks for your reply. Good to know. No, in my case it gives the same error with entire phrases, which is actually what I am looking for. I just simplified the example here so it is easier to see which words trigger the errors. Example: "If I have this phrase parsed with England inside, it will throw the same rendering error from Test 3"

I am running on Windows 10. I have a company proxy and McAfee; I don't know if this is relevant. Let me know if you need more details. Thanks in advance for any suggestions.

AngledLuffa commented 4 years ago

As I tried to indicate before, you are being too vague for me to understand what you consider the problem. What are you running and what do you expect to happen? I ran CoreNLP with default settings and it identifies England as a country each time.

sorinbanu commented 4 years ago

I thought the problem was clear, sorry. It is a visualization bug, i.e., one should not expect to get the red square as a result, but rather a rendering like in Test 1 and Test 4, unless you tell me otherwise. Here is a screenshot with a whole phrase; maybe it helps. If you would be so kind as to tell me what other information you need, I will try to send it to you. [screenshot]

sorinbanu commented 4 years ago

Btw, in case you need proof that it behaves differently with "england", here is a screenshot of that as well. Thank you for your interest in helping me! [screenshot]

AngledLuffa commented 4 years ago

Ok, now I understand. The red squares are the issue.

I am unable to recreate that either using the current git head or the most recent official release. How are you running the server? Is there any text output from the server which might be relevant?

sorinbanu commented 4 years ago

I really appreciate the swiftness in communication. I start the server from cmd. Please find below a copy paste from there:

d:\users\banusor\JavaProjects\stanford-corenlp-4.0.0>java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer
[main] INFO CoreNLP - --- StanfordCoreNLPServer#main() called ---
[main] INFO CoreNLP - Warning: cannot find edu/stanford/nlp/models/srparser/englishSR.ser.gz
[main] INFO CoreNLP - Setting default constituency parser to PCFG parser: edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz
[main] INFO CoreNLP - To use shift reduce parser download English models jar from:
[main] INFO CoreNLP - https://stanfordnlp.github.io/CoreNLP/download.html
[main] INFO CoreNLP - Threads: 8
[main] INFO CoreNLP - Starting server...
[main] INFO CoreNLP - StanfordCoreNLPServer listening at /0:0:0:0:0:0:0:0:9000
[pool-1-thread-1] INFO CoreNLP - [/0:0:0:0:0:0:0:1:50552] API call w/annotators tokenize,ssplit,pos,lemma,ner
With England inside, rendering error.
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[pool-1-thread-1] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words-distsim.tagger ... done [0.5 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
[pool-1-thread-1] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.6 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.4 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [1.5 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
[pool-1-thread-1] INFO edu.stanford.nlp.time.TimeExpressionExtractorImpl - Using following SUTime rules: edu/stanford/nlp/models/sutime/defs.sutime.txt,edu/stanford/nlp/models/sutime/english.sutime.txt,edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 580705 unique entries out of 581864 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_caseless.tab, 0 TokensRegex patterns.
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 4869 unique entries out of 4869 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_cased.tab, 0 TokensRegex patterns.
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 585574 unique entries from 2 files
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.NERCombinerAnnotator - Using numeric classifiers: true
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.NERCombinerAnnotator - Using SUTime: true
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.NERCombinerAnnotator - Using fine grained: true

AngledLuffa commented 4 years ago

I have run this on both Windows and Linux, using both 4.0.0 and the current git head, and no configuration comes up with this display error.

Perhaps it is a browser issue? What browser are you using? I tested this using Chrome.
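
One way to separate a browser rendering problem from an annotation problem is to query the server directly and inspect the NER tags without the visualization page. The following is a minimal sketch, not part of the original thread. It assumes the server is reachable at `http://localhost:9000` (the port shown in the server log) and that the JSON output lists tokens under `sentences` and `tokens` with `word` and `ner` fields. The helper names `build_url`, `ner_tags`, and `annotate` are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_url(base="http://localhost:9000",
              annotators="tokenize,ssplit,pos,lemma,ner"):
    """Build the request URL, passing pipeline properties as a query parameter."""
    # Assumption: the server accepts a JSON-encoded "properties" parameter.
    props = {"annotators": annotators, "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    return base + "/?" + query


def ner_tags(response):
    """Return (word, ner) pairs from an already-parsed JSON response."""
    # Assumption: tokens live under response["sentences"][i]["tokens"].
    return [(tok["word"], tok["ner"])
            for sent in response.get("sentences", [])
            for tok in sent.get("tokens", [])]


def annotate(text, url=None):
    """POST raw text to a running server and return the parsed JSON."""
    req = urllib.request.Request(url or build_url(),
                                 data=text.encode("utf-8"))
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `ner_tags(annotate("With England inside, rendering error."))` would list the tag assigned to each token. If the tags look right there but the web page still shows red squares, the problem is more likely in the visualization layer than in the NER annotator.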

sorinbanu commented 4 years ago

Same, unfortunately. Windows 10, Google Chrome Version 76.0.3809.132 (Official Build) (64-bit). I just tested with Firefox, IE11, and Edge as well, and they give the same result.