Open stone-ts15 opened 5 years ago
What is the output if you run this command? (Note that you might need a -cp parameter or set your CLASSPATH for your own configuration)
java edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 300000 -threads 8 -maxCharLength 100000 -quiet False -serverProperties StanfordCoreNLP-chinese.properties -preload tokenize,ssplit,pos,lemma,ner,parse,coref,kbp
On Sat, Oct 26, 2019 at 3:18 AM Stone notifications@github.com wrote:
I'm using KBP for relation extraction in Chinese. There are currently models for Chinese according to the official introduction. I modified StanfordCoreNLP-chinese.properties to add the kbp annotator. When executing the client with the Python interface, the error below occurs:
Starting server with command: java -Xmx6G -cp %CORENLP_HOME%/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 300000 -threads 8 -maxCharLength 100000 -quiet False -serverProperties StanfordCoreNLP-chinese.properties -preload tokenize,ssplit,pos,lemma,ner,parse,coref,kbp
[main] INFO CoreNLP - --- StanfordCoreNLPServer#main() called ---
[main] INFO CoreNLP - setting default constituency parser
[main] INFO CoreNLP - warning: cannot find edu/stanford/nlp/models/srparser/englishSR.ser.gz
[main] INFO CoreNLP - using: edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz instead
[main] INFO CoreNLP - to use shift reduce parser download English models jar from:
[main] INFO CoreNLP - http://stanfordnlp.github.io/CoreNLP/download.html
[main] INFO CoreNLP - Threads: 8
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/segmenter/chinese/ctb.gz ... done [10.3 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[main] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/chinese-distsim/chinese-distsim.tagger ... done [0.8 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/chinese.misc.distsim.crf.ser.gz ... done [5.2 sec].
[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 21238 unique entries out of 21249 from edu/stanford/nlp/models/kbp/chinese/gazetteers/cn_regexner_mapping.tab, 0 TokensRegex patterns.
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/srparser/chineseSR.ser.gz ... done [19.0 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref
[main] INFO edu.stanford.nlp.pipeline.CorefMentionAnnotator - Using mention detector type: rule
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator kbp
[main] ERROR CoreNLP - Could not pre-load annotators in server; encountered exception:
java.util.regex.PatternSyntaxException: Unclosed character class near index 3
["鈥漖
^
    at java.util.regex.Pattern.error(Unknown Source)
    at java.util.regex.Pattern.clazz(Unknown Source)
    at java.util.regex.Pattern.sequence(Unknown Source)
    at java.util.regex.Pattern.expr(Unknown Source)
    at java.util.regex.Pattern.compile(Unknown Source)
    at java.util.regex.Pattern.<init>(Unknown Source)
    at java.util.regex.Pattern.compile(Unknown Source)
    at edu.stanford.nlp.semgraph.semgrex.NodePattern.<init>(NodePattern.java:81)
    at edu.stanford.nlp.semgraph.semgrex.NodePattern.<init>(NodePattern.java:47)
    at edu.stanford.nlp.semgraph.semgrex.SemgrexParser.Description(SemgrexParser.java:543)
    at edu.stanford.nlp.semgraph.semgrex.SemgrexParser.Child(SemgrexParser.java:440)
    at edu.stanford.nlp.semgraph.semgrex.SemgrexParser.ModNode(SemgrexParser.java:415)
    at edu.stanford.nlp.semgraph.semgrex.SemgrexParser.Relation(SemgrexParser.java:329)
    at edu.stanford.nlp.semgraph.semgrex.SemgrexParser.RelChild(SemgrexParser.java:230)
    at edu.stanford.nlp.semgraph.semgrex.SemgrexParser.ModRelation(SemgrexParser.java:195)
    at edu.stanford.nlp.semgraph.semgrex.SemgrexParser.RelationConj(SemgrexParser.java:176)
    at edu.stanford.nlp.semgraph.semgrex.SemgrexParser.RelationDisj(SemgrexParser.java:123)
    at edu.stanford.nlp.semgraph.semgrex.SemgrexParser.SubNode(SemgrexParser.java:103)
    at edu.stanford.nlp.semgraph.semgrex.SemgrexParser.Root(SemgrexParser.java:34)
    at edu.stanford.nlp.semgraph.semgrex.SemgrexPattern.compile(SemgrexPattern.java:291)
    at edu.stanford.nlp.semgraph.semgrex.SemgrexBatchParser.parse(SemgrexBatchParser.java:57)
    at edu.stanford.nlp.semgraph.semgrex.SemgrexBatchParser.compileStream(SemgrexBatchParser.java:47)
    at edu.stanford.nlp.semgraph.semgrex.SemgrexBatchParser.compileStream(SemgrexBatchParser.java:39)
    at edu.stanford.nlp.ie.KBPSemgrexExtractor.<init>(KBPSemgrexExtractor.java:56)
    at edu.stanford.nlp.pipeline.KBPAnnotator.<init>(KBPAnnotator.java:115)
    at edu.stanford.nlp.pipeline.AnnotatorImplementations.kbp(AnnotatorImplementations.java:290)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getNamedAnnotators$25(StanfordCoreNLP.java:543)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$null$30(StanfordCoreNLP.java:602)
    at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:126)
    at edu.stanford.nlp.util.Lazy.get(Lazy.java:31)
    at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:149)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:251)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:192)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:188)
    at edu.stanford.nlp.pipeline.StanfordCoreNLPServer.main(StanfordCoreNLPServer.java:1505)
I have downloaded the model for Chinese and NER result is fine. Any reason for this error?
@AngledLuffa When I ran this command directly (I set memory=6G), the same error occurred, like the output above.
Have you edited or in any way changed the kbp data? It loads fine out of the box for me.
Apologies for the long delay in replying.
What version of Stanford CoreNLP are you running? What Java are you using?
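For anyone answering the question above, a small sketch like the following prints the relevant JVM details in one go. The class name `JvmInfo` and the method `describe` are made up for illustration; the default charset is included only on the assumption that encoding might matter here.

```java
import java.nio.charset.Charset;

public class JvmInfo {
    // Collects the JVM and platform details that are usually asked for in bug reports.
    public static String describe() {
        return "java.version=" + System.getProperty("java.version")
            + ", os.name=" + System.getProperty("os.name")
            + ", file.encoding=" + System.getProperty("file.encoding")
            + ", defaultCharset=" + Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Run it with the same `java` binary that starts the server, so the reported values match what CoreNLP actually sees.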
Thanks for @AngledLuffa's reply!
Sorry I don't know where the kbp data is. I downloaded the jar file for Chinese and copied it to %CORENLP_HOME%.
The version of Stanford CoreNLP I use is 3.9.2. I use Java 8u231 64-bit. Does the Windows 10 operating system matter?
I honestly have no idea what could be causing this problem.
I ran this command:
java -cp * edu.stanford.nlp.pipeline.StanfordCoreNLP -properties StanfordCoreNLP-chinese.properties -annotators "tokenize,ssplit,pos,lemma,ner,parse,coref,kbp"
No problems. I then tried this:
java -cp * edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 300000 -threads 8 -maxCharLength 100000 -quiet False -serverProperties StanfordCoreNLP-chinese.properties -preload tokenize,ssplit,pos,lemma,ner,parse,coref,kbp
... snip ...
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref
[main] INFO edu.stanford.nlp.pipeline.CorefMentionAnnotator - Using mention detector type: rule
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator kbp
[main] INFO CoreNLP - Starting server...
[main] INFO CoreNLP - StanfordCoreNLPServer listening at /0:0:0:0:0:0:0:0:9000
This is on Windows 10. As it turns out, I'm using java 12.0.1. However, unless there are some encoding changes between versions, I'm not sure how that will affect things.
Can you try this with a clean download?
Without further information I think this is "cannot reproduce".
java.util.regex has changed across Java versions. That being said, I seem to be able to run a basic Chinese KBP pipeline with Java 8 and Java 11.
It seems like the error is in reading in the semgrex pattern files, and perhaps there is some issue with the Java version on Windows. I only have access to macOS and Ubuntu systems to test out on.
You might try upgrading Java and seeing if that helps...
My two successes are:
macOS, Java 11.0.1
Ubuntu, Java 1.8.0_172
I tried running this command on an Ubuntu system with Java 13, and it worked fine. I believe this is an issue with the Windows system.
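One possible explanation for a Windows-only failure, which nobody in this thread has confirmed: if the semgrex rules file is UTF-8 but gets read with the platform default charset, a Chinese-locale Windows JVM would likely decode it as GBK. The sketch below assumes the original pattern was `["”]` (a straight quote plus a right curly quote, U+201D) and that GBK was the default charset; both are guesses based on the garbled `["鈥漖` in the log. The class `EncodingDemo` and its helpers are hypothetical, and it needs a JDK that ships the GBK charset. It shows that decoding the UTF-8 bytes as GBK swallows the closing `]` as the trail byte of a two-byte character, which leaves the character class unclosed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class EncodingDemo {
    // Encodes the text as UTF-8 (as it would be stored in a file),
    // then decodes those bytes with the given charset.
    public static String decodeAs(String text, Charset charset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, charset);
    }

    // Returns true if the string is a syntactically valid Java regex.
    public static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed original pattern: [ " RIGHT-DOUBLE-QUOTATION-MARK ]
        String pattern = "[\"\u201D]";

        String asUtf8 = decodeAs(pattern, StandardCharsets.UTF_8);
        String asGbk = decodeAs(pattern, Charset.forName("GBK"));

        // Read correctly, the pattern is a valid character class.
        System.out.println("UTF-8: " + asUtf8 + " compiles=" + compiles(asUtf8));
        // Read as GBK, the final ']' byte is consumed by a two-byte character,
        // so the character class is never closed.
        System.out.println("GBK:   " + asGbk + " compiles=" + compiles(asGbk));
    }
}
```

If this is what is happening, starting the server with the JVM option `-Dfile.encoding=UTF-8` might work around it. That is untested here and depends on how CoreNLP 3.9.2 opens the file.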
I'm using KBP for relation extraction in Chinese. There is currently a model for Chinese according to the official introduction. I added the kbp annotator to StanfordCoreNLP-chinese.properties. When I ran the client with the Python interface, the error below occurred:
I have downloaded the model for Chinese and got an NER result. Does anybody know the reason for this error?