Not sure how to help here considering I don't know what you downloaded or ran
Please send text instead of images
Please check my code:
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit");
props.setProperty("tokenize.language", "zh");
props.setProperty("segment.model", "edu/stanford/nlp/models/segmenter/chinese/ctb.gz");
props.setProperty("segment.dictionary", "edu/stanford/nlp/models/segmenter/chinese/dict-chris6.ser.gz");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
log:
11:39:53.747 [main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
11:40:02.958 [main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/segmenter/chinese/ctb.gz ... done [9.2 sec].
11:40:02.988 [main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
11:40:03.202 [main] INFO edu.stanford.nlp.wordseg.ChineseDictionary - Loading Chinese dictionaries from 1 file:
11:40:03.202 [main] INFO edu.stanford.nlp.wordseg.ChineseDictionary - edu/stanford/nlp/models/segmenter/chinese/dict-chris6.ser.gz
11:40:03.548 [main] INFO edu.stanford.nlp.wordseg.ChineseDictionary - Done. Unique words in ChineseDictionary is: 423200.
11:40:03.548 [main] INFO edu.stanford.nlp.wordseg.ChineseDictionary - Loading Chinese dictionaries from 1 file:
11:40:03.548 [main] INFO edu.stanford.nlp.wordseg.ChineseDictionary - /home/john/extern_data/corenlp-segmenter/dict-chris6.ser.gz
11:40:03.551 [main] ERROR edu.stanford.nlp.wordseg.ChineseDictionary - java.io.IOException: Unable to open "/home/john/extern_data/corenlp-segmenter/dict-chris6.ser.gz" as class path, filename or URL
Exception:
Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Unable to open "/home/john/extern_data/corenlp-segmenter/dict-chris6.ser.gz" as class path, filename or URL
maven:
<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-corenlp</artifactId>
  <version>4.2.2</version>
  <classifier>models-chinese</classifier>
  <exclusions>
    <exclusion>
      <groupId>com.google.protobuf</groupId>
      <artifactId>protobuf-java</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-corenlp</artifactId>
  <version>4.2.2</version>
  <exclusions>
    <exclusion>
      <groupId>com.google.protobuf</groupId>
      <artifactId>protobuf-java</artifactId>
    </exclusion>
  </exclusions>
</dependency>
There's a much newer version available. Can I recommend upgrading?
https://mvnrepository.com/artifact/edu.stanford.nlp/stanford-corenlp
Yeah, I upgraded the jar to 4.5.5, but the result is the same:
<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-corenlp</artifactId>
  <version>4.5.5</version>
  <classifier>models-chinese</classifier>
</dependency>
<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-corenlp</artifactId>
  <version>4.5.5</version>
  <exclusions>
    <exclusion>
      <groupId>com.google.protobuf</groupId>
      <artifactId>protobuf-java</artifactId>
    </exclusion>
  </exclusions>
</dependency>
14:29:29.439 [main] DEBUG edu.stanford.nlp.pipeline.StanfordCoreNLP - ssplit is now included as part of the tokenize annotator by default
14:29:29.442 [main] DEBUG edu.stanford.nlp.pipeline.StanfordCoreNLP - Updating annotators from tokenize, ssplit to tokenize
14:29:29.456 [main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
14:29:37.666 [main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/segmenter/chinese/ctb.gz ... done [8.2 sec].
14:29:37.704 [main] INFO edu.stanford.nlp.wordseg.ChineseDictionary - Loading Chinese dictionaries from 1 file:
14:29:37.704 [main] INFO edu.stanford.nlp.wordseg.ChineseDictionary - edu/stanford/nlp/models/segmenter/chinese/dict-chris6.ser.gz
14:29:37.909 [main] INFO edu.stanford.nlp.wordseg.ChineseDictionary - Done. Unique words in ChineseDictionary is: 423200.
14:29:37.909 [main] INFO edu.stanford.nlp.wordseg.ChineseDictionary - Loading Chinese dictionaries from 1 file:
14:29:37.909 [main] INFO edu.stanford.nlp.wordseg.ChineseDictionary - /home/john/extern_data/corenlp-segmenter/dict-chris6.ser.gz
14:29:37.911 [main] ERROR edu.stanford.nlp.wordseg.ChineseDictionary - java.io.IOException: Unable to open "/home/john/extern_data/corenlp-segmenter/dict-chris6.ser.gz" as class path, filename or URL
  edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:501)
  edu.stanford.nlp.io.IOUtils.readObjectFromURLOrClasspathOrFileSystem(IOUtils.java:309)
  edu.stanford.nlp.wordseg.ChineseDictionary.loadDictionary(ChineseDictionary.java:69)
Okay, I understand the problem. The model was built with various defaults, including paths under /home/john. The Chinese pipeline uses several flags to point those at the new locations of the files inside the jar resources we distribute. You can see the paths for the segmenter model in the StanfordCoreNLP-chinese.properties file, copied here for your convenience:
tokenize.language = zh
segment.model = edu/stanford/nlp/models/segmenter/chinese/ctb.gz
segment.sighanCorporaDict = edu/stanford/nlp/models/segmenter/chinese
segment.serDictionary = edu/stanford/nlp/models/segmenter/chinese/dict-chris6.ser.gz
segment.sighanPostProcessing = true
ssplit.boundaryTokenRegex = [.。]|[!?!?]+
If you're creating a Pipeline by hand using Properties, instead of reusing that properties file, you'll want to set those properties as well as the ones you've already set above.
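For instance, here is a minimal sketch of a hand-built pipeline with all of those properties set explicitly. The class name and the sample sentence are made up for illustration; note that the dictionary key is segment.serDictionary, as in the properties file above, not segment.dictionary:

import java.util.Properties;
import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class ChineseSegmenterDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit");
        // values copied from StanfordCoreNLP-chinese.properties, as listed above
        props.setProperty("tokenize.language", "zh");
        props.setProperty("segment.model", "edu/stanford/nlp/models/segmenter/chinese/ctb.gz");
        props.setProperty("segment.sighanCorporaDict", "edu/stanford/nlp/models/segmenter/chinese");
        // segment.serDictionary (not segment.dictionary) is the key the bundled
        // properties file uses to point the dictionary at the jar resource
        props.setProperty("segment.serDictionary", "edu/stanford/nlp/models/segmenter/chinese/dict-chris6.ser.gz");
        props.setProperty("segment.sighanPostProcessing", "true");
        props.setProperty("ssplit.boundaryTokenRegex", "[.。]|[!?!?]+");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // sample text, just to show the segmenter running
        CoreDocument doc = new CoreDocument("斯坦福大学位于加州。");
        pipeline.annotate(doc);
        doc.tokens().forEach(token -> System.out.println(token.word()));
    }
}

Alternatively, you can load the bundled properties file wholesale, e.g. props.load(IOUtils.readerFromString("StanfordCoreNLP-chinese.properties")) with edu.stanford.nlp.io.IOUtils, which should pick up all of the defaults above at once.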
P.S. That should also work for the older 4.2.2, assuming there's a reason you wanted to use that version, but I do recommend updating; every once in a while we fix a bug relevant to one of those models.
That's helpful, thank you very much.
For the record, I never specified the path /home/john/extern_data/corenlp-segmenter/dict-chris6.ser.gz anywhere in my code; it only showed up in the exception.