stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0

Pos tagger model from 3.9.1 models jar file not recognized #715

Closed: demongolem closed this issue 3 years ago

demongolem commented 6 years ago

When using Maven, either from the command line or from within Eclipse, I get the following error:

Caused by: java.io.IOException: Unable to open "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger" as class path, filename or URL

In my pom.xml file, I have:

<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.9.1</version>
</dependency>
<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.9.1</version>
    <classifier>models</classifier>
</dependency>

When I expand the models jar under Maven Dependencies in Eclipse, I see that all the other models have the package structure edu.stanford.nlp.models.xxx, with the appropriate models underneath. However, there is a pos-tagger folder underneath the package edu.stanford.nlp.models, and then another folder underneath that for english-left3words. I don't know if this is important or not, but I do know that the classic pipeline is not picking up the default POS model even with "models" as a Maven dependency.
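One way to narrow this down is to ask the class loader directly whether it can see the tagger file. The following is a minimal sketch, assuming the pipeline resolves its default model through the class loader (as the "class path" wording in the error suggests). `ModelClasspathCheck` and `locate` are hypothetical names, not part of CoreNLP; only the resource path is taken from the error message.

```java
import java.net.URL;

// Hypothetical helper, not part of CoreNLP: reports where (or whether)
// a resource is visible on the current classpath.
class ModelClasspathCheck {
    // Resource path copied from the error message in this issue.
    static final String TAGGER_PATH =
        "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger";

    /** Returns the URL the class loader resolves for the resource, or null if it is not found. */
    static URL locate(String resourcePath) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the system class loader if no context loader is set.
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resourcePath);
    }

    public static void main(String[] args) {
        URL url = locate(TAGGER_PATH);
        if (url == null) {
            System.out.println("NOT on classpath: " + TAGGER_PATH);
        } else {
            // The URL shows which jar the resource was loaded from.
            System.out.println("Found at: " + url);
        }
    }
}
```

Run with the same classpath as the failing application, a null result would suggest the models jar is missing from the classpath or unreadable, rather than a problem with the folder layout inside it.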

J38 commented 6 years ago

I am able to run a Stanford CoreNLP pipeline with part-of-speech tagging via Maven with Stanford CoreNLP 3.9.1 (in fact, that is a test I run before submitting to Maven). When I look in stanford-corenlp-3.9.1-models.jar, I do in fact see that file.
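The same look inside the models jar can be done programmatically. This is only a sketch using the JDK's `java.util.jar.JarFile`; `ModelsJarCheck` and `containsEntry` are hypothetical names, and the jar location is passed in as an argument rather than assumed.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper, not part of CoreNLP: checks a jar file for a given entry.
class ModelsJarCheck {
    // Entry name copied from the error message in this issue.
    static final String TAGGER_ENTRY =
        "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger";

    /** Returns true if the jar at jarPath contains an entry with exactly this name. */
    static boolean containsEntry(String jarPath, String entryName) throws IOException {
        // Opening a truncated or corrupt jar throws an IOException (usually a ZipException).
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ModelsJarCheck <path-to-models-jar>");
            return;
        }
        System.out.println(containsEntry(args[0], TAGGER_ENTRY)
            ? "Tagger model found in " + args[0]
            : "Tagger model MISSING from " + args[0]);
    }
}
```

With a default Maven setup the jar would normally be under `~/.m2/repository/edu/stanford/nlp/stanford-corenlp/3.9.1/`. If opening it throws an exception, an incomplete download is a plausible explanation, and deleting that directory so Maven fetches the jar again may be worth trying.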

Why don't you see if you can get this Maven project running from the command line:

pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

  <modelVersion>4.0.0</modelVersion>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-corenlp-test-app</artifactId>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>stanford-corenlp-test-app</name>
  <url>http://maven.apache.org</url>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.5</version>
      <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-corenlp</artifactId>
        <version>3.9.1</version>
    </dependency>
    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-corenlp</artifactId>
        <version>3.9.1</version>
        <classifier>javadoc</classifier>
    </dependency>
    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-corenlp</artifactId>
        <version>3.9.1</version>
        <classifier>sources</classifier>
    </dependency>
    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-corenlp</artifactId>
        <version>3.9.1</version>
        <classifier>models</classifier>
    </dependency>
  </dependencies>
</project>

Then add this file and directory structure:

sample-english.txt
english.properties
src/main/java/edu/stanford/nlp/StanfordCoreNLPEnglishTestApp.java

sample-english.txt

I like pizza!

english.properties

annotators = tokenize,ssplit,pos,lemma,ner

And here is the code:

package edu.stanford.nlp;

import java.io.*;
import java.util.*;
import edu.stanford.nlp.io.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.util.*;

/** app for testing if Maven distribution is working properly */

public class StanfordCoreNLPEnglishTestApp
{
    public static void main(String[] args) throws IOException, ClassNotFoundException
    {
        String[] englishArgs = new String[]{"-file", "sample-english.txt", "-outputFormat", "text", "-props", "english.properties"};
        StanfordCoreNLP.main(englishArgs);
    }
}

denismakogon commented 5 years ago

This might solve the problem: https://interviewbubble.com/exception-in-thread-main-java-lang-runtimeexception-edu-stanford-nlp-io-runtimeioexception-error-while-loading-a-tagger-model-probably-missing-model-file/

The only change needed is to add the models artifact to your dependencies:

    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-corenlp</artifactId>
        <version>3.9.1</version>
        <classifier>models</classifier>
    </dependency>

demongolem commented 5 years ago

Adding the models dependency, as suggested in the previous comment, is what I already have in my original post above. Still, four months later, I prefer 3.8.0 to 3.9.1 because something about the Maven dependencies still does not work out of the box.

To restate: with the two dependencies above at version 3.8.0, everything works fine. With both dependencies at 3.9.1, running from Eclipse gives:

Exception in thread "main" edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:799)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:320)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:273)
    at edu.stanford.nlp.pipeline.POSTaggerAnnotator.loadModel(POSTaggerAnnotator.java:85)
    at edu.stanford.nlp.pipeline.POSTaggerAnnotator.<init>(POSTaggerAnnotator.java:73)
    at edu.stanford.nlp.pipeline.AnnotatorImplementations.posTagger(AnnotatorImplementations.java:53)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$3(StanfordCoreNLP.java:521)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$31(StanfordCoreNLP.java:602)
    at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:126)
    at edu.stanford.nlp.util.Lazy.get(Lazy.java:31)
    at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:149)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:251)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:192)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:188)
    at SentimentTrial.<init>(SentimentTrial.java:25)
    at SentimentTrial.main(SentimentTrial.java:123)
Caused by: java.io.IOException: Unable to open "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger" as class path, filename or URL
    at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:480)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:796)
    ... 15 more

demongolem commented 5 years ago

OK @J38, I finally have some more info for you regarding your June post. I set up everything exactly as you described. On Ubuntu 18, it works from the command line. On Windows 10, it does not work from the command line; it gives me the stack trace above. So I don't think we even need to drag Eclipse into this. Might it simply be an OS thing? I currently do most of my development on Windows.

denismakogon commented 5 years ago

I can confirm that the fix I posted above works with 3.9.1 on Alpine Linux.

findli commented 5 years ago

I had this issue with 3.9.1 on Windows 10 and solved it by switching to 3.9.2, which works.

zahra1394 commented 5 years ago

Hello. I assume the dependency you are discussing applies to Java projects, because I don't know of any equivalent dependency in a .NET project. Can you help me with this error in the Stanford.NLP.CoreNLP.CSharp project?

LifeIsStrange commented 3 years ago

Please help: how can I translate

    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-corenlp</artifactId>
        <version>4.0.0</version>
        <classifier>models</classifier>
    </dependency>

into Gradle?

dependencies {
    implementation("edu.stanford.nlp:stanford-corenlp:4.0.0")
}

AngledLuffa commented 3 years ago

Why are you necroposting on some completely unrelated bug?

I don't think anyone here does much with gradle, so ... good luck!

LifeIsStrange commented 3 years ago

@AngledLuffa I am getting the same exception as the OP. So the issue is related.

AngledLuffa commented 3 years ago

That exception can occur in a myriad of different situations, but they all ultimately have the same cause: the correct models file for the version of CoreNLP you are using is not on your classpath.
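A quick way to see which CoreNLP jars a run actually has is to list the matching classpath entries and compare their versions. The sketch below assumes the jars appear in the `java.class.path` system property, which may not hold under custom class loaders or manifest-only classpaths. `CoreNlpClasspathReport` and `coreNlpJars` are hypothetical names, not part of CoreNLP.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of CoreNLP: lists CoreNLP jars found on a classpath string.
class CoreNlpClasspathReport {
    /** Returns the file names of classpath entries whose name starts with "stanford-corenlp". */
    static List<String> coreNlpJars(String classPath) {
        List<String> matches = new ArrayList<>();
        // Entries are separated by ':' on Unix-like systems and ';' on Windows.
        for (String entry : classPath.split(Pattern.quote(File.pathSeparator))) {
            String name = new File(entry).getName();
            if (name.startsWith("stanford-corenlp")) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Prints each CoreNLP jar name so the code and models versions can be compared by eye.
        for (String jar : coreNlpJars(System.getProperty("java.class.path"))) {
            System.out.println(jar);
        }
    }
}
```

If the output shows a code jar without a matching `-models` jar, or two different version numbers, that would be consistent with the cause described above.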

I don't know how to fix that for gradle. Perhaps someone else paying attention will know. If not, you are on your own in terms of figuring out how to associate the models file with gradle.

Since the original issue is not finding the models using maven, and your problem is not finding the models using gradle, I maintain that they are different issues. You're not even referring to the same version as in the OP.