stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0

3.4 Release missing classes for SR Parser #30

Closed Necrolis closed 10 years ago

Necrolis commented 10 years ago

When setting parse.model=edu/stanford/nlp/models/srparser/englishSR.ser.gz and using the SR models from the site, there is a java.lang.ClassNotFoundException being thrown for edu.stanford.nlp.parser.shiftreduce.BasicFeatureFactory.

Upon inspection it looks like the class files for BasicFeatureFactory & DistsimFeatureFactory were not included in the build/jar; this renders the SR parser unusable from 3.4 (which is a bit of a pain as we use the .Net bindings).

AngledLuffa commented 10 years ago

Yikes! I forgot that the features were loaded by reflection. This should now be fixed.

Thanks for reporting.
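For anyone hitting a similar ClassNotFoundException, here is a minimal sketch of a classpath check for classes that are loaded by reflection. The helper class `ReflectionCheck` is hypothetical and not part of CoreNLP; only the two class names come from the report above. It assumes you run it with the same classpath you pass to CoreNLP.

```java
/** Hypothetical helper (not part of CoreNLP): reports whether class names resolve on the classpath. */
final class ReflectionCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ReflectionCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the issue report.
        String[] names = {
            "edu.stanford.nlp.parser.shiftreduce.BasicFeatureFactory",
            "edu.stanford.nlp.parser.shiftreduce.DistsimFeatureFactory"
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isLoadable(name) ? "found" : "MISSING"));
        }
    }
}
```

A class reported as MISSING here would only fail at runtime in the real pipeline, because nothing references it at compile time.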


Necrolis commented 10 years ago

After getting a bit of time to test the new SR parser, it seems it's unusable because the setConstraints method in ShiftReduceParserQuery.java is incomplete (it just throws an UnsupportedOperationException("Unable to set constraints on the shift reduce parser (yet)")), but the docs for the SR parser seem to indicate that it works/should be usable just by setting the parse model.

Is the SR parser meant to be usable from the pipeline at all? Or is it only available for training at the moment?

AngledLuffa commented 10 years ago

There are several features currently missing. Constraints are one of them; a caseless model is another. It is still an experimental model because of this. Those features are likely to be added by the end of the summer.

If you don't need constraints, it can be used as a drop in replacement.

Necrolis commented 10 years ago

> Those features are likely to be added by the end of the summer.

Good to know, we'll be patiently waiting to test and break things :)

> If you don't need constraints, it can be used as a drop in replacement.

Could you expand on how to get it running without constraints? I haven't been able to get any of the parsing examples to work.

AngledLuffa commented 10 years ago

That's unfortunate. It should just work with

-parse.model (fullpath)/englishSR.ser.gz

What happens when you try that?

Necrolis commented 10 years ago

Running corenlp.sh -file input.txt -props config.props where input.txt is the default from the repo and config.props contains only parse.model=edu/stanford/nlp/models/srparser/englishSR.ser.gz yields the aforementioned exception (and trying to use the minimal set of annotators leads to the same exception). I have checked that all the models do indeed load correctly.

I'm not sure if this could cause any problems, but I'm running corenlp.sh on Windows 7 through msysgit.

AngledLuffa commented 10 years ago

Did you download the shift-reduce models jar and add it to your classpath?

It's a separate jar as it includes models for all of the languages, and put together, they become rather large...

John


Necrolis commented 10 years ago

Yes, I have the SR models added to the classpath, and they are also loaded correctly (I've tried both the beam and the normal SR models).

This is the full output of a run:

C:\***\stanford-corenlp-full-2014-06-16>corenlp.sh -file input.txt -props config.props
Welcome to Git (version 1.8.3-preview20130601)

Run 'git help git' to display the help index.
Run 'git help <command>' to display help for specific commands.
C:\***\stanford-corenlp-full-2014-06-16\corenlp.sh: line 13: readlink: command not found
java -mx3g -cp "./*" edu.stanford.nlp.pipeline.StanfordCoreNLP -file input.txt -props config.props
Searching for resource: StanfordCoreNLP.properties
Searching for resource: edu/stanford/nlp/pipeline/StanfordCoreNLP.properties
Adding annotator tokenize
Adding annotator ssplit

Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.8 sec].
Adding annotator lemma
Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [4.3 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [1.5 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [1.8 sec].
Initializing JollyDayHoliday for sutime with classpath:edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt

Ignoring inactive rule: null
Ignoring inactive rule: temporal-composite-8:ranges
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
Adding annotator parse
Loading parser from serialized file edu/stanford/nlp/models/srparser/englishSR.ser.gz ...done [9.8 sec].
Adding annotator dcoref

Ready to process: 1 files, skipped 0, total 1
Processing file C:\***\stanford-corenlp-full-2014-06-16\input.txt ... 
writing to C:\***\stanford-corenlp-full-2014-06-16\input.txt.xml {
  Annotating file C:\***\stanford-corenlp-full-2014-06-16\input.txt [0.305 seconds]
Exception in thread "main" java.lang.RuntimeException: Error annotating C:\***\stanford-corenlp-full-2014-06-16\input.txt
        at edu.stanford.nlp.pipeline.StanfordCoreNLP$15.run(StanfordCoreNLP.java:1311)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.processFiles(StanfordCoreNLP.java:1371)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.main(StanfordCoreNLP.java:1448)
Caused by: java.lang.UnsupportedOperationException: Unable to set constraints on the shift reduce parser (yet)
        at edu.stanford.nlp.parser.shiftreduce.ShiftReduceParserQuery.setConstraints(ShiftReduceParserQuery.java:218)
        at edu.stanford.nlp.pipeline.ParserAnnotator.doOneSentence(ParserAnnotator.java:282)
        at edu.stanford.nlp.pipeline.ParserAnnotator.doOneSentence(ParserAnnotator.java:250)
        at edu.stanford.nlp.pipeline.ParserAnnotator.annotate(ParserAnnotator.java:232)
        at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(AnnotationPipeline.java:67)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:871)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP$15.run(StanfordCoreNLP.java:1299)
        ... 2 more
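
The useful part of a trace like this is the innermost "Caused by" entry. As a small sketch of how to pull that out programmatically, the hypothetical helper below (not part of CoreNLP) walks the cause chain of an exception. The two exception messages are copied from the output above, with the file path shortened; everything else is an assumption made for illustration.

```java
/** Hypothetical helper (not part of CoreNLP): finds the innermost cause of an exception. */
final class RootCause {

    /** Follows getCause() until there is no further cause. */
    static Throwable of(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Rebuild the shape of the failure seen in the log: a wrapper around the real error.
        Throwable wrapped = new RuntimeException(
            "Error annotating input.txt",
            new UnsupportedOperationException(
                "Unable to set constraints on the shift reduce parser (yet)"));

        Throwable root = of(wrapped);
        System.out.println(root.getClass().getName() + ": " + root.getMessage());
    }
}
```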
AngledLuffa commented 10 years ago

Ah foo. I forgot that the coref sometimes reparses a piece of the sentence with constraints if it hasn't found a suitable head for a mention. This didn't show up in my testing... I'll try to have a fix by the end of next week, probably by implementing the constraints.

Necrolis commented 10 years ago

> I'll try to have a fix by the end of next week, probably by implementing the constraints.

Awesome, thanks very much for the effort :+1:

reckart commented 10 years ago

... or you could turn off reparsing in the coref module. A parameter for that was added in CoreNLP 3.3.1, although I do not know how to set that from a native CoreNLP pipeline - I set it directly via Java in 3.3.1. I hope the parameter is still present in the current version.

AngledLuffa commented 10 years ago

Thanks for reminding us of that. The option you're talking about is

dcoref.allowReparsing

and has not changed in the new version


Necrolis commented 10 years ago

I gave dcoref.allowReparsing a try but it doesn't seem to alter anything, still get the same error as before (maybe this is actually a problem from the initial parse and not a reparse?).

AngledLuffa commented 10 years ago

Massive fail on my part. I updated it so that "-dcoref.allowReparsing false" should now allow the srparser to work with corenlp. Making progress on adding the constraints to the srparser anyway.

Thanks for being our guinea pig!
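For readers configuring this from Java rather than the command line, a minimal sketch of the two settings discussed in this thread follows. The property keys and the model path are taken from the comments above; the class `SrParserProps` is a hypothetical name, and the sketch only builds and prints the properties so that it runs without CoreNLP on the classpath.

```java
import java.util.Properties;

/** Hypothetical helper: builds the properties discussed in this thread. */
final class SrParserProps {

    /** Returns properties selecting the given parse model with coref reparsing turned off. */
    static Properties build(String parseModel) {
        Properties props = new Properties();
        props.setProperty("parse.model", parseModel);
        // Workaround from this thread: avoid reparsing with constraints in dcoref.
        props.setProperty("dcoref.allowReparsing", "false");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("edu/stanford/nlp/models/srparser/englishSR.ser.gz");
        // Print in the key=value form that a config.props file would use.
        for (String key : props.stringPropertyNames()) {
            System.out.println(key + "=" + props.getProperty(key));
        }
        // With CoreNLP on the classpath, these would typically be handed to the
        // pipeline, e.g. new StanfordCoreNLP(props); that call is omitted here.
    }
}
```

The printed lines can be pasted into the config.props file passed to corenlp.sh with -props.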


Necrolis commented 10 years ago

> Making progress on adding the constraints to the srparser anyway.

Excellent :)

> Thanks for being our guinea pig!

I was very glad to see this addition, and I must say I'm really impressed at the response times. Plus, this is the only way I can contribute (seeing as I hate Java :P). Of course, when this finally does work I'm going to be chucking a boatload of text at it, so expect more complaints :) (We've actually already found a way to DoS any servers that run CoreNLP, but we'd like to fix this via an extension to the tokenizer.)

manning commented 10 years ago

If that means you've found a tokenizer bug, you can tell us about that too....

hamedkhanpour commented 9 years ago

I have downloaded a recent version of Stanford CoreNLP to run the constituency parser. I found that "srparser/englishSR.ser.gz" is not included in the models. Could you please tell me what the solution would be?

hans commented 9 years ago

You need to have the shift-reduce parser models JAR on your classpath. Direct link to the 3.5.1-compatible version is here: http://nlp.stanford.edu/software/stanford-srparser-2014-10-23-models.jar

You can find the latest model downloads on the CoreNLP homepage.
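
A quick way to confirm whether the models JAR is actually being picked up is to look for the model file as a classpath resource. The sketch below is a hypothetical diagnostic, not part of CoreNLP; the resource path is the one used earlier in this thread, and it assumes you run it with the same classpath you give CoreNLP.

```java
import java.net.URL;

/** Hypothetical diagnostic (not part of CoreNLP): looks up a model file on the classpath. */
final class ModelOnClasspath {

    /** Returns the URL of the resource if some classpath entry provides it, otherwise null. */
    static URL locate(String resourcePath) {
        return ModelOnClasspath.class.getClassLoader().getResource(resourcePath);
    }

    public static void main(String[] args) {
        String model = "edu/stanford/nlp/models/srparser/englishSR.ser.gz";
        URL url = locate(model);
        if (url == null) {
            System.out.println(model + " not found; is the shift-reduce models JAR on the classpath?");
        } else {
            System.out.println(model + " found at " + url);
        }
    }
}
```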