stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0

OpenIE fails for some sentences #187

Closed naoya-i closed 8 years ago

naoya-i commented 8 years ago

Hi,

I am using Stanford OpenIE (http://stanfordnlp.github.io/CoreNLP/openie.html) to extract triples from the Gigaword corpus. I invoke the "edu.stanford.nlp.naturalli.OpenIE" class from the Stanford CoreNLP jar files as follows:

$ echo "John was born in the US." | java -mx1g -cp stanford-corenlp-3.6.0.jar:stanford-corenlp-3.6.0-models.jar:CoreNLP-to-HTML.xsl:slf4j-api.jar:slf4j-simple.jar edu.stanford.nlp.naturalli.OpenIE

However, some sentences from the Gigaword corpus crash Stanford OpenIE, as follows:

$ echo "In the meantime the only road in and out of the city crosses a Bosnian Serb checkpoint." | java -mx1g -cp stanford-corenlp-3.6.0.jar:stanford-corenlp-3.6.0-models.jar:CoreNLP-to-HTML.xsl:slf4j-api.jar:slf4j-simple.jar edu.stanford.nlp.naturalli.OpenIE
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.8 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
Loading depparse model file: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...
PreComputed 100000, Elapsed Time: 1.606 (s)
Initializing dependency parser done [4.6 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator natlog
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator openie
Loading clause searcher from edu/stanford/nlp/models/naturalli/clauseSearcherModel.ser.gz...done [0.90 seconds]
Processing from stdin. Enter one sentence per line.
Exception in thread "main" java.util.NoSuchElementException: No value present
       at java.util.Optional.get(Optional.java:135)
       at edu.stanford.nlp.naturalli.RelationTripleSegmenter.extract(RelationTripleSegmenter.java:282)
       at edu.stanford.nlp.naturalli.OpenIE.annotateSentence(OpenIE.java:485)
       at edu.stanford.nlp.naturalli.OpenIE.lambda$annotate$3(OpenIE.java:554)
       at edu.stanford.nlp.naturalli.OpenIE$$Lambda$24/1197365356.accept(Unknown Source)
       at java.util.ArrayList.forEach(ArrayList.java:1249)
       at edu.stanford.nlp.naturalli.OpenIE.annotate(OpenIE.java:554)
       at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(AnnotationPipeline.java:71)
       at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:499)
       at edu.stanford.nlp.naturalli.OpenIE.processDocument(OpenIE.java:630)
       at edu.stanford.nlp.naturalli.OpenIE.main(OpenIE.java:736)

So far, I have not been able to find any pattern in the sentences that trigger this Java exception. For reference, I have also pasted nine other sentences that cause it.

Of course, it would be great if the error were fixed. An even better solution, in my opinion, would be for Stanford OpenIE to support an "-ignore-errors" option like the one implemented in Ollie, the University of Washington's OpenIE system (https://knowitall.github.io/ollie/). The "-ignore-errors" option makes the software more error-tolerant, allowing it to skip a sentence that causes an error and move on to the next one. This would be extremely useful when parsing a large file.
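In the meantime, the requested skip-on-error behavior can be approximated on the caller's side by catching per-sentence exceptions. A minimal, self-contained sketch of the pattern (the process method below is a stand-in for the real OpenIE pipeline call, and the "checkpoint" trigger only simulates a failing sentence):

```java
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

public class SkipOnError {
    // Stand-in for the real per-sentence extraction; throws on a "bad" sentence,
    // simulating the NoSuchElementException seen in the stack trace above.
    static String process(String sentence) {
        if (sentence.contains("checkpoint")) {
            throw new NoSuchElementException("No value present");
        }
        return "(" + sentence.split(" ")[0] + "; ...)";
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
            "John was born in the US.",
            "In the meantime the only road crosses a Bosnian Serb checkpoint.",
            "Obama was president.");
        int ok = 0, skipped = 0;
        for (String s : sentences) {
            try {
                System.out.println(process(s));
                ok++;
            } catch (RuntimeException e) {
                // Log and move on instead of letting one sentence kill the whole run.
                System.err.println("skipping sentence: " + e.getMessage());
                skipped++;
            }
        }
        System.out.println(ok + " processed, " + skipped + " skipped");
        // prints "2 processed, 1 skipped"
    }
}
```

The same try/catch wrapper would go around the pipeline's per-sentence annotate call when driving CoreNLP programmatically, rather than through the OpenIE main method.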

gangeli commented 8 years ago

Thanks for reporting this! Are you using the official release (3.6.0), or the GitHub HEAD version of the code? I remember fixing a similar error a while ago, and corenlp.run doesn't crash on this sentence, so hopefully it's the same bug. If you're not already on it, you can build the GitHub code with ant jar and use the resulting jar file instead of the official release. I think the models should be the same, but if they're not, there's a link to download the most recent models on the project homepage.
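The suggested build from GitHub HEAD would look roughly like the commands below. The built jar's name is an assumption (check the repository's build output for the exact file); the models jar stays the 3.6.0 release one.

```shell
# Clone the current GitHub code and build it with ant
git clone https://github.com/stanfordnlp/CoreNLP.git
cd CoreNLP
ant jar

# Then put the freshly built jar (name assumed here) ahead of the release jar,
# keeping the 3.6.0 models jar on the classpath:
echo "John was born in the US." | java -mx1g \
  -cp javanlp-core.jar:stanford-corenlp-3.6.0-models.jar \
  edu.stanford.nlp.naturalli.OpenIE
```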

The lack of an -ignore-errors flag is actually kind of deliberate. I'd like to hold OpenIE to a standard of never crashing (after all, the rest of CoreNLP doesn't crash either), and therefore any exception should be treated as a critical bug that should be fixed quickly.

naoya-i commented 8 years ago

Thanks for your reply! I had only tried the official release (3.6.0) at that time, so I have now tried the GitHub version. Fortunately, the GitHub version did not crash on any of the sentences I mentioned. For the time being, I will work with this version.

> The lack of an -ignore-errors flag is actually kind of deliberate. I'd like to hold OpenIE to a standard of never crashing (after all, the rest of CoreNLP doesn't crash either), and therefore any exception should be treated as a critical bug that should be fixed quickly.

OK, I understand the philosophy behind CoreNLP ;-) If I encounter any other problems, I'll come back again!

Thanks!

Naoya
