percyliang / sempre

Semantic Parser with Execution

Lucene (in)exact file not found #220

Closed GindaChen closed 3 years ago

GindaChen commented 3 years ago

To reproduce the problem, run SEMPRE in simple mode with the CoreNLP analyzer and a limited set of grammar rules:

./run @mode=simple -languageAnalyzer corenlp.CoreNLPAnalyzer 

with the following simplified version of emnlp2013 grammar:

```
# Nouns: Match any unaries.
(rule $Noun ($LEMMA_TOKEN) (FilterPosTagFn token WRB WP NN NNS NNP NNPS))
(rule $SimpleNounPhrase ($Noun) (ConcatFn " "))
(rule $SimpleNounPhrase ($Noun $SimpleNounPhrase) (ConcatFn " "))

# Named Entities: Match an entity if it is a sequence of NE tags or NNP tags or of minimal length
(rule $NamedEntity ($PHRASE) (FilterNerSpanFn PERSON ORGANIZATION LOCATION MISC))
(rule $NamedEntity ($PHRASE) (FilterPosTagFn span NNP))

### Lexicon: call LexiconFn on the spans.
(rule $Entity ($NamedEntity) (LexiconFn entity exact)) # or: (rule $Entity ($NamedEntity) (LexiconFn entity inexact))
(rule $Entity ($TokenSpan) (LexiconFn entity exact)) # or: (rule $Entity ($TokenSpan) (LexiconFn entity inexact))

(rule $ROOT ($Entity) (IdentityFn))
```

The exact rules for $Entity give me the following error:

```
Constructing Searcher {
  Opening index dir: null
  ERROR: Composition failed: rule = $Entity -> $NamedEntity (LexiconFn entity exact), children = [(derivation (formula (string mike)) (type fb:type.text))]
  java.lang.NullPointerException
    at java.io.File.<init>(File.java:279)
    at edu.stanford.nlp.sempre.freebase.index.FbEntitySearcher.<init>(FbEntitySearcher.java:45)
    at edu.stanford.nlp.sempre.freebase.EntityLexicon.lookupEntries(EntityLexicon.java:69)
    at edu.stanford.nlp.sempre.freebase.Lexicon.lookupEntities(Lexicon.java:65)
    at edu.stanford.nlp.sempre.freebase.LexiconFn.call(LexiconFn.java:204)
    at edu.stanford.nlp.sempre.BeamParserState.applyRule(BeamParser.java:142)
    at edu.stanford.nlp.sempre.BeamParserState.applyCatUnaryRules(BeamParser.java:193)
    at edu.stanford.nlp.sempre.BeamParserState.build(BeamParser.java:126)
    at edu.stanford.nlp.sempre.BeamParserState.infer(BeamParser.java:98)
    at edu.stanford.nlp.sempre.Parser.parse(Parser.java:170)
    at edu.stanford.nlp.sempre.Master.handleUtterance(Master.java:235)
    at edu.stanford.nlp.sempre.Master.processQuery(Master.java:189)
    at edu.stanford.nlp.sempre.Master.runInteractivePrompt(Master.java:151)
    at edu.stanford.nlp.sempre.Main.run(Main.java:34)
    at fig.exec.Execution.runWithObjArray(Execution.java:337)
    at fig.exec.Execution.run(Execution.java:325)
    at edu.stanford.nlp.sempre.Main.main(Main.java:50)
}
}
}
java.lang.RuntimeException: java.lang.NullPointerException
    at edu.stanford.nlp.sempre.BeamParserState.applyRule(BeamParser.java:165)
    at edu.stanford.nlp.sempre.BeamParserState.applyCatUnaryRules(BeamParser.java:193)
    at edu.stanford.nlp.sempre.BeamParserState.build(BeamParser.java:126)
    at edu.stanford.nlp.sempre.BeamParserState.infer(BeamParser.java:98)
    at edu.stanford.nlp.sempre.Parser.parse(Parser.java:170)
    at edu.stanford.nlp.sempre.Master.handleUtterance(Master.java:235)
    at edu.stanford.nlp.sempre.Master.processQuery(Master.java:189)
    at edu.stanford.nlp.sempre.Master.runInteractivePrompt(Master.java:151)
    at edu.stanford.nlp.sempre.Main.run(Main.java:34)
    at fig.exec.Execution.runWithObjArray(Execution.java:337)
    at fig.exec.Execution.run(Execution.java:325)
    at edu.stanford.nlp.sempre.Main.main(Main.java:50)
Caused by: java.lang.NullPointerException
    at java.io.File.<init>(File.java:279)
    at edu.stanford.nlp.sempre.freebase.index.FbEntitySearcher.<init>(FbEntitySearcher.java:45)
    at edu.stanford.nlp.sempre.freebase.EntityLexicon.lookupEntries(EntityLexicon.java:69)
    at edu.stanford.nlp.sempre.freebase.Lexicon.lookupEntities(Lexicon.java:65)
    at edu.stanford.nlp.sempre.freebase.LexiconFn.call(LexiconFn.java:204)
    at edu.stanford.nlp.sempre.BeamParserState.applyRule(BeamParser.java:142)
    ... 11 more
```

while the inexact rules give me the following error:

```
Constructing Searcher {
  Opening index dir: lib/lucene/4.4/inexact
  ERROR: Composition failed: rule = $Entity -> $NamedEntity (LexiconFn entity inexact), children = [(derivation (formula (string "Mike Chen")) (type fb:type.text))]
  java.lang.RuntimeException: org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.MMapDirectory@/mnt/data/sempre/lib/lucene/4.4/inexact lockFactory=org.apache.lucene.store.NativeFSLockFactory@6a4238ff: files: []
    at edu.stanford.nlp.sempre.freebase.LexiconFn.call(LexiconFn.java:237)
    at edu.stanford.nlp.sempre.BeamParserState.applyRule(BeamParser.java:142)
    at edu.stanford.nlp.sempre.BeamParserState.applyCatUnaryRules(BeamParser.java:193)
    at edu.stanford.nlp.sempre.BeamParserState.build(BeamParser.java:126)
    at edu.stanford.nlp.sempre.BeamParserState.infer(BeamParser.java:98)
    at edu.stanford.nlp.sempre.Parser.parse(Parser.java:170)
    at edu.stanford.nlp.sempre.Master.handleUtterance(Master.java:235)
    at edu.stanford.nlp.sempre.Master.processQuery(Master.java:189)
    at edu.stanford.nlp.sempre.Master.runInteractivePrompt(Master.java:151)
    at edu.stanford.nlp.sempre.Main.run(Main.java:34)
    at fig.exec.Execution.runWithObjArray(Execution.java:337)
    at fig.exec.Execution.run(Execution.java:325)
    at edu.stanford.nlp.sempre.Main.main(Main.java:50)
  Caused by: org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.MMapDirectory@/mnt/data/sempre/lib/lucene/4.4/inexact lockFactory=org.apache.lucene.store.NativeFSLockFactory@6a4238ff: files: []
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:770)
    at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:66)
    at edu.stanford.nlp.sempre.freebase.index.FbEntitySearcher.<init>(FbEntitySearcher.java:45)
    at edu.stanford.nlp.sempre.freebase.EntityLexicon.lookupEntries(EntityLexicon.java:72)
    at edu.stanford.nlp.sempre.freebase.Lexicon.lookupEntities(Lexicon.java:65)
    at edu.stanford.nlp.sempre.freebase.LexiconFn.call(LexiconFn.java:204)
    ... 12 more
}
}
}
java.lang.RuntimeException: java.lang.RuntimeException: org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.MMapDirectory@/mnt/data/sempre/lib/lucene/4.4/inexact lockFactory=org.apache.lucene.store.NativeFSLockFactory@6a4238ff: files: []
    at edu.stanford.nlp.sempre.BeamParserState.applyRule(BeamParser.java:165)
    at edu.stanford.nlp.sempre.BeamParserState.applyCatUnaryRules(BeamParser.java:193)
    at edu.stanford.nlp.sempre.BeamParserState.build(BeamParser.java:126)
    at edu.stanford.nlp.sempre.BeamParserState.infer(BeamParser.java:98)
    at edu.stanford.nlp.sempre.Parser.parse(Parser.java:170)
    at edu.stanford.nlp.sempre.Master.handleUtterance(Master.java:235)
    at edu.stanford.nlp.sempre.Master.processQuery(Master.java:189)
    at edu.stanford.nlp.sempre.Master.runInteractivePrompt(Master.java:151)
    at edu.stanford.nlp.sempre.Main.run(Main.java:34)
    at fig.exec.Execution.runWithObjArray(Execution.java:337)
    at fig.exec.Execution.run(Execution.java:325)
    at edu.stanford.nlp.sempre.Main.main(Main.java:50)
Caused by: java.lang.RuntimeException: org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.MMapDirectory@/mnt/data/sempre/lib/lucene/4.4/inexact lockFactory=org.apache.lucene.store.NativeFSLockFactory@6a4238ff: files: []
    at edu.stanford.nlp.sempre.freebase.LexiconFn.call(LexiconFn.java:237)
    at edu.stanford.nlp.sempre.BeamParserState.applyRule(BeamParser.java:142)
    ... 11 more
Caused by: org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.MMapDirectory@/mnt/data/sempre/lib/lucene/4.4/inexact lockFactory=org.apache.lucene.store.NativeFSLockFactory@6a4238ff: files: []
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:770)
    at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:66)
    at edu.stanford.nlp.sempre.freebase.index.FbEntitySearcher.<init>(FbEntitySearcher.java:45)
    at edu.stanford.nlp.sempre.freebase.EntityLexicon.lookupEntries(EntityLexicon.java:72)
    at edu.stanford.nlp.sempre.freebase.Lexicon.lookupEntities(Lexicon.java:65)
    at edu.stanford.nlp.sempre.freebase.LexiconFn.call(LexiconFn.java:204)
    ... 12 more
```

Questions

  1. What should be in lib/lucene/4.4/inexact? Can I generate a segments file using Lucene?
  2. I see that there is a line in the class Options that does not initialize the path for "exact", and this value is not set anywhere else. Is that intentional, or is it a bug that needs to be fixed? https://github.com/percyliang/sempre/blob/b27c06906da33e345c645ff9470132bf6d1c26dc/src/edu/stanford/nlp/sempre/freebase/EntityLexicon.java#L35-L36

GindaChen commented 3 years ago

I think I have found the secret place where inexact hides:

But I still cannot find where the exact files are located... @ppasupat Would you like to have a look at the issue?

GindaChen commented 3 years ago

I saw the comments in the emnlp2013 grammar and the original code saying that exact has some bug... It would be great if we could just raise an exception when the exact mode is selected.
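
To make the suggestion concrete, here is a minimal sketch of the kind of fail-fast check I have in mind. It is not SEMPRE code: the class `IndexPathCheck` and the method `requireIndexDir` are hypothetical names, and it uses only the Java standard library. It assumes the two failures above come from a null index path (exact) and an index directory with no `segments*` file (inexact), which is what the stack traces suggest.

```java
import java.io.File;

// Hypothetical helper, not part of SEMPRE: validates an index directory
// before anything tries to open it, so the error names the real cause.
final class IndexPathCheck {

    // mode is only used for the error message (e.g. "exact" or "inexact").
    static File requireIndexDir(String mode, String path) {
        if (path == null) {
            // Corresponds to "Opening index dir: null" followed by the NPE.
            throw new IllegalArgumentException(
                "No index path configured for '" + mode + "' lookup");
        }
        File dir = new File(path);
        // list() returns null when the path is not a readable directory.
        String[] segments = dir.list((d, name) -> name.startsWith("segments"));
        if (segments == null || segments.length == 0) {
            // Corresponds to "no segments* file found ... files: []".
            throw new IllegalStateException(
                "No segments* file in " + dir.getPath()
                    + " (mode '" + mode + "')");
        }
        return dir;
    }

    public static void main(String[] args) {
        try {
            requireIndexDir("exact", null);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check like this in front of the searcher constructor, selecting exact mode without a configured path would produce a readable message instead of a bare NullPointerException. Where exactly such a check belongs in SEMPRE is something the maintainers would know better than I do.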