percyliang / sempre

Semantic Parser with Execution

Lucene not pulled by the dependency puller #219

Closed GindaChen closed 3 years ago

GindaChen commented 3 years ago

When I execute

$ ./pull-dependencies freebase

and then

./run @mode=freebase @domain=webquestions @train=1 @sparqlserver=localhost:3001 @cacheserver=local

I hit a parser error, followed by a message saying that a folder under lucene/ does not exist, even though it should have been downloaded with the dependencies:

java.lang.RuntimeException: org.apache.lucene.store.NoSuchDirectoryException: 
directory '/mnt/data/sempre/lib/lucene/4.4/inexact' does not exist
```
Example lib/data/webquestions/dataset_11/webquestions.examples.train.json:3777 (3777): [what, kind, government, does, the, us, have, ?] => (list (description "Presidential system") (description "Federal republic") (description "Representative democracy") (description "Two-party system") (description "Constitutional republic") (description Republic))
Dataset stats {
numTokenTypes = 3604
numTokensPerExample = 4/ << 7.715 ~ 1.592 >> /15 (3778)
numExamples.train = 3022
numExamples.dev = 756
}
} [50s, cum. 58s]
Learner.learn() {
Iteration 0/3 {
Processing iter=0.train: 3022 examples {
Examples {
iter=0.train: example 0/3022: lib/data/webquestions/dataset_11/webquestions.examples.train.json:1553 {
Example: where was emperor hadrian born? {
Tokens: [where, was, emperor, hadrian, born, ?]
Lemmatized tokens: [where, be, emperor, hadrian, bear, ?]
POS tags: [WRB, VBD-AUX, NNP, NNP, VBN, .]
NER tags: [O, O, O, PERSON, O, O]
NER values: [null, null, null, null, null, null]
targetValue: (list (description Rome))
Dependency children: [[], [], [], [compound->2], [advmod->0, auxpass->1, nsubjpass->3, punct->5], []]
}
Parser.parse: parse {
Constructing Searcher {
Opening index dir: lib/lucene/4.4/inexact/
ERROR: Composition failed: rule = $Entity -> $NamedEntity (LexiconFn entity inexact), children = [(derivation (formula (string hadrian)) (type fb:type.text))]
java.lang.RuntimeException: org.apache.lucene.store.NoSuchDirectoryException: directory '/mnt/data/sempre/lib/lucene/4.4/inexact' does not exist
    at edu.stanford.nlp.sempre.freebase.LexiconFn.call(LexiconFn.java:237)
    at edu.stanford.nlp.sempre.BeamParserState.applyRule(BeamParser.java:142)
    at edu.stanford.nlp.sempre.BeamParserState.applyCatUnaryRules(BeamParser.java:193)
    at edu.stanford.nlp.sempre.BeamParserState.build(BeamParser.java:126)
    at edu.stanford.nlp.sempre.BeamParserState.infer(BeamParser.java:98)
    at edu.stanford.nlp.sempre.Parser.parse(Parser.java:170)
    at edu.stanford.nlp.sempre.Learner.parseExample(Learner.java:288)
    at edu.stanford.nlp.sempre.Learner.processExamples(Learner.java:199)
    at edu.stanford.nlp.sempre.Learner.learn(Learner.java:125)
    at edu.stanford.nlp.sempre.Learner.learn(Learner.java:90)
    at edu.stanford.nlp.sempre.Main.run(Main.java:27)
    at fig.exec.Execution.runWithObjArray(Execution.java:337)
    at fig.exec.Execution.run(Execution.java:325)
    at edu.stanford.nlp.sempre.Main.main(Main.java:50)
Caused by: org.apache.lucene.store.NoSuchDirectoryException: directory '/mnt/data/sempre/lib/lucene/4.4/inexact' does not exist
    at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:218)
    at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:242)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:712)
    at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:66)
    at edu.stanford.nlp.sempre.freebase.index.FbEntitySearcher.<init>(FbEntitySearcher.java:45)
    at edu.stanford.nlp.sempre.freebase.EntityLexicon.lookupEntries(EntityLexicon.java:72)
    at edu.stanford.nlp.sempre.freebase.Lexicon.lookupEntities(Lexicon.java:65)
    at edu.stanford.nlp.sempre.freebase.LexiconFn.call(LexiconFn.java:204)
    ... 13 more
ERROR: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.lucene.store.NoSuchDirectoryException: directory '/mnt/data/sempre/lib/lucene/4.4/inexact' does not exist:
    edu.stanford.nlp.sempre.BeamParserState.applyRule(BeamParser.java:165)
    edu.stanford.nlp.sempre.BeamParserState.applyCatUnaryRules(BeamParser.java:193)
    edu.stanford.nlp.sempre.BeamParserState.build(BeamParser.java:126)
    edu.stanford.nlp.sempre.BeamParserState.infer(BeamParser.java:98)
    edu.stanford.nlp.sempre.Parser.parse(Parser.java:170)
    edu.stanford.nlp.sempre.Learner.parseExample(Learner.java:288)
    edu.stanford.nlp.sempre.Learner.processExamples(Learner.java:199)
    edu.stanford.nlp.sempre.Learner.learn(Learner.java:125)
    edu.stanford.nlp.sempre.Learner.learn(Learner.java:90)
    edu.stanford.nlp.sempre.Main.run(Main.java:27)
    fig.exec.Execution.runWithObjArray(Execution.java:337)
    fig.exec.Execution.run(Execution.java:325)
    edu.stanford.nlp.sempre.Main.main(Main.java:50)
ERROR: Caused by java.lang.RuntimeException: org.apache.lucene.store.NoSuchDirectoryException: directory '/mnt/data/sempre/lib/lucene/4.4/inexact' does not exist:
    edu.stanford.nlp.sempre.freebase.LexiconFn.call(LexiconFn.java:237)
    edu.stanford.nlp.sempre.BeamParserState.applyRule(BeamParser.java:142)
    edu.stanford.nlp.sempre.BeamParserState.applyCatUnaryRules(BeamParser.java:193)
    edu.stanford.nlp.sempre.BeamParserState.build(BeamParser.java:126)
    edu.stanford.nlp.sempre.BeamParserState.infer(BeamParser.java:98)
    edu.stanford.nlp.sempre.Parser.parse(Parser.java:170)
    edu.stanford.nlp.sempre.Learner.parseExample(Learner.java:288)
    edu.stanford.nlp.sempre.Learner.processExamples(Learner.java:199)
    edu.stanford.nlp.sempre.Learner.learn(Learner.java:125)
    edu.stanford.nlp.sempre.Learner.learn(Learner.java:90)
    edu.stanford.nlp.sempre.Main.run(Main.java:27)
    fig.exec.Execution.runWithObjArray(Execution.java:337)
    fig.exec.Execution.run(Execution.java:325)
    edu.stanford.nlp.sempre.Main.main(Main.java:50)
Execution directory: state/execs/0.exec
3 errors, 0 warnings
}
Command failed: fig/bin/qcreate java -ea -Dmodules=core,freebase -Xms8G -Xmx10G -cp libsempre/*:lib/* edu.stanford.nlp.sempre.Main -execDir _OUTPATH_ -overwriteExecDir -addToView 0 -SparqlExecutor.endpointUrl http://localhost:3001/sparql -FeatureExtractor.featureDomains basicStats alignmentScores entityFeatures context skipPos joinPos wordSim lexAlign tokenMatch rule opCount constant denotation whType span derivRank lemmaAndBinaries -Builder.executor freebase.SparqlExecutor -Builder.valueEvaluator freebase.FreebaseValueEvaluator -LanguageAnalyzer.languageAnalyzer corenlp.CoreNLPAnalyzer -LexiconFn.lexiconClassName edu.stanford.nlp.sempre.fbalignment.lexicons.Lexicon -BinaryLexicon.binaryLexiconFilesPath lib/fb_data/7/binaryInfoStringAndAlignment.txt -BinaryLexicon.keyToSortBy Intersection_size_typed -UnaryLexicon.unaryLexiconFilePath lib/fb_data/7/unaryInfoStringAndAlignment.txt -EntityLexicon.entityPopularityPath lib/fb_data/7/entityPopularity.txt -TypeInference.typeLookup freebase.FreebaseTypeLookup -FreebaseSearch.cachePath /u/nlp/data/semparse/scr/cache/fbsearch/1.cache -Dataset.inPaths train,lib/data/webquestions/dataset_11/webquestions.examples.train.json -Dataset.trainFrac 0.8 -Dataset.devFrac 0.2 -Grammar.inPaths freebase/data/emnlp2013.grammar -Parser.beamSize 200 -Lexicon.cachePath LexiconFn.cache -SparqlExecutor.cachePath SparqlExecutor.cache -FreebaseSearch.cachePath FreebaseSearch.cache -EntityLexicon.inexactMatchIndex lib/lucene/4.4/inexact/ -LexiconFn.maxEntityEntries 10 -Grammar.tags webquestions bridge join inject inexact -Learner.maxTrainIters 3 -BridgeFn.useBinaryPredicateFeatures true -BridgeFn.filterBadDomain true -Dataset.splitRandom 1
```
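The run above only reports the missing index after nearly a minute of dataset loading. A pre-flight check run before `./run` could catch this earlier. The following is only a rough sketch and not part of sempre: the function name `missing_paths` and the `REQUIRED_PATHS` list are hypothetical, and the list simply reuses paths that appear in the failed command line.

```python
from pathlib import Path

# Hypothetical list for illustration: these paths are copied from the
# failed command line in the log; sempre itself defines no such list.
REQUIRED_PATHS = [
    "lib/lucene/4.4/inexact",
    "lib/fb_data/7/entityPopularity.txt",
    "lib/data/webquestions/dataset_11/webquestions.examples.train.json",
]


def missing_paths(root, required=REQUIRED_PATHS):
    """Return the entries of `required` that do not exist under `root`."""
    root = Path(root)
    return [p for p in required if not (root / p).exists()]
```

Calling `missing_paths(".")` from the sempre checkout would return an empty list when everything is in place, and otherwise the relative paths that still need to be fetched.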

Indeed, I found that the following error had slipped by silently in the pull-dependencies log:

```
/u/nlp/data/semparse/resources/lucene-core-4.4.0.jar
--2020-11-26 00:50:47-- http://nlp.stanford.edu/software/sempre/dependencies-2.0/u/nlp/data/semparse/resources/lucene-core-4.4.0.jar
Resolving nlp.stanford.edu (nlp.stanford.edu)... 171.64.67.140
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://nlp.stanford.edu/software/sempre/dependencies-2.0/u/nlp/data/semparse/resources/lucene-core-4.4.0.jar [following]
--2020-11-26 00:50:47-- https://nlp.stanford.edu/software/sempre/dependencies-2.0/u/nlp/data/semparse/resources/lucene-core-4.4.0.jar
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:443... connected.
HTTP request sent, awaiting response... 301 MOVED PERMANENTLY
Location: https://nlp.stanford.edu/software/sempre/dependencies-2.0/u/nlp/data/semparse/resources/lucene-core-4.4.0.jar/ [following]
--2020-11-26 00:50:48-- https://nlp.stanford.edu/software/sempre/dependencies-2.0/u/nlp/data/semparse/resources/lucene-core-4.4.0.jar/
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:443... connected.
HTTP request sent, awaiting response... 416 Requested Range Not Satisfiable
The file is already fully retrieved; nothing to do.

/u/nlp/data/semparse/resources/lucene-analyzers-common-4.4.0.jar
--2020-11-26 00:50:48-- http://nlp.stanford.edu/software/sempre/dependencies-2.0/u/nlp/data/semparse/resources/lucene-analyzers-common-4.4.0.jar
Resolving nlp.stanford.edu (nlp.stanford.edu)... 171.64.67.140
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://nlp.stanford.edu/software/sempre/dependencies-2.0/u/nlp/data/semparse/resources/lucene-analyzers-common-4.4.0.jar [following]
--2020-11-26 00:50:48-- https://nlp.stanford.edu/software/sempre/dependencies-2.0/u/nlp/data/semparse/resources/lucene-analyzers-common-4.4.0.jar
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:443... connected.
HTTP request sent, awaiting response... 301 MOVED PERMANENTLY
Location: https://nlp.stanford.edu/software/sempre/dependencies-2.0/u/nlp/data/semparse/resources/lucene-analyzers-common-4.4.0.jar/ [following]
--2020-11-26 00:50:48-- https://nlp.stanford.edu/software/sempre/dependencies-2.0/u/nlp/data/semparse/resources/lucene-analyzers-common-4.4.0.jar/
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:443... connected.
HTTP request sent, awaiting response... 416 Requested Range Not Satisfiable
The file is already fully retrieved; nothing to do.

/u/nlp/data/semparse/resources/lucene-queryparser-4.4.0.jar
--2020-11-26 00:50:49-- http://nlp.stanford.edu/software/sempre/dependencies-2.0/u/nlp/data/semparse/resources/lucene-queryparser-4.4.0.jar
Resolving nlp.stanford.edu (nlp.stanford.edu)... 171.64.67.140
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://nlp.stanford.edu/software/sempre/dependencies-2.0/u/nlp/data/semparse/resources/lucene-queryparser-4.4.0.jar [following]
--2020-11-26 00:50:49-- https://nlp.stanford.edu/software/sempre/dependencies-2.0/u/nlp/data/semparse/resources/lucene-queryparser-4.4.0.jar
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:443... connected.
HTTP request sent, awaiting response... 301 MOVED PERMANENTLY
Location: https://nlp.stanford.edu/software/sempre/dependencies-2.0/u/nlp/data/semparse/resources/lucene-queryparser-4.4.0.jar/ [following]
--2020-11-26 00:50:49-- https://nlp.stanford.edu/software/sempre/dependencies-2.0/u/nlp/data/semparse/resources/lucene-queryparser-4.4.0.jar/
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:443... connected.
HTTP request sent, awaiting response... 416 Requested Range Not Satisfiable
The file is already fully retrieved; nothing to do.
```
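Responses like the 416 above are easy to miss in a long download log. One way to surface them is to scan the saved wget output for 4xx/5xx status lines. This is an illustrative sketch only: `failed_requests` is a hypothetical helper, the regular expressions are modelled on the wget lines quoted in this issue, and a 416 reported as "already fully retrieved" may be harmless, so the result is a list to review by hand, not a verdict.

```python
import re

# One alternation so URL lines and status lines are visited in log order.
# "url" matches wget's request header, e.g. "--2020-11-26 00:50:47-- http://..."
# "status" matches e.g. "awaiting response... 416 Requested Range Not Satisfiable"
TOKEN_RE = re.compile(
    r"--\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}--\s+(?P<url>\S+)"
    r"|awaiting response\.\.\. (?P<status>\d{3})"
)


def failed_requests(log_text):
    """Return (url, status) pairs for every response with status >= 400."""
    failures = []
    current_url = None
    for m in TOKEN_RE.finditer(log_text):
        if m.group("url"):
            # Remember the most recent request so a status can be attributed to it.
            current_url = m.group("url")
        else:
            status = int(m.group("status"))
            if status >= 400:
                failures.append((current_url, status))
    return failures
```

Redirects (301, 302) are deliberately ignored, since wget follows them; only the final response of each chain can end up in the result.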
  1. Should we fetch the Lucene JARs directly from the Apache archive?
  2. It seems there should be a folder lib/lucene/4.4/inexact when the inexact mode is specified. Is it created at runtime (by sempre)?
GindaChen commented 3 years ago

Just saw this post, and it seems the link to free917.tar.bz2 ~has expired~ is working (if downloaded manually).