percyliang / sempre

Semantic Parser with Execution

Not able to parse natural language #203

Open mr-asleep opened 4 years ago

mr-asleep commented 4 years ago

I have completed the installation and, following the tutorial, successfully parsed utterances such as california, the golden state. We then tried the emnlp2013 grammar file with the command: ./run @mode=simple-freebase-nocache @sparqlserver=localhost:3001 -Grammar.inPaths freebase/data/emnlp2013.grammar. However, I am not able to get any logical forms. Do we need to include a lexicon file as well, as was done in the tutorial? If so, where can we find one? Otherwise, please suggest something to help me move forward.

Thanks in advance.

ppasupat commented 4 years ago

The command

./pull-dependencies freebase

should pull the lexicons to lib/fb_data/7/ (the unintuitive name is a bit unfortunate). Then use the following flags:

-UnaryLexicon.unaryLexiconFilePath lib/fb_data/7/unaryInfoStringAndAlignment.txt -BinaryLexicon.binaryLexiconFilesPath lib/fb_data/7/binaryInfoStringAndAlignment.txt
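Before passing these flags, it may help to confirm that both files were actually downloaded. The following is only a minimal sketch: check_lexicons is a hypothetical helper name (not part of SEMPRE), and the default directory is the lib/fb_data/7 path mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SEMPRE): report whether the two lexicon
# files named in this thread exist and are non-empty in a given directory.
# The default directory is the one mentioned above.
check_lexicons() {
  local dir="${1:-lib/fb_data/7}"
  local missing=0
  local f
  for f in unaryInfoStringAndAlignment.txt binaryInfoStringAndAlignment.txt; do
    if [ -s "$dir/$f" ]; then
      echo "ok: $dir/$f"
    else
      echo "missing or empty: $dir/$f"
      missing=1
    fi
  done
  # Exit status 0 only if both files are present and non-empty.
  return "$missing"
}
```

From the sempre checkout, something like check_lexicons || ./pull-dependencies freebase would then re-run the download only when a file is missing.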

If these flags do not work, modify the 'simple-freebase-nocache' mode in the run script (around line 496). Change sparqlOpts to freebaseOpts (which loads sparqlOpts and sets a few other options), and then comment out the two lines that set the lexicons to /dev/null:

addMode('simple-freebase-nocache', ....
  ...
  freebaseOpts,      # instead of sparqlOpts
  ...
  # remove o('UnaryLexicon.unaryLexiconFilePath', '/dev/null') since freebaseOpts already sets the lexicon
  # remove o('BinaryLexicon.binaryLexiconFilesPath', '/dev/null') likewise
  ...
nil) })

mr-asleep commented 4 years ago

I tried the following command:

./run @mode=simple-freebase-nocache @sparqlserver=localhost:3001 -Grammar.inPaths freebase/data/emnlp2013.grammar -SimpleLexicon.inPaths freebase/data/tutorial-freebase.lexicon -UnaryLexicon.unaryLexiconFilePath lib/fb_data/7/unaryInfoStringAndAlignment.txt -BinaryLexicon.binaryLexiconFilesPath lib/fb_data/7/binaryInfoStringAndAlignment.txt

Is the grammar file that I added correct? I am still not able to get logical forms for the lexemes given in the two files that you mentioned. Is there anything else I need to add to the command, or another lexicon file? And for which natural language questions should I expect to get correct logical forms?

Thanks

ppasupat commented 4 years ago

After some digging, I think I found the issue. The two lexicon files are for unaries (e.g., state / city) and binaries (e.g., locatedIn), but not entities (e.g., California / Sacramento). Since the set of entities in Freebase is huge, a cache server is required for looking up entities. This is not available in the "simple-freebase-nocache" mode.

There are also two more missing arguments to the command:

-LanguageAnalyzer.languageAnalyzer corenlp.CoreNLPAnalyzer \
-Grammar.tags webquestions exact bridge join inject

The first loads the CoreNLP parser, which is required for some grammar rules that use POS / NER tags. (It only works if you have run ./pull-dependencies corenlp and ant corenlp first.) The second turns on the relevant "when" statements in the grammar file. I got this list of grammar tags from the run script (line 271).

But running with the two additional options above will still give you an error when the Lexicon class is trying to access the cached entities. This is a bit beyond my knowledge of the repo, but I can try to dig for the answer later.
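For reference, here is a sketch of how the flags discussed in this thread fit together into one command. Every flag and path is copied from the comments above; only the cmd array and the RUN switch are illustrative additions. As noted, this combination is still expected to fail at entity lookup in the "simple-freebase-nocache" mode, so the script only prints the command unless RUN=1 is set.

```shell
#!/usr/bin/env bash
# Assemble the command from the flags quoted in this thread.
# All flags and paths come from the comments above; nothing new is added.
cmd=(
  ./run
  @mode=simple-freebase-nocache
  @sparqlserver=localhost:3001
  -Grammar.inPaths freebase/data/emnlp2013.grammar
  -UnaryLexicon.unaryLexiconFilePath lib/fb_data/7/unaryInfoStringAndAlignment.txt
  -BinaryLexicon.binaryLexiconFilesPath lib/fb_data/7/binaryInfoStringAndAlignment.txt
  -LanguageAnalyzer.languageAnalyzer corenlp.CoreNLPAnalyzer
  -Grammar.tags webquestions exact bridge join inject
)

# Illustrative switch: print the command by default; execute it only when
# RUN=1 is set (run from the root of the sempre checkout).
if [ "${RUN:-0}" = "1" ]; then
  "${cmd[@]}"
else
  printf '%s\n' "${cmd[*]}"
fi
```

Keeping the arguments in an array avoids quoting problems if any path is later changed to one containing spaces.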