percyliang / sempre

Semantic Parser with Execution

Issue with training Sempre 2.0 Webquestions #46

Open shraddhavish opened 9 years ago

shraddhavish commented 9 years ago

I'm using Sempre 2.0 to train on the webquestions dataset and am running the command below:

./run @mode=freebase @domain=webquestions @train=1 @sparqlserver=localhost:3001 @cacheserver=local

But I'm getting the error below:

./run:268:in `block in <main>': undefined local variable or method `agendaExperiments' for main:Object (NameError)
    from /home/sempre/fig/lib/execrunner.rb:130:in `call'
    from /home/sempre/fig/lib/execrunner.rb:130:in `getRuns'
    from /home/sempre/fig/lib/execrunner.rb:96:in `getRuns'
    from /home/sempre/fig/lib/execrunner.rb:96:in `getRuns'
    from /home/sempre/fig/lib/execrunner.rb:130:in `getRuns'
    from /home/sempre/fig/lib/execrunner.rb:205:in `execute'
    from /home/sempre/fig/lib/execrunner.rb:215:in `run!'
    from ./run:426:in `<main>'

I cannot find a definition for agendaExperiments in run. Can someone please help with this!

yonatansito commented 9 years ago

For now, you can just delete lines 265-269 of the run script. Let us know if there are more problems.
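Before deleting lines, it can help to confirm that a name such as agendaExperiments really is referenced but never defined in the script. The sketch below is a rough text scan, not a Ruby parser: it assumes a definition looks like `def name` or `name = ...` at the start of a line. The function name find_undefined_uses and the sample script text are hypothetical and only for illustration.

```python
import re


def find_undefined_uses(script_text, name):
    """Return 1-based line numbers that mention `name`, or [] if it is defined.

    "Defined" only means a line starts with `def name` or `name =`.
    This is a heuristic text scan and does not understand Ruby scoping.
    """
    escaped = re.escape(name)
    mention = re.compile(r"\b" + escaped + r"\b")
    definition = re.compile(r"^\s*(?:def\s+" + escaped + r"\b|" + escaped + r"\s*=(?!=))")
    lines = script_text.splitlines()
    if any(definition.search(line) for line in lines):
        return []
    return [number for number, line in enumerate(lines, start=1) if mention.search(line)]


# Hypothetical stand-in for a run script; replace with open("run").read().
sample_script = "def header\nend\nrun!(\n  agendaExperiments,\n)\n"
print(find_undefined_uses(sample_script, "agendaExperiments"))
```

Any line numbers it reports are the places that would raise the NameError shown in the backtrace.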


shraddhavish commented 9 years ago

Thank you for that. But now I'm getting a different error:

Value 1 (for key :entitysearch) is invalid; possible values are [0]

I even tried the command below, but I get the same error:

./run @mode=freebase @domain=webquestions @train=0 @sparqlserver=localhost:3001 @cacheserver=local

What could be causing this?

shraddhavish commented 9 years ago

OK, I just changed the default value for entitysearch to 0 in the run file. It starts training on the Webquestions json file correctly, but at the end I get the error below.

I cannot find the directory lib/lucene. Can you please tell me which dependencies I am missing here? I believe this is for the lexicons.

Opening index dir: lib/lucene/4.4/inexact/
ERROR: Composition failed: rule = $Entity -> $NamedEntity (LexiconFn entity inexact), children = [(derivation (formula (string hadrian)) (type fb:type.text))]
java.lang.RuntimeException: org.apache.lucene.store.NoSuchDirectoryException: directory '/home/sempre/lib/lucene/4.4/inexact' does not exist
    at edu.stanford.nlp.sempre.freebase.LexiconFn.call(LexiconFn.java:237)
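Since the failure only shows up at the end of a long training run, a small pre-flight check can catch a missing index directory before training starts. This is a minimal sketch: the only directory listed is the one named in the error message above, and the helper name missing_directories is hypothetical. It does not know which other resources SEMPRE needs.

```python
from pathlib import Path

# Directory copied from the error message above; extend the list as needed.
REQUIRED_DIRS = ["lib/lucene/4.4/inexact"]


def missing_directories(root, relative_dirs):
    """Return the entries of relative_dirs that are not directories under root."""
    root = Path(root)
    return [d for d in relative_dirs if not (root / d).is_dir()]


if __name__ == "__main__":
    # Run from the SEMPRE checkout (assumed to be the current directory).
    for directory in missing_directories(".", REQUIRED_DIRS):
        print(f"missing directory: {directory}")
```

Running it from the checkout root before ./run prints each expected directory that does not exist.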

yonatansito commented 9 years ago

You are right. Version 2.0 works if you want to develop a new semantic parser, but training on webquestions currently has some bugs.

If you want to train on webquestions, I think you can use version 1.0 (which corresponds to the published papers), and we will update you when we fix things (the fixes are fairly minor, but they still need to be done). Our current version is indeed a lot better than the previous one, but the run script in version 2.0 is not fully up to date and its results are not yet published.


uwittygit commented 9 years ago

So, will Sempre 2.0 have any algorithmic upgrades besides bridging or paraphrase?

yonatansito commented 9 years ago

Yes, we plan to release a faster and more accurate parser in the next two months. Stay tuned!


shraddhavish commented 9 years ago

Hi, for training and testing on Webquestions, I ran the command below:

./run @mode=freebase @domain=webquestions @train=1 @data=1 @sparqlserver=localhost:3093 @cacheserver=local

How can I print the grammar rules of the derivation used to predict the answer for each test sentence? Is there a specific parameter for that, or will the above command give me all the details along with the answer?

It's taking a long time to train, so I wanted to make sure the command is correct before testing.

yonatansito commented 9 years ago

First, for a quick run to test things, use the arguments -Dataset.maxExamples train:5 test:1 -Learner.maxTrainIter 1 to train on 5 examples and test on 1 example for one iteration.

Do you mean you want to print the grammar rules for the top predicted derivation? There is currently no option that does that, but you could write one.

You can activate the option -SemanticFn.trackLocalChoices, which does something similar to what you are describing (again, try it on a small number of training examples and you will see the effect in the log file). There is also the option -Learner.outputPredDerivations, which writes an output file in the execution directory with information on all predicted derivations. This file is really huge, so again, test it first to see if it's what you want.
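The quick-run arguments from this reply can be appended to the training command used earlier in the thread. The sketch below only assembles the argument list. The flag names follow the reply and are not checked against the SEMPRE source, and the helper name quick_run_command is hypothetical.

```python
import shlex

# Base command as posted earlier in the thread.
BASE_COMMAND = [
    "./run", "@mode=freebase", "@domain=webquestions", "@train=1", "@data=1",
    "@sparqlserver=localhost:3093", "@cacheserver=local",
]


def quick_run_command(base_args, train_examples=5, test_examples=1, train_iters=1):
    """Append the small-run limits suggested in the reply to a base command."""
    return list(base_args) + [
        # Flag names as given in the thread (assumed, not verified).
        "-Dataset.maxExamples", f"train:{train_examples}", f"test:{test_examples}",
        "-Learner.maxTrainIter", str(train_iters),
    ]


# Print a copy-pasteable command line for a 5-example, 1-iteration smoke test.
print(shlex.join(quick_run_command(BASE_COMMAND)))
```

Keeping the limits as parameters makes it easy to scale the run up gradually once the small run finishes without errors.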


shraddhavish commented 9 years ago

@yonatansito Thanks a lot for your inputs.

  1. As you suggested, I trained and tested on a single sentence (what does jamaican people speak?), and it failed. I looked at the learner.events file to check the top predicted derivation, and it seems completely wrong (predFormula=(!fb:music.recording.producer fb:m.0fw7l_x)). I'm not sure whether this is the correct file to check. Please let me know.

Also, I tried running logical forms taken from the predictions in the log file, and the answer for every one is either (list) or TIMEOUT. Please let me know if I'm doing something wrong. Is there some issue with the Freebase installation?

A) ./run @mode=query @sparqlserver=localhost:3093 -formula '(fb:people.ethnicity.languages_spoken fb:en.jamaican_creole)'
Output: SparqlExecutor.execute: (fb:people.ethnicity.languages_spoken fb:en.jamaican_creole) (list)

B) ./run @mode=query @sparqlserver=localhost:3093 -formula '(fb:location.location.containedby fb:en.california)'
Output: SparqlExecutor.execute: (fb:location.location.containedby fb:en.california) (list)

  2. Also, once training has been performed, how can I test a few sentences at the interactive prompt? If I use @interact=0, it starts training again and doesn't end at a prompt. Can you please tell me the right command?

Sorry for asking so many questions, @yonatansito. I really appreciate your help.

yonatansito commented 9 years ago

Hi,

  1. I wouldn't expect to get the correct answer when training on a single sentence (too little data). Running on a single example is just for debugging; the system will not learn much.
  2. The logical forms that you ran should not return an empty list or time out. This might be an issue with the KB installation, as in your other message.

Here's what I get:

./run @mode=query @sparqlserver=localhost:3093 -formula '(fb:people.ethnicity.languages_spoken fb:en.jamaican_creole)'
main() {
  Loading Freebase schema: lib/fb_data/93.exec/schema2.ttl { 1163 CVTs, (19337,19282) property types, 858 property units }
  SparqlExecutor.execute: (fb:people.ethnicity.languages_spoken fb:en.jamaican_creole)
  (list (name fb:en.chinese_jamaican "Chinese Jamaicans") (name fb:en.jamaicans_of_african_ancestry "Jamaicans of African ancestry") (name fb:en.jamaican_american "Jamaican American") (name fb:en.indo-caribbean Indo-Caribbean) (name fb:en.jamaican_british "British Jamaican") (name fb:en.jamaican_australian "Jamaican Australian") (name fb:en.jamaicancanadian "Jamaican Canadian") (name fb:m.0hnb50 "Lebanese immigration to Jamaica") (name fb:m.0dgnbjt "Igbo people in Jamaica") (name fb:en.chinese_caribbean "Chinese Caribbean"))
} [1.1s]

./run @mode=query @sparqlserver=localhost:3093 -formula '(fb:location.location.containedby fb:en.california)'
main() {
  Loading Freebase schema: lib/fb_data/93.exec/schema2.ttl { 1163 CVTs, (19337,19282) property types, 858 property units }
  SparqlExecutor.execute: (fb:location.location.containedby fb:en.california)
  (list (name fb:en.charles_w_eliot_middle_school "Charles W. Eliot Middle School") (name fb:en.century_city "Century City") (name fb:en.sacramento Sacramento) (name fb:en.tuolumne_meadows "Tuolumne Meadows") (name fb:en.santa_monica_california "Santa Monica") (name fb:en.tuolumne_county "Tuolumne County") (name fb:en.san_francisco "San Francisco") (name fb:en.agoura Agoura) (name fb:en.gilroy Gilroy) (name fb:en.los_angeles_county "Los Angeles County"))
}
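To tell a healthy KB installation from a broken one automatically, the printed result can be parsed and checked for emptiness. The sketch below assumes the output format shown in this thread, where each entry is `(name <id> <label>)` and the label is either quoted or a single bare token. The function names parse_name_list and is_empty_result are hypothetical helpers and not part of SEMPRE.

```python
import re

# One "(name <id> <label>)" entry; the label is quoted or a single bare token.
ENTRY = re.compile(r'\(name\s+(\S+)\s+(?:"([^"]*)"|([^\s()]+))\)')


def parse_name_list(output):
    """Return (id, label) pairs found in a '(list (name ...) ...)' string."""
    pairs = []
    for match in ENTRY.finditer(output):
        label = match.group(2) if match.group(2) is not None else match.group(3)
        pairs.append((match.group(1), label))
    return pairs


def is_empty_result(output):
    """True when the output contains no (name ...) entries, e.g. a bare '(list)'."""
    return not parse_name_list(output)


# Two entries copied from the output above.
sample = '(list (name fb:en.sacramento Sacramento) (name fb:en.san_francisco "San Francisco"))'
for entity_id, label in parse_name_list(sample):
    print(entity_id, label)
```

A bare `(list)`, as reported earlier in the thread, makes is_empty_result return True, which points at the SPARQL server or KB data and not at the parser.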


lindabai commented 8 years ago

Hi @yonatansito, I am currently trying to train on webquestions, but the problem still occurs. Does this mean the bug in version 2 is still not fixed? Is there any way to solve the problem? Thank you very much.