Bug in the Stanford French parser #10

Closed: romeo4934 closed this issue 12 years ago

romeo4934 commented 12 years ago

I wrote this simple script:

require 'treat'

Treat.default_language = :french
Treat.silence = false

s = 'Bonjour je suis bien au chateau'
s.parse

I get the error message below.

Any help?

Thanks!

Loading parser from serialized file /Users/ztera/.rvm/gems/ruby-1.9.2-p290/gems/treat-1.0.4/models/stanford/grammar/frenchFactored.ser.gz ...
java.lang.ClassNotFoundException: edu.stanford.nlp.parser.lexparser.FrenchUnknownWordModel

Loading parser from text file /Users/ztera/.rvm/gems/ruby-1.9.2-p290/gems/treat-1.0.4/models/stanford/grammar/frenchFactored.ser.gz
/Users/ztera/.rvm/gems/ruby-1.9.2-p290/gems/stanford-core-nlp-0.3.0/lib/stanford-core-nlp.rb:124:in `new': /Users/ztera/.rvm/gems/ruby-1.9.2-p290/gems/treat-1.0.4/models/stanford/grammar/frenchFactored.ser.gz: expecting BEGIN block; got ¨Ì??sr??,edu.stanford.nlp.parser.lexparser.ParserData???????????????L??bgt??1Ledu/stanford/nlp/parser/lexparser/BinaryGrammar;L??dgt??5Ledu/stanford/nlp/parser/lexparser/DependencyGrammar;L??lext??+Ledu/stanford/nlp/parser/lexparser/Lexicon;L??ptt??+Ledu/stanford/nlp/parser/lexparser/Options;L?? (RuntimeException)

louismullie commented 12 years ago

Hey Antoine,

I haven't gotten around to testing non-English resources yet, so thanks for reporting this bug. My suspicion is that this is a problem with the frenchFactored.ser.gz model, possibly due to a conflict between the model version and the StanfordCoreNLP version. Are you comfortable downloading the latest Stanford CoreNLP package at http://nlp.stanford.edu/software/corenlp.shtml and extracting the models from the JAR archive ("models.jar")? You could then copy the newer model (frenchFactored.ser.gz) into the gem folder, and I'm pretty sure this would solve the problem. If you can try that out, please let me know whether it works. If you need more information on how to test the new models, don't hesitate to ask.
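As a rough sketch of the copy step, assuming the newer model has already been extracted from the JAR: the helper below backs up the existing model and copies the new one over it. The name replace_model is hypothetical (it is not part of treat), and the code uses only Ruby's standard FileUtils.

```ruby
require 'fileutils'

# Hypothetical helper, not part of the treat gem: copy a newly extracted
# model into the gem's model folder, keeping a backup of the old file.
def replace_model(new_model, gem_model_dir, name = 'frenchFactored.ser.gz')
  target = File.join(gem_model_dir, name)
  backup = "#{target}.bak"

  # Keep the original model so the change can be reverted if parsing still fails.
  # An existing backup is never overwritten.
  FileUtils.cp(target, backup) if File.exist?(target) && !File.exist?(backup)

  FileUtils.cp(new_model, target)
  target
end

# Example call (paths are assumptions; adjust them to your machine):
#   replace_model('/tmp/frenchFactored.ser.gz',
#                 File.expand_path('~/.rvm/gems/ruby-1.9.2-p290/gems/treat-1.0.4/models/stanford/grammar'))
```

The backup makes it easy to restore the bundled model by copying the .bak file back if the newer one fails to load as well.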

If it doesn't work, I'll send a message to the stanford-nlp mailing list to see if they are able to resolve the conflict (which, as I said, seems to be due to improper format/version of the French model files).
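To get a rough idea of what kind of file a model is before swapping it, one option is to look at its first decompressed bytes. The sketch below assumes the model is a gzip archive and relies on the fact that Java object serialization streams start with the bytes 0xAC 0xED; the characters printed after "got" in the error above appear to be exactly those bytes. The name model_kind and its return values are made up for illustration.

```ruby
require 'zlib'

# First two bytes of a Java object serialization stream (0xACED).
JAVA_STREAM_MAGIC = [0xAC, 0xED].freeze

# Hypothetical helper: classify a model file by its first decompressed bytes.
# Returns :java_serialized, :other, or :not_gzip.
def model_kind(path)
  head = nil
  # Read only the first two decompressed bytes; the rest of the file is not needed.
  Zlib::GzipReader.open(path) { |gz| head = gz.read(2) }
  head.to_s.bytes.to_a == JAVA_STREAM_MAGIC ? :java_serialized : :other
rescue Zlib::GzipFile::Error
  # Raised when the file does not start with a valid gzip header.
  :not_gzip
end

# Example call (the path is an assumption):
#   model_kind('frenchFactored.ser.gz')
```

This only says that the file is a serialized Java object; it cannot tell whether the classes inside match the installed parser version, which is the mismatch suspected here.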

Thanks again for reporting this!


romeo4934 commented 12 years ago

Thank you for your answer.

I tried to replace the file.

First I looked in the CoreNLP package, but it contains only the English models, not the French ones.

Then I downloaded the Stanford Parser, version 2012-03-09, and found a file with the same name.

I replaced the old file with it, but now I get a new error message:

Loading parser from serialized file /Users/ztera/.rvm/gems/ruby-1.9.2-p290/gems/treat-1.0.4/models/stanford/grammar/frenchFactored.ser.gz ...
/Users/ztera/.rvm/gems/ruby-1.9.2-p290/gems/stanford-core-nlp-0.3.0/lib/stanford-core-nlp.rb:124:in `new': Invalid class in file: /Users/ztera/.rvm/gems/ruby-1.9.2-p290/gems/treat-1.0.4/models/stanford/grammar/frenchFactored.ser.gz (RuntimeException)

Any help?

Thank you for your time.

Your gem seems to be amazing :-)

romeo4934 commented 12 years ago

Did you get any news from the stanford-nlp mailing list?


louismullie commented 12 years ago

Yes, I am exchanging e-mails with them to try to fix the problem. For reference, the full stack trace of the error is:

romeo4934 commented 12 years ago

Great !!


romeo4934 commented 12 years ago

Any news about the commit? ;-)

louismullie commented 12 years ago

Hey Antoine,

Sorry, I've been really busy and haven't gotten around to finalizing it. The bug is fixed, but I need to add the French tag set before pushing it. It will most probably be done by tonight, but I can't guarantee it :)

louismullie commented 12 years ago

By the way, this website may be useful to you: http://www.llf.cnrs.fr/Gens/Abeille/French-Treebank-fr.php

louismullie commented 12 years ago

Here it is :)

You'll have to update to the latest stanford-core-nlp and treat gem versions (should be updated in about an hour) and download the latest models:

gem install stanford-core-nlp
gem install treat

require 'treat'
Treat.install :french
Treat.default_language = :fre
text = Paragraph "Bonjour, je suis bel et bien arrivé au château. Je suis bien content de vous voir!"
text.do :segment, :parse, :category
+ Paragraph (70310192503320)  --- "Bonjour , je [...] vous voir!"  ---  {:language=>:fre}   --- [] 
+--+ Sentence (70310208332800)  --- "Bonjour , je [...] au château."  ---  {:language=>:fre, :tag=>"S", :tag_set=>:paris7, :category=>:sentence}   --- [] 
   +--+ Phrase (70310206823500)  --- "Bonjour"  ---  {:tag=>"NP", :category=>:noun_phrase}   --- [] 
      +--> Word (70310206759700)  --- "Bonjour"  ---  {:tag=>"I", :lemma=>"bonjour", :category=>:interjection}   --- [] 
   +--+ Phrase (70310206602300)  --- ", je suis bel arrivé"  ---  {:tag=>"VN", :category=>:verbal_nucleus}   --- [] 
      +--> Punctuation (70310206539260)  --- ","  ---  {:tag=>",", :lemma=>",", :category=>:comma}   --- [] 
      +--> Word (70310206373020)  --- "je"  ---  {:tag=>"CL", :lemma=>"je", :category=>:pronoun}   --- [] 
      +--> Word (70310206208400)  --- "suis"  ---  {:tag=>"V", :lemma=>"sui", :category=>:verb}   --- [] 
      +--+ Word (70310189436880)  --- "bel"  ---  {:tag=>"MWADV", :category=>:unknown}   --- [] 
         +--> Word (70310189390880)  --- "bel"  ---  {:tag=>"ADV", :lemma=>"bel", :category=>:adverb}   --- [] 
         +--> Word (70310189270000)  --- "et"  ---  {:tag=>"C", :lemma=>"et", :category=>:conjunction}   --- [] 
         +--> Word (70310189163860)  --- "bien"  ---  {:tag=>"ADV", :lemma=>"bien", :category=>:adverb}   --- [] 
      +--> Word (70310189043540)  --- "arrivé"  ---  {:tag=>"V", :lemma=>"arrivé", :category=>:verb}   --- [] 
   +--+ Phrase (70310188930300)  --- "au château"  ---  {:tag=>"PP", :category=>:prepositional_phrase}   --- [] 
      +--> Word (70310188885200)  --- "au"  ---  {:tag=>"P", :lemma=>"au", :category=>:preposition}   --- [] 
      +--+ Phrase (70310188771400)  --- "château"  ---  {:tag=>"NP", :category=>:noun_phrase}   --- [] 
         +--> Word (70310188725440)  --- "château"  ---  {:tag=>"N", :lemma=>"château", :category=>:noun}   --- [] 
   +--> Punctuation (70310188573120)  --- "."  ---  {:tag=>".", :lemma=>".", :category=>:period}   --- [] 
+--+ Sentence (70310208323580)  --- "Je suis bien [...] vous voir!"  ---  {:language=>:fre, :tag=>"S", :tag_set=>:paris7, :category=>:sentence}   --- [] 
   +--+ Phrase (70310195529340)  --- "Je suis"  ---  {:tag=>"VN", :category=>:verbal_nucleus}   --- [] 
      +--> Word (70310217752680)  --- "Je"  ---  {:tag=>"CL", :lemma=>"je", :category=>:pronoun}   --- [] 
      +--> Word (70310217964880)  --- "suis"  ---  {:tag=>"V", :lemma=>"sui", :category=>:verb}   --- [] 
   +--+ Phrase (70310217991520)  --- "bien content"  ---  {:tag=>"AP", :category=>:adjectival_phrase}   --- [] 
      +--> Word (70310195658160)  --- "bien"  ---  {:tag=>"ADV", :lemma=>"bien", :category=>:adverb}   --- [] 
      +--> Word (70310217789620)  --- "content"  ---  {:tag=>"A", :lemma=>"content", :category=>:adjective}   --- [] 
   +--+ Phrase (70310217824380)  --- "de vous voir"  ---  {:tag=>"VPinf", :category=>:infinitival_phrase}   --- [] 
      +--> Word (70310217841640)  --- "de"  ---  {:tag=>"P", :lemma=>"de", :category=>:preposition}   --- [] 
      +--+ Phrase (70310218230040)  --- "vous voir"  ---  {:tag=>"VN", :category=>:verbal_nucleus}   --- [] 
         +--> Word (70310218247020)  --- "vous"  ---  {:tag=>"CL", :lemma=>"vous", :category=>:pronoun}   --- [] 
         +--> Word (70310218287320)  --- "voir"  ---  {:tag=>"V", :lemma=>"voir", :category=>:verb}   --- [] 
   +--> Punctuation (70310218328120)  --- "!"  ---  {:tag=>"!", :lemma=>"!", :category=>:exclamation}   --- []

louismullie commented 12 years ago

Gem is now updated.

romeo4934 commented 12 years ago

Thank you very much !!! :-)

louismullie commented 12 years ago

Do let me know if you encounter any other bugs with the French stuff! I'll try to fix them as fast as possible.