louismullie / treat

Natural language processing framework for Ruby.

bug french parser stanford #10

Closed romeo4934 closed 12 years ago

romeo4934 commented 12 years ago

I wrote this simple script:

require 'treat'

Treat.default_language = :french
Treat.silence = false

s = 'Bonjour je suis bien au chateau'
s.parse

I get the error message shown below.

Any help?

Thanks!

Loading parser from serialized file /Users/ztera/.rvm/gems/ruby-1.9.2-p290/gems/treat-1.0.4/models/stanford/grammar/frenchFactored.ser.gz ... java.lang.ClassNotFoundException: edu.stanford.nlp.parser.lexparser.FrenchUnknownWordModel

Loading parser from text file /Users/ztera/.rvm/gems/ruby-1.9.2-p290/gems/treat-1.0.4/models/stanford/grammar/frenchFactored.ser.gz /Users/ztera/.rvm/gems/ruby-1.9.2-p290/gems/stanford-core-nlp-0.3.0/lib/stanford-core-nlp.rb:124:in `new': /Users/ztera/.rvm/gems/ruby-1.9.2-p290/gems/treat-1.0.4/models/stanford/grammar/frenchFactored.ser.gz: expecting BEGIN block; got ¨Ì??sr??,edu.stanford.nlp.parser.lexparser.ParserData???????????????L??bgt??1Ledu/stanford/nlp/parser/lexparser/BinaryGrammar;L??dgt??5Ledu/stanford/nlp/parser/lexparser/DependencyGrammar;L??lext??+Ledu/stanford/nlp/parser/lexparser/Lexicon;L??ptt??+Ledu/stanford/nlp/parser/lexparser/Options;L?? (RuntimeException)

louismullie commented 12 years ago

Hey Antoine,

I haven't gotten to test non-English resources yet, so thanks for reporting this bug. My suspicion is that this is a problem with the frenchFactored.ser.gz model, possibly due to a mismatch between the model version and the StanfordCoreNLP version. Are you comfortable downloading the latest Stanford Core NLP package at http://nlp.stanford.edu/software/corenlp.shtml and extracting the models from the JAR archive ("models.jar")? You could then copy the newer model (frenchFactored.ser.gz) into the gem folder, and I'm pretty sure this would solve the problem. If you can try that out, please report back on whether it worked. If you need more information on how to test the new models, don't hesitate to ask.
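
The model swap described above could be scripted roughly as follows. This is only a sketch: `install_model` is a hypothetical helper (it is not part of treat), and it assumes the newer frenchFactored.ser.gz has already been extracted from the downloaded archive.

```ruby
require 'fileutils'

# Hypothetical helper, not part of treat: copy a freshly extracted model
# file into a models directory, keeping a backup of the file it replaces.
def install_model(new_model, models_dir)
  target = File.join(models_dir, File.basename(new_model))
  # Keep the old model so the swap can be undone if the new one fails too.
  FileUtils.cp(target, "#{target}.bak") if File.exist?(target)
  FileUtils.mkdir_p(models_dir)
  FileUtils.cp(new_model, target)
  target
end
```

For the setup in this thread, the second argument would be the models/stanford/grammar directory inside the treat gem folder, as it appears in the error message.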

If it doesn't work, I'll send a message to the stanford-nlp mailing list to see if they are able to resolve the conflict (which, as I said, seems to be due to improper format/version of the French model files).

Thanks again for reporting this!

Louis

romeo4934 commented 12 years ago

Thank you for your answer.

I tried to replace the file.

First I looked in the CoreNLP package, but I found no French model there, only the English version.

Then I downloaded the Stanford Parser, version 2012-03-09. I found a file with the same name.

I replaced it, but now I get a new error message:

Loading parser from serialized file /Users/ztera/.rvm/gems/ruby-1.9.2-p290/gems/treat-1.0.4/models/stanford/grammar/frenchFactored.ser.gz ... /Users/ztera/.rvm/gems/ruby-1.9.2-p290/gems/stanford-core-nlp-0.3.0/lib/stanford-core-nlp.rb:124:in `new': Invalid class in file: /Users/ztera/.rvm/gems/ruby-1.9.2-p290/gems/treat-1.0.4/models/stanford/grammar/frenchFactored.ser.gz (RuntimeException)

Any help?

Thank you for your time,

Your gem seems to be amazing :-)

romeo4934 commented 12 years ago

Did you get any news from the stanford-nlp mailing list?

Best

louismullie commented 12 years ago

Yes, I am exchanging e-mails with them trying to fix the problem. For reference, the full stack of the error is:

java.lang.ClassNotFoundException: edu.stanford.nlp.parser.lexparser.FrenchUnknownWordModel
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:603)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1574)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1495)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1731)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
    at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:479)
    at edu.stanford.nlp.parser.lexparser.BaseLexicon.readObject(BaseLexicon.java:684)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
    at edu.stanford.nlp.parser.lexparser.LexicalizedParser.getParserDataFromSerializedFile(LexicalizedParser.java:540)
    at edu.stanford.nlp.parser.lexparser.LexicalizedParser.getParserDataFromFile(LexicalizedParser.java:326)
    at edu.stanford.nlp.parser.lexparser.LexicalizedParser.<init>(LexicalizedParser.java:154)
    at edu.stanford.nlp.parser.lexparser.LexicalizedParser.<init>(LexicalizedParser.java:139)
    at edu.stanford.nlp.pipeline.ParserAnnotator.loadModel(ParserAnnotator.java:81)
    at edu.stanford.nlp.pipeline.ParserAnnotator.<init>(ParserAnnotator.java:61)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP$12.create(StanfordCoreNLP.java:608)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP$12.create(StanfordCoreNLP.java:584)
    at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:62)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:328)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:195)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:185)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:177)
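
The `ClassNotFoundException` at the top of this trace is raised while Java deserializes the model, which means the model file references a class that is not on the classpath. One rough way to check whether a given JAR ships that class is sketched below. `jar_mentions_class?` is a hypothetical helper, and the byte search is a heuristic (it relies on ZIP archives storing entry names uncompressed and does not look inside nested archives), not a proper archive listing.

```ruby
# Heuristic sketch: a JAR is a ZIP archive, and ZIP stores entry names
# uncompressed, so a raw byte search can tell whether a class file is listed.
# `jar_mentions_class?` is a hypothetical helper, not part of any gem.
def jar_mentions_class?(jar_path, class_name)
  # Turn the dotted class name into the entry path used inside the archive.
  entry = class_name.tr('.', '/') + '.class'
  File.binread(jar_path).include?(entry.b)
end
```

Calling it with the path to the parser JAR and 'edu.stanford.nlp.parser.lexparser.FrenchUnknownWordModel' would return false if the JAR predates or omits the class the French model needs.
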

romeo4934 commented 12 years ago

Great!!

Thanks

On 2012/5/15, Louis Mullie wrote:

> Got it! Patch coming later today :)



romeo4934 commented 12 years ago

Any news about the commit? ;-)

louismullie commented 12 years ago

Hey Antoine,

Sorry, I've been really busy and haven't gotten around to finalizing it. The bug is fixed, but I need to add the French tag set before pushing it. It'll most probably be done by tonight, but I can't guarantee it :)

louismullie commented 12 years ago

By the way, this website may be useful to you: http://www.llf.cnrs.fr/Gens/Abeille/French-Treebank-fr.php

louismullie commented 12 years ago

Here it is :)

8e71436069fe2f7f3d3bc37cd7e4e0b4631f70aa
a9cbe2c3c26ade118e33bcd93b66d1007cccc7fe
364751c1b570fef568ac4c0be66701fced9c2663
6e4ff18ac227f23d8aa0048ab32270256ea46eb4
54a2f4b314e2f098107b2fbb4fde0fe83f048644
a3c9f02e2c21b148f3541d0760e9de6a857f8b08
36b4afe53bd3971b4e27151b2d41e157e95094ec
9a36f61f1053834ad2c8f8143034efaffd27ccbb
3a6d7b4cd13f52d67590da48525fe03851cf40a2

You'll have to update to the latest stanford-core-nlp and treat gem versions (should be updated in about an hour) and download the latest models:

gem install stanford-core-nlp
gem install treat
irb
> require 'treat'
> Treat.install :french
> Treat.default_language = :fre
> text = Paragraph "Bonjour, je suis bel et bien arrivé au château. Je suis bien content de vous voir!"
> text.do :segment, :parse, :category
> text.print_tree
+ Paragraph (70310192503320)  --- "Bonjour , je [...] vous voir!"  ---  {:language=>:fre}   --- [] 
|
+--+ Sentence (70310208332800)  --- "Bonjour , je [...] au château."  ---  {:language=>:fre, :tag=>"S", :tag_set=>:paris7, :category=>:sentence}   --- [] 
   |
   +--+ Phrase (70310206823500)  --- "Bonjour"  ---  {:tag=>"NP", :category=>:noun_phrase}   --- [] 
      |
      +--> Word (70310206759700)  --- "Bonjour"  ---  {:tag=>"I", :lemma=>"bonjour", :category=>:interjection}   --- [] 
   +--+ Phrase (70310206602300)  --- ", je suis bel arrivé"  ---  {:tag=>"VN", :category=>:verbal_nucleus}   --- [] 
      |
      +--> Punctuation (70310206539260)  --- ","  ---  {:tag=>",", :lemma=>",", :category=>:comma}   --- [] 
      +--> Word (70310206373020)  --- "je"  ---  {:tag=>"CL", :lemma=>"je", :category=>:pronoun}   --- [] 
      +--> Word (70310206208400)  --- "suis"  ---  {:tag=>"V", :lemma=>"sui", :category=>:verb}   --- [] 
      +--+ Word (70310189436880)  --- "bel"  ---  {:tag=>"MWADV", :category=>:unknown}   --- [] 
         |
         +--> Word (70310189390880)  --- "bel"  ---  {:tag=>"ADV", :lemma=>"bel", :category=>:adverb}   --- [] 
         +--> Word (70310189270000)  --- "et"  ---  {:tag=>"C", :lemma=>"et", :category=>:conjunction}   --- [] 
         +--> Word (70310189163860)  --- "bien"  ---  {:tag=>"ADV", :lemma=>"bien", :category=>:adverb}   --- [] 
      +--> Word (70310189043540)  --- "arrivé"  ---  {:tag=>"V", :lemma=>"arrivé", :category=>:verb}   --- [] 
   +--+ Phrase (70310188930300)  --- "au château"  ---  {:tag=>"PP", :category=>:prepositional_phrase}   --- [] 
      |
      +--> Word (70310188885200)  --- "au"  ---  {:tag=>"P", :lemma=>"au", :category=>:preposition}   --- [] 
      +--+ Phrase (70310188771400)  --- "château"  ---  {:tag=>"NP", :category=>:noun_phrase}   --- [] 
         |
         +--> Word (70310188725440)  --- "château"  ---  {:tag=>"N", :lemma=>"château", :category=>:noun}   --- [] 
   +--> Punctuation (70310188573120)  --- "."  ---  {:tag=>".", :lemma=>".", :category=>:period}   --- [] 
+--+ Sentence (70310208323580)  --- "Je suis bien [...] vous voir!"  ---  {:language=>:fre, :tag=>"S", :tag_set=>:paris7, :category=>:sentence}   --- [] 
   |
   +--+ Phrase (70310195529340)  --- "Je suis"  ---  {:tag=>"VN", :category=>:verbal_nucleus}   --- [] 
      |
      +--> Word (70310217752680)  --- "Je"  ---  {:tag=>"CL", :lemma=>"je", :category=>:pronoun}   --- [] 
      +--> Word (70310217964880)  --- "suis"  ---  {:tag=>"V", :lemma=>"sui", :category=>:verb}   --- [] 
   +--+ Phrase (70310217991520)  --- "bien content"  ---  {:tag=>"AP", :category=>:adjectival_phrase}   --- [] 
      |
      +--> Word (70310195658160)  --- "bien"  ---  {:tag=>"ADV", :lemma=>"bien", :category=>:adverb}   --- [] 
      +--> Word (70310217789620)  --- "content"  ---  {:tag=>"A", :lemma=>"content", :category=>:adjective}   --- [] 
   +--+ Phrase (70310217824380)  --- "de vous voir"  ---  {:tag=>"VPinf", :category=>:infinitival_phrase}   --- [] 
      |
      +--> Word (70310217841640)  --- "de"  ---  {:tag=>"P", :lemma=>"de", :category=>:preposition}   --- [] 
      +--+ Phrase (70310218230040)  --- "vous voir"  ---  {:tag=>"VN", :category=>:verbal_nucleus}   --- [] 
         |
         +--> Word (70310218247020)  --- "vous"  ---  {:tag=>"CL", :lemma=>"vous", :category=>:pronoun}   --- [] 
         +--> Word (70310218287320)  --- "voir"  ---  {:tag=>"V", :lemma=>"voir", :category=>:verb}   --- [] 
   +--> Punctuation (70310218328120)  --- "!"  ---  {:tag=>"!", :lemma=>"!", :category=>:exclamation}   --- []
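
If only the token/tag pairs are of interest, they can be pulled out of the printed tree with plain text processing. The sketch below works on the printed lines shown above and does not use the treat API; `leaf_tags` is a hypothetical helper, and the pattern assumes the exact output layout seen in this comment.

```ruby
# Sketch: pull [token, tag] pairs out of the text printed by print_tree.
# This parses the printed lines only; it does not call into treat.
def leaf_tags(tree_text)
  # Leaf lines start with "+-->" and carry the token in quotes, then the
  # feature hash whose :tag entry holds the part-of-speech tag.
  leaf = /\+--> (?:Word|Punctuation) \(\d+\)\s+--- "(.*?)"\s+---\s+\{.*?:tag=>"(.*?)"/
  tree_text.each_line
           .map { |line| line.match(leaf) }
           .compact
           .map { |m| [m[1], m[2]] }
end
```

Lines for phrases and multi-word groups (those starting with "+--+") are skipped, so each word is reported once with its own tag.
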

louismullie commented 12 years ago

Gem is now updated.

romeo4934 commented 12 years ago

Thank you very much !!! :-)

On 2012/5/17, Louis Mullie wrote:

> Gem is now updated.



louismullie commented 12 years ago

Do let me know if you encounter any other bugs with the French stuff! I'll try to fix them as fast as possible.