louismullie / treat

Natural language processing framework for Ruby.

IllegalArgumentException #33

Closed LeFnord closed 11 years ago

LeFnord commented 11 years ago

Hi Louis,

I want to parse a German document, so I do:

Treat.core.language.detect = true

and

doc = Treat::Entities::Paragraph.build german_text
doc.do :segment, :parse, :category

and here comes the message:

IllegalArgumentException Unknown option: -retainTmpSubcategories

Another thing: could you publish your future plans for Treat? Then maybe more people could help.

louismullie commented 11 years ago

Can you monkey patch this function similarly for German and tell me if it works?
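For readers unfamiliar with the technique, here is a minimal sketch of what such a monkey patch could look like. The function being referred to is not shown in this thread, so `HypotheticalParser` and `options_for` are invented stand-ins, not part of Treat or stanford-core-nlp. The sketch assumes the rejected option is added by a method of roughly this shape.

```ruby
# Hypothetical stand-in for the library code; the real class and method
# names are not shown in this thread.
class HypotheticalParser
  # Returns the options handed to the underlying parser for a language.
  def options_for(language)
    ['-retainTmpSubcategories']
  end
end

# Monkey patch: reopen the class and override the method so that German
# no longer receives the option the parser rejected.
class HypotheticalParser
  # Keep a handle on the original implementation before overriding it.
  alias_method :original_options_for, :options_for

  def options_for(language)
    opts = original_options_for(language)
    language == :german ? opts - ['-retainTmpSubcategories'] : opts
  end
end

parser = HypotheticalParser.new
p parser.options_for(:english)
p parser.options_for(:german)
```

Reopening the class leaves behavior for other languages untouched, which makes it easy to check whether the option alone causes the exception.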

LeFnord commented 11 years ago

I'll try it ;)

On 26.11.2012 at 17:09, Louis Mullie notifications@github.com wrote:

Can you monkey patch this function

Remember, a great way to avoid broken code is to have less of it. The code that you never write will work forever.

Russ Olsen, “Eloquent Ruby”

louismullie commented 11 years ago

Re: future plans, I'm going to release v2.0.0 shortly, and we'll have a better roadmap then. You can also check out this page for ideas on contributing.

louismullie commented 11 years ago

@LeFnord I updated the stanford-core-nlp gem to fix the above error and German parsing now works well!

Treat.core.language.detect = true
s = sentence "Du hast deiner Frau einen roten Ring gekauft."
s.apply(:parse).print_tree

Output:

+ Sentence (70220136461780)  --- "Du hast deiner [...] Ring gekauft."  ---  {:language=>:german, :tag=>"S", :tag_set=>:stutgart}   --- [] 
|
+--+ Word (70220132691200)  --- "Du"  ---  {:tag=>"NP", :tag_opt=>"SB"}   --- [] 
   |
   +--> Word (70220132512920)  --- "Du"  ---  {:tag=>"PPER", :lemma=>"du"}   --- [] 
+--> Word (70220132091400)  --- "hast"  ---  {:tag=>"VAFIN", :lemma=>"hast"}   --- [] 
+--+ Word (70220131828340)  --- "deiner Frau"  ---  {:tag=>"NP", :tag_opt=>"SB"}   --- [] 
   |
   +--> Word (70220131724040)  --- "deiner"  ---  {:tag=>"ADJA", :lemma=>"deiner"}   --- [] 
   +--> Word (70220124038360)  --- "Frau"  ---  {:tag=>"NN", :lemma=>"frau"}   --- [] 
+--+ Word (70220121286220)  --- "einen roten Ring gekauft"  ---  {:tag=>"VP"}   --- [] 
   |
   +--+ Word (70220121051020)  --- "einen roten Ring"  ---  {:tag=>"NP", :tag_opt=>"OA"}   --- [] 
      |
      +--> Word (70220120849360)  --- "einen"  ---  {:tag=>"ART", :lemma=>"einen"}   --- [] 
      +--> Word (70220120283060)  --- "roten"  ---  {:tag=>"ADJA", :lemma=>"roten"}   --- [] 
      +--> Word (70220119866060)  --- "Ring"  ---  {:tag=>"NN", :lemma=>"ring"}   --- [] 
   +--> Word (70220119551580)  --- "gekauft"  ---  {:tag=>"VVPP", :lemma=>"gekauft"}   --- [] 
+--> Punctuation (70220119128160)  --- "."  ---  {:tag=>"$.", :lemma=>"."}   --- []

Also, I released 2.0.0rc1 and I put up a top 10 priority list for additions to the library here. Let me know if there's anything that might interest you in there! I'll be setting up some milestones for core-related stuff shortly.