louismullie / treat

Natural language processing framework for Ruby.

Stanford parser: wrong results for German, no error message #52

Open bklippstein opened 11 years ago

bklippstein commented 11 years ago

The Stanford parser gives wrong results for German but raises no error:

  # ruby encoding: utf-8
  ENV['JAVA_HOME'] = "/opt/java"
  ENV['LD_LIBRARY_PATH'] = "/opt/java/jre/lib/amd64"
  require 'treat'
  include Treat::Core::DSL
  Treat.core.language.default = 'german' 
  Treat.core.verbosity.debug

  s = sentence 'Der wilde Kerl lebte in einem gelben Haus.'
  s.do(:tokenize, :parse)
  s.print_tree

Output:

+ Sentence (15019540)  --- "(ART Der) lebte (APPR in)."  ---  {:tag_set=>:stutgart}   --- [] 
|
+--> Symbol (14993200)  --- "(ART Der)"  ---  {:tag=>"NP-SB"}   --- [] 
+--> Word (15125440)  --- "lebte"  ---  {:tag=>"VVFIN"}   --- [] 
+--> Symbol (13203700)  --- "(APPR in)"  ---  {:tag=>"PP"}   --- [] 
+--> Punctuation (14970040)  --- "."  ---  {:tag=>"$."}   --- [] 

==> The first word is wrongly classified as a Symbol instead of a Word, and several words are missing (wilde, Kerl, einem, gelben, Haus).
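
Because the parser fails silently, a sanity check on the result can make the problem visible. The following is only a minimal sketch in plain Ruby: missing_words is a hypothetical helper (not part of Treat), and the leaf values are copied by hand from the print_tree output above rather than read through Treat's API.

```ruby
# Hypothetical helper (not part of Treat): list the input words that do
# not appear among the leaves of a parsed tree.
def missing_words(text, leaf_values)
  # Split the raw sentence into word tokens and single punctuation marks.
  expected = text.scan(/[[:alpha:]]+|[^[:alpha:][:space:]]/)
  # Reduce bracketed leaves such as "(ART Der)" to the bare word "Der".
  seen = leaf_values.map { |v| v[/\A\(\S+ (.+)\)\z/, 1] || v }
  expected - seen
end

text   = 'Der wilde Kerl lebte in einem gelben Haus.'
# Leaf values copied from the print_tree output above.
leaves = ['(ART Der)', 'lebte', '(APPR in)', '.']

missing = missing_words(text, leaves)
warn "Parser dropped: #{missing.join(', ')}" unless missing.empty?
```

A check like this compares surface tokens only, so it would not catch wrong tags, but it does flag the dropped words in this report.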

English works fine. Using the stanford-core-nlp gem directly, German also looks good:

  # ruby encoding: utf-8
  ENV['JAVA_HOME'] = "/opt/java"
  ENV['LD_LIBRARY_PATH'] = "/opt/java/jre/lib/amd64"
  require 'stanford-core-nlp'
  StanfordCoreNLP.use :german
  pipeline =  StanfordCoreNLP.load(:tokenize, :ssplit, :parse)
  text = StanfordCoreNLP::Annotation.new('Der wilde Kerl lebte in einem gelben Haus.')
  pipeline.annotate(text)

  text.get(:sentences).each do |sentence|
    puts sentence.get(:basic_dependencies).to_s
    sentence.get(:tokens).each do |token|
      puts "#{token.get(:original_text)} #{token.get(:part_of_speech)}"
    end
  end

Output:

Der ART
wilde ADJA
Kerl NN
lebte VVFIN
in APPR
einem ART
gelben ADJA
Haus NN
. $.

If I use the stanford-core-nlp gem after calling Treat's do method, I get an error from bind-it:

  # ruby encoding: utf-8
  ENV['JAVA_HOME'] = "/opt/java"
  ENV['LD_LIBRARY_PATH'] = "/opt/java/jre/lib/amd64"
  require 'treat'
  include Treat::Core::DSL
  Treat.core.language.default = 'german'
  Treat.core.verbosity.debug

  s = sentence 'Der wilde Kerl lebte in einem gelben Haus.'
  s.do(:tokenize, :parse)

  require 'stanford-core-nlp'
  StanfordCoreNLP.use :german
  pipeline =  StanfordCoreNLP.load(:tokenize, :ssplit, :parse)
  text = StanfordCoreNLP::Annotation.new('Der wilde Kerl lebte in einem gelben Haus.')
  pipeline.annotate(text)

Error:

/home/klippstein/.rvm/gems/ruby-1.9.3-p392/gems/bind-it-0.2.7/lib/bind-it/rjb_proxy.rb:37:in `method_missing': unknown exception (NullPointerException)
  from /home/klippstein/.rvm/gems/ruby-1.9.3-p392/gems/bind-it-0.2.7/lib/bind-it/rjb_proxy.rb:37:in `method_missing'
  from /home/klippstein/Dropbox/Ruby-AptanaWorkspace/spielwiese/treat.rb:28:in `<main>'

louismullie commented 11 years ago

I'll look into it ASAP. Thanks for the report.

louismullie commented 11 years ago

Looks like this is a problem with the parser wrapper, since tokenizing and tagging separately with Stanford works.

nightscape commented 10 years ago

I just ran into this as well and I can also reproduce it on the most recent git version. Is there a workaround? Thanks!

AndrzejJantos commented 9 years ago

@louismullie When loading StanfordCoreNLP.load(:tokenize, :ssplit, :parse, :ner), the :ner annotator causes this error:

  /gems/bind-it-0.2.7/lib/bind-it/rjb_proxy.rb:37:in `method_missing': Fail: unknown method name `backtrace' (RuntimeError)
  gems/bind-it-0.2.7/lib/bind-it/rjb_proxy.rb:37:in `method_missing'

Could you please help with this ASAP?