louismullie / treat

Natural language processing framework for Ruby.

Is this the kind of output I can expect? #12

Closed bingomanatee closed 12 years ago

bingomanatee commented 12 years ago

I went through the installation and I'm trying this out, but the output doesn't look that useful to me.

Here is my test run:

=> nil
1.9.3p194 :002 > "I am a sentence".parse
=> Phrase (28923900) --- "I am a sentence" --- {:tag=>"S", :tag_set=>:penn} --- []
1.9.3p194 :003 > "This should not be hard".parse
=> Phrase (31930400) --- "This should not be hard" --- {:tag=>"S", :tag_set=>:penn} --- []
1.9.3p194 :004 > "I would like to go to the movies".parse.inspect
=> "Phrase (31966680) --- \"I would like [...] the movies\" --- {:tag=>\"S\", :tag_set=>:penn} --- [] "
1.9.3p194 :005 > "I would like to go to the movies".tokenize
=> Phrase (26454480) --- "I would like [...] the movies" --- {} --- []
1.9.3p194 :006 > "This is a sentence.".tokenize
=> Sentence (26860740) --- "This is a sentence." --- {} --- []
1.9.3p194 :007 > sect = Section "A walk in the park\n"+
'Obama and Sarkozy met this friday to investigate ' +
'the possibility of a new rescue plan. The French ' +
'president Sarkozy is to meet Merkel next Tuesday.'

sect.do :chunk, :segment, :tokenize, :parse
1.9.3p194 :008 >
1.9.3p194 :009 >
1.9.3p194 :010 >
=> Section (25993960) --- "A walk in [...] next Tuesday." --- {} --- []
1.9.3p194 :011 >
1.9.3p194 :012 > p = Paragraph 'A walk in the park. A trip on a boat.' p.segment
SyntaxError: (irb):12: syntax error, unexpected '=', expecting $end
... :segment, :tokenize, :parsep = Paragraph 'A walk in the par...
... ^
from /home/dave/.rvm/rubies/ruby-1.9.3-p194/bin/irb:16:in `<main>'
1.9.3p194 :013 > p = Paragraph 'A walk in the park. A trip on a boat.'
NoMethodError: undefined method `segmentp=' for nil:NilClass
from (irb):13
from /home/dave/.rvm/rubies/ruby-1.9.3-p194/bin/irb:16:in `<main>'
1.9.3p194 :014 > s = "How do you think this works"
=> "How do you think this works"
1.9.3p194 :015 > s.tokenize
=> Phrase (25719320) --- "How do you think this works" --- {} --- []
1.9.3p194 :016 > s = Sentence "This is a sentence."
=> Sentence (24476840) --- "This is a sentence." --- {} --- []
1.9.3p194 :017 > s.tokenize
=> Sentence (24476840) --- "This is a sentence." --- {} --- []
1.9.3p194 :018 >

Is this typical output? Should I be seeing more info on this stuff?

Thanks

louismullie commented 12 years ago

That is indeed the expected output! The results you are getting in irb show the output of the inspect method, which only gives limited information about an entity. Here's how to get more info.

Given:

p = "I am a phrase".parse

s = "I am a sentence.".tokenize

sect = Section "A walk in the park\n"+
'Obama and Sarkozy met this friday to investigate ' +
'the possibility of a new rescue plan. The French ' +
'president Sarkozy is to meet Merkel next Tuesday.'

sect.do :chunk, :segment, :tokenize, :parse

Then:


# Print the tree of any entity
p.print_tree

# Iterate over all the tokens
s.each_token do |token|
  puts token.to_s
end

# Get an array of token strings
s.tokens.map { |t| t.to_s }

# Iterate over all words
sect.each_word do |w|
  puts w.inspect
end
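
To illustrate why irb shows so little, here is a rough sketch that does not use Treat at all. irb echoes the result of calling inspect on whatever an expression returns, so a class with a short inspect looks sparse in the console even when it holds a whole tree. The Node class and its tree_string method below are hypothetical stand-ins made up for this example, not Treat's actual implementation.

```ruby
# Hypothetical stand-in for an entity; NOT Treat's actual implementation.
class Node
  attr_reader :value, :children

  def initialize(value)
    @value = value
    @children = []
  end

  # Attach a child node and return self so calls can be chained.
  def <<(child)
    @children << child
    self
  end

  # One-line summary: this is the text irb echoes after "=>".
  def inspect
    "#{self.class.name} --- #{@value.inspect} --- #{@children.size} children"
  end

  # Full dump: one line per node, indented by depth.
  def tree_string(depth = 0)
    lines = ["#{'  ' * depth}#{@value}"]
    @children.each { |c| lines << c.tree_string(depth + 1) }
    lines.join("\n")
  end
end

phrase = Node.new("I am a phrase")
%w[I am a phrase].each { |w| phrase << Node.new(w) }

puts phrase.inspect      # short summary only
puts phrase.tree_string  # every node in the tree
```

The same split applies to the calls above: what irb prints is only the summary, while methods such as print_tree and each_token walk the actual contents.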

Hope this helps. Don't hesitate to ask if you need more info!