louismullie / treat

Natural language processing framework for Ruby.

Rails #2

Closed bobbytables closed 12 years ago

bobbytables commented 12 years ago

Do you have any documentation to get this into a rails app? Thanks!

bobbytables commented 12 years ago

More specifically:

irb(main):001:0> Paragraph
NameError: uninitialized constant Paragraph
    from (irb):1
    from /Users/robert/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/railties-3.2.3/lib/rails/commands/console.rb:47:in `start'
    from /Users/robert/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/railties-3.2.3/lib/rails/commands/console.rb:8:in `start'
    from /Users/robert/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/railties-3.2.3/lib/rails/commands.rb:41:in `<top (required)>'
    from script/rails:6:in `require'
    from script/rails:6:in `<main>'
irb(main):002:0> Treat.sweeten!
=> nil
irb(main):003:0> Paragraph
NameError: uninitialized constant Paragraph

louismullie commented 12 years ago

Hey Robert,

Paragraph is defined as a method on Object, so it does not exist as a constant. It is actually an alias for Treat::Entities::Paragraph.build(), which expects at least one parameter. As an example, you can try:

require 'treat'
par = Paragraph "A test paragraph. Another sentence."
par.do :segment, :tokenize
par.print_tree

Let me know if this works for you.
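
To illustrate why a bare `Paragraph` raises a NameError while `Paragraph "..."` works, here is a minimal sketch in plain Ruby. It does not use Treat: `Demo::Paragraph` and its `build` method are hypothetical stand-ins, and Treat's actual implementation may differ. The pattern is the same one Ruby uses for `Kernel#Integer` and `Kernel#Array`.

```ruby
# Hypothetical stand-in for an entity class; NOT Treat's actual code.
module Demo
  class Paragraph
    attr_reader :text

    def initialize(text)
      @text = text
    end

    # Builder that requires one argument, mirroring the "expects at
    # least one parameter" behaviour described above.
    def self.build(text)
      new(text)
    end
  end
end

# A capitalized method defined on Object. No top-level constant named
# Paragraph is created by this definition.
class Object
  def Paragraph(text)
    Demo::Paragraph.build(text)
  end
end

# With an argument, Ruby parses the capitalized name as a method call.
par = Paragraph "A test paragraph. Another sentence."
puts par.text

# Without arguments or parentheses, Ruby parses it as a constant lookup,
# which fails because only the method exists.
begin
  Paragraph
rescue NameError => e
  puts "NameError: #{e.message}"
end
```

This is why the console session above fails on line 1 and line 3: both are constant lookups, not method calls.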

As for the Rails integration, there is no plugin currently. If you can provide a little more detail concerning what you want to do exactly, I'll be happy to help!

bobbytables commented 12 years ago

I'm mostly interested in extracting sentences from paragraphs currently.

Maybe you could answer a primarily noobish question:

If my stanford-nlp is at /usr/local/stanford-parser/ what would I configure to use that?

louismullie commented 12 years ago

Note that you won't be able to use just the Stanford Parser files; the bindings require the full stanford-core-nlp package.

The easiest way to get all the necessary files without any configuration is to run rake treat:install (if you cloned the repo), or (if you installed as a gem):

require 'treat'
Treat.install

The installer will check if the stanford-core-nlp gem is installed and download the necessary files into the right folders.

If you want a more custom install, you can download one of the following:

Place all JAR files inside /path_to_treat/bin/stanford and all the other folders inside the package in /path_to_treat/models/stanford.

As for pointing to folders other than these defaults for the Stanford files, there aren't any options right now (see the loader). I'll put that on my TODO list!

louismullie commented 12 years ago

Concerning your goal (extraction of sentences from paragraphs), here are a few hints on how to do decision tree classification of sentences based on their features:

  1. Create/load your paragraphs.
  2. Split them into sentences by calling segment on them.
  3. Annotate them with the necessary features to train a classifier.
  4. Create a Treat::Classification object to describe your classification task.
  5. Retrieve a data-set from your paragraphs by calling paragraph.export(classification).
  6. [Save your data-set by calling data_set.save('file.yml').]
  7. Classify sentences in new/unobserved paragraphs by calling:
    sentence.classify :id3, :training => data_set

bobbytables commented 12 years ago

If you would like a patch for loading external files via a config, I can help you with that. It seems odd to store them in your gems directory, since for people using Bundler the folder will change, losing all changes.

louismullie commented 12 years ago

I realize that it might be kind of odd, so I've started working on a patch. I should commit it by the end of the day.

Cheers, Louis

louismullie commented 12 years ago

Hey Robert,

Sorry it took a bit longer than expected. I just pushed 1.0.3, which adds support for loading external files for Stanford.

require 'treat'

require 'treat/loaders/stanford'
Treat::Loaders::Stanford.jar_path = '/usr/local/bin/'
Treat::Loaders::Stanford.model_path = '/usr/local/models/'

"A phrase to tokenize".tokenize(:stanford).print_tree

Cheers, Louis

louismullie commented 12 years ago

Well, I think I went a bit too fast on that one; a few bugs still needed to be fixed. I just bumped to 1.0.4, and everything functions properly now.

bobbytables commented 12 years ago

Haha, this looks great. I'll report my experience and submit patches when I see em