louismullie / treat

Natural language processing framework for Ruby.

Word humongous #45

Closed jou959 closed 11 years ago

jou959 commented 11 years ago

Every time I pass in a string containing the word 'humongous', I get the following exception:

/gems/treat-2.0.4/lib/treat/entities/entity/buildable.rb:248:in `from_serialized_file': Path 'humongous (Treat::Exception) ' does not point to a readable file.

Tested with:

"In 1991, the 66 year old money manager hit a bonanza,racking up a humongous 88.8% total return more than twice the return of the average growth fund and nearly three times the SP 500-stock index."

"humongous amounts of money"

What could be the error and how would I be able to resolve this? Thanks.

louismullie commented 11 years ago

When you use the document keyword, Treat interprets the input as a file name. In your case you should probably use paragraph instead.

louismullie commented 11 years ago

Never mind, I am able to reproduce the bug. Will look into this quickly.

louismullie commented 11 years ago

The culprit is this:

  # Build a document from a raw or serialized file.
  def from_file(file,def_fmt=nil)

    if file.index('yml') ||
      file.index('yaml') ||
      file.index('xml') ||
      file.index('mongo') # <--- culprit: matches the "mongo" in "humongous"
      from_serialized_file(file)
    else
      fmt = Treat::Workers::Formatters::
      Readers::Autoselect.detect_format(file,def_fmt)
      from_raw_file(file, fmt)
    end

  end

For now just remove the problematic line and you'll be fine. I'll submit a patch tomorrow.
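
To see why the check misfires: `String#index` does a substring search, so `'mongo'` is found inside "humongous" and the text gets treated as the path of a serialized file. Below is a minimal sketch that mirrors only the condition above. The helper name `looks_serialized?` is made up for illustration and is not part of Treat.

```ruby
# Hypothetical helper mirroring the condition in from_file; not part of Treat.
def looks_serialized?(file)
  # String#index returns the offset of the first match, or nil if there is none.
  !!(file.index('yml') ||
     file.index('yaml') ||
     file.index('xml') ||
     file.index('mongo'))
end

# 'mongo' is a substring of 'humongous', so index returns an offset, not nil.
puts 'humongous'.index('mongo').inspect

# Plain text containing the word is misread as a serialized file path.
puts looks_serialized?('humongous amounts of money')

# Text without any of the four substrings falls through to the raw-file branch.
puts looks_serialized?('large amounts of money')
```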

jou959 commented 11 years ago

Thanks, I'll remove that line for now.

louismullie commented 11 years ago

Fixed in the next release.

robertjung commented 11 years ago

Same problem in def build(*args):

  elsif self == Treat::Entities::Document ||
    (fv.index('yml') || fv.index('yaml') ||
    fv.index('xml') || fv.index('mongo'))

robertjung commented 11 years ago

Still breaks when text contains any of yml, yaml or xml. Slightly better would be (fv.index(/\.yml\Z/i)) || …

Maybe you want to pass an option hash like build("some string that might be a .yml or a real string", type: :file)?
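
A rough sketch of both suggestions, an anchored extension check plus an explicit option hash. Everything here is hypothetical: `serialized_path?`, `classify_input`, and the `:file` / `:string` values are illustrative names, not Treat's API.

```ruby
# Hypothetical: match only a trailing extension, not any substring.
# 'mongo' is left out because it is not a file extension.
def serialized_path?(value)
  !!(value =~ /\.(?:yml|yaml|xml)\Z/i)
end

# Hypothetical: let the caller state the intent instead of guessing.
def classify_input(value, type: nil)
  return type if type # an explicit :file or :string wins
  serialized_path?(value) ? :file : :string
end

puts classify_input('humongous amounts of money')
puts classify_input('my xml notes')
puts classify_input('corpus.yml')
puts classify_input('save it as a .yml', type: :string)
```

The anchored pattern still guesses wrong for a plain string that happens to end in `.yml`, which is the case the explicit `type:` option is meant to cover.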