louismullie / treat

Natural language processing framework for Ruby.

Impossible to install French package #124

Open psychoslave opened 7 years ago

psychoslave commented 7 years ago

Hi, I'm running Fedora 25 and trying to use treat. The gem itself installed seamlessly (gem install treat, with RubyGems 2.5.1), and I then wanted to install the French package.

The first error I hit was that stanford-core-nlp couldn't be built because JAVA_HOME wasn't set. A simple export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk resolved that.
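A small pre-flight check along these lines could catch a missing JAVA_HOME before the installer tries to build stanford-core-nlp. This is a sketch, not part of treat: the helper name java_home_ok? is hypothetical, and it assumes a JDK layout where the java binary lives at $JAVA_HOME/bin/java.

```ruby
# Hypothetical pre-flight check (not part of treat).
# Returns true only if JAVA_HOME is set, non-empty, and contains an
# executable bin/java (an assumption about the usual JDK layout).
def java_home_ok?(env = ENV)
  home = env['JAVA_HOME']
  return false if home.nil? || home.empty?

  File.executable?(File.join(home, 'bin', 'java'))
end
```

For example, running warn 'Set JAVA_HOME first' unless java_home_ok? before Treat::Core::Installer.install would surface the problem before any native extension build starts.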

However, I ran into a more serious problem while downloading the Punkt segmenter models for French. Here is what I did and the result:

▶ ruby --version
ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-linux]
▶ irb --version
irb 0.9.6(09/06/30)
▶ irb 
irb(main):001:0> require 'treat'
=> true
irb(main):002:0>  Treat::Core::Installer.install 'french'

Treat Installer, v. 2.1.0

1. Installing core dependencies.

Installing nokogiri...
Building native extensions.  This could take a while...
WARN: Unresolved specs during Gem::Specification.reset:
      json (~> 1.8)
WARN: Clearing out unresolved specs.
Please report a bug if this causes problems.
Installing ferret...
Building native extensions.  This could take a while...
Installing bson_ext...
Building native extensions.  This could take a while...
Installing mongo...
Installing lda-ruby...
Building native extensions.  This could take a while...
Installing stanford-core-nlp...
Building native extensions.  This could take a while...
Fetching: bind-it-0.2.7.gem (100%)
Fetching: stanford-core-nlp-0.5.3.gem (100%)
Installing linguistics...
This library also presents tie-ins for the 'linkparser' and
'wordnet' libraries, which you can enable by installing the
gems of the same name.
Installing ruby-readability...
Installing whatlanguage...
Installing chronic...
Installing kronic...
Installing nickel...
Installing decisiontree...
Installing rb-libsvm...
Building native extensions.  This could take a while...
Installing ruby-fann...
Building native extensions.  This could take a while...
Installing zip...
Installing loggability...
Installing tf-idf-similarity...
Installing narray...
Building native extensions.  This could take a while...
Installing fastimage...
Installing fuzzy-string-match...
Installing levenshtein-ffi...
Building native extensions.  This could take a while...

2. Installing dependencies for the French language.

Installing punkt-segmenter...
Installing tactful_tokenizer...
Installing stanford-core-nlp...

3. Downloading models for the Punkt segmenter for the French language.

RuntimeError: Couldn't download http://www.louismullie.com/treat/punkt/french.yaml (Max number of attempts reached). Error: (Couldn't download https://coreslicer.com/treat/punkt/french.yaml (Max number of attempts reached). Error: (Response code was not 200 , but 404.))
        from /home/psychoslave/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/schiphol-1.0.2/lib/schiphol.rb:138:in `rescue in download'
        from /home/psychoslave/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/schiphol-1.0.2/lib/schiphol.rb:147:in `download'
        from /home/psychoslave/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/treat-2.1.0/lib/treat/core/installer.rb:149:in `download_punkt_models'
        from /home/psychoslave/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/treat-2.1.0/lib/treat/core/installer.rb:55:in `install'
        from (irb):2
        from /home/psychoslave/.rbenv/versions/2.3.1/bin/irb:11:in `<main>'

Is my gem version out of sync with the server layout it's trying to fetch from? Is this the result of a deprecation decision? Should I switch to a newer version of treat from the GitHub repository?
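One way to narrow this down is to request the model file directly and look at the HTTP status code. A minimal sketch using Ruby's standard Net::HTTP; the helper names are hypothetical, and the URL layout http://SERVER/treat/punkt/LANGUAGE.yaml is inferred from the error message above, not taken from treat's source.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds the model URL in the layout seen in the
# error message (http://<server>/treat/punkt/<language>.yaml).
def punkt_model_url(server, language)
  "http://#{server}/treat/punkt/#{language}.yaml"
end

# Returns the raw HTTP status code as a String, e.g. "200" or "404".
# Redirects are NOT followed, so a redirect shows up as "301"/"302".
def http_status(url)
  Net::HTTP.get_response(URI(url)).code
end
```

For example, puts http_status(punkt_model_url('www.louismullie.com', 'french')) prints the status for the old location; since redirects are not followed, a moved file is distinguishable from a missing one.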

psychoslave commented 7 years ago

Actually, the same happens for English:

6. Downloading models for the Punkt segmenter for the English language.

RuntimeError: Couldn't download http://www.louismullie.com/treat/punkt/english.yaml (Max number of attempts reached). Error: (Couldn't download https://coreslicer.com/treat/punkt/english.yaml (Max number of attempts reached). Error: (Response code was not 200 , but 404.))
        from /home/psychoslave/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/schiphol-1.0.2/lib/schiphol.rb:138:in `rescue in download'
        from /home/psychoslave/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/schiphol-1.0.2/lib/schiphol.rb:147:in `download'
        from /home/psychoslave/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/treat-2.1.0/lib/treat/core/installer.rb:149:in `download_punkt_models'
        from /home/psychoslave/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/treat-2.1.0/lib/treat/core/installer.rb:55:in `install'
        from (irb):3
        from /home/psychoslave/.rbenv/versions/2.3.1/bin/irb:11:in `<main>'
psychoslave commented 7 years ago

I also tried the version from the git repository:

▶ bundle install
Your Gemfile lists the gem rspec (>= 0) more than once.
You should probably keep only one of them.
While it's not a problem now, it could cause errors if you change the version of one of them later.
Your Gemfile lists the gem rake (>= 0) more than once.
You should probably keep only one of them.
While it's not a problem now, it could cause errors if you change the version of one of them later.
Your Gemfile lists the gem simplecov (>= 0) more than once.
You should probably keep only one of them.
While it's not a problem now, it could cause errors if you change the version of one of them later.
Fetching gem metadata from https://rubygems.org/..........
Fetching version metadata from https://rubygems.org/.
Resolving dependencies...
Using rake 12.0.0
Using birch 0.1.1
Using diff-lcs 1.3
Using docile 1.1.5
Using guess_html_encoding 0.0.11
Using json 1.8.6
Using mime-types 1.25.1
Using mini_portile2 2.2.0
Using progressbar 1.8.2
Using rspec-support 3.6.0
Using rubyzip 0.9.9
Using simplecov-html 0.10.1
Installing unicode-display_width 1.3.0
Using bundler 1.14.6
Using yomu 0.2.4
Using nokogiri 1.8.0
Using rspec-core 3.6.0
Using rspec-expectations 3.6.0
Using rspec-mocks 3.6.0
Using schiphol 1.0.2
Using simplecov 0.14.1
Installing terminal-table 1.8.0
Using ruby-readability 0.7.0
Installing rspec 3.6.0
Using treat 2.1.0 from source at `.`
Bundle complete! 13 Gemfile dependencies, 25 gems now installed.
Use `bundle show [gemname]` to see where a bundled gem is installed.

▶ irb  
irb(main):001:0> require 'treat'
=> true
irb(main):002:0> Treat::Core::Installer.install 'english'

Treat Installer, v. 2.1.0

1. Installing core dependencies.

Installing nokogiri...
Building native extensions.  This could take a while...
WARN: Unresolved specs during Gem::Specification.reset:
      json (~> 1.8)
WARN: Clearing out unresolved specs.
Please report a bug if this causes problems.
Installing ferret...
Building native extensions.  This could take a while...
Installing bson_ext...
Building native extensions.  This could take a while...
Installing mongo...
Installing lda-ruby...
Building native extensions.  This could take a while...
Installing stanford-core-nlp...
Installing linguistics...
This library also presents tie-ins for the 'linkparser' and
'wordnet' libraries, which you can enable by installing the
gems of the same name.
Installing ruby-readability...
Installing whatlanguage...
Installing chronic...
Installing kronic...
Installing nickel...
Installing decisiontree...
Installing rb-libsvm...
Building native extensions.  This could take a while...
Installing ruby-fann...
Building native extensions.  This could take a while...
Installing zip...
Installing loggability...
Installing tf-idf-similarity...
Installing narray...
Building native extensions.  This could take a while...
Installing fastimage...
Installing fuzzy-string-match...
Installing levenshtein-ffi...
Building native extensions.  This could take a while...

2. Installing dependencies for the English language.

Installing rbtagger...
Building native extensions.  This could take a while...
Installing ruby-stemmer...
Building native extensions.  This could take a while...
Installing punkt-segmenter...
Installing tactful_tokenizer...
Installing nickel...
Installing rwordnet...
Installing uea-stemmer...
Installing engtagger...
Installing activesupport...
Installing srx-english...
Installing scalpel...

3. Downloading models for the Punkt segmenter for the English language.

RuntimeError: Couldn't download http://www.louismullie.com/treat/punkt/english.yaml (Max number of attempts reached). Error: (Couldn't download https://coreslicer.com/treat/punkt/english.yaml (Max number of attempts reached). Error: (Response code was not 200 , but 404.))
        from /home/psychoslave/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/schiphol-1.0.2/lib/schiphol.rb:138:in `rescue in download'
        from /home/psychoslave/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/schiphol-1.0.2/lib/schiphol.rb:147:in `download'
        from /home/psychoslave/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/treat-2.1.0/lib/treat/core/installer.rb:149:in `download_punkt_models'
        from /home/psychoslave/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/treat-2.1.0/lib/treat/core/installer.rb:55:in `install'
        from (irb):2
        from /home/psychoslave/.rbenv/versions/2.3.1/bin/irb:11:in `<main>'
psychoslave commented 7 years ago

It seems my previous install didn't actually use the git repository version. The server has indeed changed in the repository, but I'm surprised that gem install still doesn't pick up a change that old:

▶ git blame lib/treat/core/installer.rb | grep 'Server ='
f1f8c010 lib/treat/core/installer.rb    (Andrew Brown 2016-05-24 13:19:51 -0500   8)   Server = 's3.amazonaws.com/static-public-assets'
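To check which copy of treat is actually installed, one could read the Server constant straight out of the installed gem's installer.rb (the path lib/treat/core/installer.rb matches the stack traces above). A sketch under those assumptions; installer_server and installed_treat_server are hypothetical helpers, and the regex assumes the constant is assigned a single quoted string literal, as in the git blame line.

```ruby
# Hypothetical helper: extracts the value of a `Server = '...'` assignment
# from the source text of treat's installer.rb, or nil if none is found.
def installer_server(source)
  source[/^\s*Server\s*=\s*['"]([^'"]+)['"]/, 1]
end

# Hypothetical helper: reads installer.rb from the installed treat gem.
# Gem::Specification.find_by_name raises Gem::MissingSpecError if treat
# is not installed.
def installed_treat_server
  dir = Gem::Specification.find_by_name('treat').gem_dir
  installer_server(File.read(File.join(dir, 'lib', 'treat', 'core', 'installer.rb')))
end
```

If installed_treat_server returns something other than s3.amazonaws.com/static-public-assets, the loaded gem predates the 2016 server change shown by git blame.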
psychoslave commented 7 years ago

So here is how to actually install the repository version:

# gem install wordnet # suggested by install log
gem build treat.gemspec
gem install treat-2.1.0.gem 

And then in irb, the following will work:

Treat::Core::Installer.install

But any attempt to install the French package will fail, because there is indeed no french.yaml available on the server:

Treat::Core::Installer.install 'french'
psychoslave commented 7 years ago

See https://github.com/louismullie/treat/issues/115

psychoslave commented 7 years ago

So the French package isn't available on the server, but it should be possible to work around the problem by copying the relevant file directly into the models/punkt/ subdirectory of the gem's installation directory. In my case, that directory is ~/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/treat-2.1.0.
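The manual copy described above can be sketched in Ruby, locating the gem directory through RubyGems instead of hard-coding the rbenv path. This assumes, as the comment above does, that treat looks for models/punkt/LANGUAGE.yaml inside its gem directory; install_punkt_model is a hypothetical helper, not part of treat.

```ruby
require 'fileutils'

# Hypothetical helper: copies a Punkt model file (e.g. french.yaml) into
# <gem_dir>/models/punkt/. By default gem_dir is the installed treat gem's
# directory; pass another directory to target a different copy.
def install_punkt_model(source, gem_dir = Gem::Specification.find_by_name('treat').gem_dir)
  dest_dir = File.join(gem_dir, 'models', 'punkt')
  FileUtils.mkdir_p(dest_dir) # create models/punkt/ if it does not exist yet
  dest = File.join(dest_dir, File.basename(source))
  FileUtils.cp(source, dest)
  dest # return the destination path for confirmation
end
```

For example, install_punkt_model('french.yaml') would copy the file into the treat-2.1.0 gem directory mentioned above, if that is the installed version.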

You then need a french.yaml. It looks like the Punkt files are based on .pickle files, as used in NLTK for example. I need to investigate further to find these files, but here are their JSON equivalents: https://github.com/harrisj/punkt/commit/7c64ff034faef43d58665326924279cc55c5138b#diff-4bfc17cd24c1afdec0c3ea5f6513a402
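A minimal sketch of turning one of those JSON files into YAML with Ruby's standard json and yaml libraries. The helper name punkt_json_to_yaml is hypothetical, and this only changes the serialization format: it is an unverified assumption that the key names and nesting in the harrisj/punkt JSON match what treat's Punkt segmenter expects to find in french.yaml.

```ruby
require 'json'
require 'yaml'

# Hypothetical conversion: parses JSON text and re-emits the same data as
# YAML. The schema is left untouched, so the result may still need
# restructuring before treat can load it.
def punkt_json_to_yaml(json_text)
  YAML.dump(JSON.parse(json_text))
end
```

For example, File.write('french.yaml', punkt_json_to_yaml(File.read('french.json'))) would produce a candidate file, where french.json is a hypothetical local copy of the JSON model.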

rushilagr commented 7 years ago

Hey @psychoslave, did you eventually manage to make it work?