Closed jajandio closed 5 years ago
Apparently, your stop words file is not UTF-8 encoded. Could you convert it to UTF-8 and try again? On Unix systems I would use iconv; on Windows, many text editors offer a "Save with encoding" option.
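If neither tool is at hand, the conversion can also be done in Ruby itself. The following is only a minimal sketch: `to_utf8` is a hypothetical helper (not part of twitter_ebooks), and it assumes the file was saved as Windows-1252, which is common for Spanish text on Windows but may not match your file.

```ruby
# Hypothetical helper, not part of twitter_ebooks.
# Returns the given raw bytes as a UTF-8 string.
# ASSUMPTION: if the bytes are not already valid UTF-8, they are
# treated as Windows-1252; pass a different name if your editor
# used another encoding (for example "ISO-8859-1").
def to_utf8(raw, source_encoding = "Windows-1252")
  candidate = raw.dup.force_encoding("UTF-8")
  # Already valid UTF-8: return it unchanged.
  return candidate if candidate.valid_encoding?

  # Otherwise reinterpret the bytes in the assumed encoding
  # and transcode them to UTF-8.
  raw.dup.force_encoding(source_encoding).encode("UTF-8")
end

# Usage sketch (the path is a placeholder; keep a backup first):
#   path = "stopwords.txt"
#   File.binwrite(path, to_utf8(File.binread(path)))
```

After the conversion, calling `split` on the file contents should no longer raise `invalid byte sequence in UTF-8`, provided the assumed source encoding was the right one.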
Closed due to lack of feedback.
Hi, when I try to use a custom stopwords.txt for Spanish (so that it catches common words containing á, é, etc.), I get the following error:
C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/twitter_ebooks-3.1.6/lib/twitter_ebooks/nlp.rb:20:in `split': invalid byte sequence in UTF-8 (ArgumentError)
from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/twitter_ebooks-3.1.6/lib/twitter_ebooks/nlp.rb:20:in `stopwords'
from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/twitter_ebooks-3.1.6/lib/twitter_ebooks/nlp.rb:155:in `stopword?'
from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/twitter_ebooks-3.1.6/lib/twitter_ebooks/nlp.rb:90:in `block in keywords'
from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/twitter_ebooks-3.1.6/lib/twitter_ebooks/nlp.rb:90:in `reject'
from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/twitter_ebooks-3.1.6/lib/twitter_ebooks/nlp.rb:90:in `keywords'
from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/twitter_ebooks-3.1.6/lib/twitter_ebooks/model.rb:192:in `consume_lines'
from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/twitter_ebooks-3.1.6/lib/twitter_ebooks/model.rb:226:in `consume_all'
from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/twitter_ebooks-3.1.6/lib/twitter_ebooks/model.rb:40:in `consume_all'
from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/twitter_ebooks-3.1.6/bin/ebooks:118:in `consume_all'
from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/twitter_ebooks-3.1.6/bin/ebooks:408:in `command'
from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/twitter_ebooks-3.1.6/bin/ebooks:429:in `<top (required)>'
from C:/Ruby24-x64/bin/ebooks:22:in `load'
from C:/Ruby24-x64/bin/ebooks:22:in