grindcrank / twitter_ebooks

Better twitterbots for all your friends~alive and polling
MIT License
12 stars, 6 forks

Ranking keywords fails when using UTF-8 #6

Closed jajandio closed 5 years ago

jajandio commented 6 years ago

Hi, when I try to use a custom stopwords.txt for Spanish (so that it catches common words containing á, é, etc.), I get the following error:

    C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/twitter_ebooks-3.1.6/lib/twitter_ebooks/nlp.rb:20:in `split': invalid byte sequence in UTF-8 (ArgumentError)
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/twitter_ebooks-3.1.6/lib/twitter_ebooks/nlp.rb:20:in `stopwords'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/twitter_ebooks-3.1.6/lib/twitter_ebooks/nlp.rb:155:in `stopword?'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/twitter_ebooks-3.1.6/lib/twitter_ebooks/nlp.rb:90:in `block in keywords'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/twitter_ebooks-3.1.6/lib/twitter_ebooks/nlp.rb:90:in `reject'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/twitter_ebooks-3.1.6/lib/twitter_ebooks/nlp.rb:90:in `keywords'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/twitter_ebooks-3.1.6/lib/twitter_ebooks/model.rb:192:in `consume_lines'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/twitter_ebooks-3.1.6/lib/twitter_ebooks/model.rb:226:in `consume_all'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/twitter_ebooks-3.1.6/lib/twitter_ebooks/model.rb:40:in `consume_all'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/twitter_ebooks-3.1.6/bin/ebooks:118:in `consume_all'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/twitter_ebooks-3.1.6/bin/ebooks:408:in `command'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/twitter_ebooks-3.1.6/bin/ebooks:429:in `<top (required)>'
        from C:/Ruby24-x64/bin/ebooks:22:in `load'
        from C:/Ruby24-x64/bin/ebooks:22:in `<main>'
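The top frame of the trace is Ruby's own String#split. When splitting on whitespace, it decodes the string character by character and raises ArgumentError as soon as it meets bytes that are not valid in the string's encoding. A minimal reproduction, assuming (this is a guess, not confirmed in the thread) that stopwords.txt was saved as Windows-1252, the "ANSI" encoding Notepad typically uses on Western European Windows. In that encoding, á and é are the single bytes 0xE1 and 0xE9, which are not valid UTF-8 on their own:

```ruby
# Minimal reproduction of the reported error.
# Assumption: the file holds Windows-1252 bytes, so "más" is stored as m, 0xE1, s.
raw = "m\xE1s\nqu\xE9\n".b                        # raw bytes as they would sit on disk
as_utf8 = raw.dup.force_encoding(Encoding::UTF_8) # same bytes, labelled as UTF-8

puts as_utf8.valid_encoding? # prints "false"

begin
  # Whitespace splitting decodes every character; the lone 0xE1 byte is not
  # a valid UTF-8 sequence, so split raises ArgumentError.
  as_utf8.split
rescue ArgumentError => e
  puts e.message # prints "invalid byte sequence in UTF-8"
end
```

Checking `valid_encoding?` on the file contents is a quick way to confirm whether a given stopwords file will trigger this error.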

grindcrank commented 6 years ago

Apparently, your stopwords file is not UTF-8 encoded. Could you convert it to UTF-8 and try again? On Unix-like systems I would use iconv; on Windows, some of the better text editors offer a "Save with encoding" option.
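With iconv, the conversion would look like `iconv -f WINDOWS-1252 -t UTF-8 stopwords.txt > stopwords-utf8.txt`, again assuming the file is Windows-1252. Without iconv or a suitable editor, Ruby can do the same job. The sketch below is not part of twitter_ebooks. The helper `convert_to_utf8` and the output file name are hypothetical, and the Windows-1252 source encoding is an assumption you should change if the file was saved differently. Files that are already valid UTF-8 are copied unchanged, so running it twice does not double-encode them.

```ruby
require "tmpdir"

# Hypothetical helper. Assumption: the source file is Windows-1252
# ("ANSI" in Notepad); pass another encoding via `from:` if needed.
def convert_to_utf8(src_path, dst_path, from: Encoding::Windows_1252)
  # Read raw bytes so Ruby does not guess an encoding on our behalf.
  raw = File.binread(src_path)

  utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  text =
    if utf8.valid_encoding?
      utf8 # already UTF-8 (or plain ASCII): keep as-is
    else
      # Reinterpret the bytes in the source encoding, then transcode to UTF-8.
      raw.force_encoding(from).encode(Encoding::UTF_8)
    end

  File.binwrite(dst_path, text)
  text
end

# Demo on a throwaway file holding the Windows-1252 bytes for "más" and "qué".
Dir.mktmpdir do |dir|
  src = File.join(dir, "stopwords.txt")
  File.binwrite(src, "m\xE1s\nqu\xE9\n".b)
  text = convert_to_utf8(src, File.join(dir, "stopwords-utf8.txt"))
  puts text.valid_encoding? # prints "true"
  puts text.split.length    # prints "2"
end
```

After converting, put the UTF-8 copy in place of the original stopwords.txt before consuming the corpus again.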

grindcrank commented 5 years ago

Closed due to lack of feedback.