mispy-archive / twitter_ebooks

Better twitterbots for all your friends~
MIT License

occasional `random seed` error consuming archive #10

Open mattieb opened 10 years ago

mattieb commented 10 years ago

I can't figure out what causes this, and it's intermittent: it happened a few days ago, but on my next archive/consume cycle it didn't.

I just upgraded to twitter_ebooks 2.2.2 and its dependencies.

ebooks:19$ bin/ebooks consume corpus/zigg.json 
Faraday::Builder is now Faraday::RackBuilder.
Reading json corpus from corpus/zigg.json
Removing commented lines and sorting mentions
Segmenting text into sentences
Tokenizing 7411 statements and 9233 mentions
Ranking keywords
/home/ebooks/zigg_ebooks/vendor/bundle/ruby/1.9.1/gems/bloomfilter-rb-2.1.1/lib/bloomfilter/native.rb:20:in `new': random seed (ArgumentError)
        from /home/ebooks/zigg_ebooks/vendor/bundle/ruby/1.9.1/gems/bloomfilter-rb-2.1.1/lib/bloomfilter/native.rb:20:in `initialize'
        from /home/ebooks/zigg_ebooks/vendor/bundle/ruby/1.9.1/gems/highscore-1.2.0/lib/highscore/wordlist.rb:111:in `new'
        from /home/ebooks/zigg_ebooks/vendor/bundle/ruby/1.9.1/gems/highscore-1.2.0/lib/highscore/wordlist.rb:111:in `init_bloom_filter'
        from /home/ebooks/zigg_ebooks/vendor/bundle/ruby/1.9.1/gems/highscore-1.2.0/lib/highscore/wordlist.rb:41:in `initialize'
        from /home/ebooks/zigg_ebooks/vendor/bundle/ruby/1.9.1/gems/highscore-1.2.0/lib/highscore/wordlist.rb:31:in `new'
        from /home/ebooks/zigg_ebooks/vendor/bundle/ruby/1.9.1/gems/highscore-1.2.0/lib/highscore/wordlist.rb:31:in `load'
        from /home/ebooks/zigg_ebooks/vendor/bundle/ruby/1.9.1/gems/highscore-1.2.0/lib/highscore/wordlist.rb:13:in `load_file'
        from /home/ebooks/zigg_ebooks/vendor/bundle/ruby/1.9.1/gems/highscore-1.2.0/lib/highscore/blacklist.rb:15:in `load_default_file'
        from /home/ebooks/zigg_ebooks/vendor/bundle/ruby/1.9.1/gems/highscore-1.2.0/lib/highscore/content.rb:17:in `initialize'
        from /home/ebooks/zigg_ebooks/vendor/bundle/ruby/1.9.1/gems/twitter_ebooks-2.2.2/lib/twitter_ebooks/nlp.rb:76:in `new'
        from /home/ebooks/zigg_ebooks/vendor/bundle/ruby/1.9.1/gems/twitter_ebooks-2.2.2/lib/twitter_ebooks/nlp.rb:76:in `keywords'
        from /home/ebooks/zigg_ebooks/vendor/bundle/ruby/1.9.1/gems/twitter_ebooks-2.2.2/lib/twitter_ebooks/model.rb:73:in `consume'
        from /home/ebooks/zigg_ebooks/vendor/bundle/ruby/1.9.1/gems/twitter_ebooks-2.2.2/lib/twitter_ebooks/model.rb:13:in `consume'
        from /home/ebooks/zigg_ebooks/vendor/bundle/ruby/1.9.1/gems/twitter_ebooks-2.2.2/bin/ebooks:58:in `block in consume'
        from /home/ebooks/zigg_ebooks/vendor/bundle/ruby/1.9.1/gems/twitter_ebooks-2.2.2/bin/ebooks:53:in `each'
        from /home/ebooks/zigg_ebooks/vendor/bundle/ruby/1.9.1/gems/twitter_ebooks-2.2.2/bin/ebooks:53:in `consume'
        from /home/ebooks/zigg_ebooks/vendor/bundle/ruby/1.9.1/gems/twitter_ebooks-2.2.2/bin/ebooks:194:in `command'
        from /home/ebooks/zigg_ebooks/vendor/bundle/ruby/1.9.1/gems/twitter_ebooks-2.2.2/bin/ebooks:204:in `<top (required)>'
        from bin/ebooks:16:in `load'
        from bin/ebooks:16:in `<main>'

Because I like being weird, I'm running this in a dedicated user account on OpenBSD. :smile:

mattieb commented 10 years ago

This may be bloomfilter's problem. I changed the default seed near the top of native.rb as follows, limiting it to 16 bits:

module BloomFilter
  class Native < Filter
    attr_reader :bf

    def initialize(opts = {})
      @opts = {
        :size    => 100,
        :hashes  => 4,
        :seed    => Time.now.to_i % 65536,
        :bucket  => 3,
        :raise   => false
      }.merge(opts)

and I can now consume again.
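
For anyone who wants to see what that change does in isolation, here is a minimal sketch that does not need the gem. The helper names `sixteen_bit_seed`, `default_opts` and `build_opts` are made up for illustration and are not part of bloomfilter-rb; only the option keys and the `% 65536` come from the snippet above. It shows that the reduced seed always falls in 0..65535 and that, because the defaults are merged with the caller's options, an explicitly passed `:seed` takes precedence over the default.

```ruby
# Hypothetical helper (not part of bloomfilter-rb): reduce a Unix
# timestamp to a 16-bit value, as the patched default above does.
def sixteen_bit_seed(time = Time.now)
  time.to_i % 65_536
end

# Defaults copied from the patched snippet; rebuilt on every call so
# the seed reflects the current time.
def default_opts
  {
    :size   => 100,
    :hashes => 4,
    :seed   => sixteen_bit_seed,
    :bucket => 3,
    :raise  => false
  }
end

# Caller-supplied options override the defaults, mirroring the
# `.merge(opts)` call in the snippet.
def build_opts(opts = {})
  default_opts.merge(opts)
end

seed = build_opts[:seed]
puts "default seed in 16-bit range: #{(0...65_536).include?(seed)}"
puts "explicit seed wins: #{build_opts(:seed => 42)[:seed]}"
```

Whether the seed can be passed in this way from twitter_ebooks is a separate question, since in the trace above it is highscore that constructs the filter.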