jimlindstrom / FinModeling

Tools for Financial Modeling
rake aborted! marshal data too short #9

Open hinagiku opened 9 years ago

hinagiku commented 9 years ago

Hi, after I installed the gem, I got the 'marshal data too short' error whenever I ran any rake task. The message is below:

rake aborted!
marshal data too short
/home/cl/.rvm/gems/ruby-2.0.0-p353@sasac_wan/gems/activesupport-4.0.2/lib/active_support/core_ext/marshal.rb:6:in `load'
/home/cl/.rvm/gems/ruby-2.0.0-p353@sasac_wan/gems/activesupport-4.0.2/lib/active_support/core_ext/marshal.rb:6:in `load_with_autoloading'
/home/cl/.rvm/gems/ruby-2.0.0-p353@sasac_wan/gems/naive_bayes-0.0.3/lib/naive_bayes.rb:26:in `load'
/home/cl/.rvm/gems/ruby-2.0.0-p353@sasac_wan/gems/finmodeling-0.2.1/lib/finmodeling/has_string_classifer.rb:22:in `block in _load_vectors_and_train'
/home/cl/.rvm/gems/ruby-2.0.0-p353@sasac_wan/gems/finmodeling-0.2.1/lib/finmodeling/has_string_classifer.rb:18:in `each'

The code at naive_bayes.rb:26 is Marshal.load(data), where data is a string read from /home/cl/.finmodeling/classifiers/ai_oa.db, but this file is empty. What can I do to fix the problem? (Versions: finmodeling 0.2.1, Ubuntu 14.04, Ruby 2.0.0)
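
For anyone trying to confirm the same diagnosis: the error can be reproduced outside the gem, since Marshal.load raises ArgumentError when it is handed an empty string. The sketch below shows that, plus a way to list zero-byte .db files in a classifier directory. Both helper names (marshal_error_for, empty_classifier_files) are hypothetical and are not part of FinModeling or naive_bayes.

```ruby
# Hypothetical helper (not part of FinModeling): return the error message
# Marshal.load raises for the given string, or nil if it loads cleanly.
def marshal_error_for(data)
  Marshal.load(data)
  nil
rescue ArgumentError, TypeError => e
  e.message
end

# Hypothetical helper: list the zero-byte .db files in a directory,
# e.g. ~/.finmodeling/classifiers in the report above.
def empty_classifier_files(dir)
  Dir.glob(File.join(dir, "*.db")).select { |path| File.zero?(path) }.sort
end

# An empty string is not valid Marshal data, so this prints an error message.
puts marshal_error_for("").inspect
```

Running empty_classifier_files(File.expand_path("~/.finmodeling/classifiers")) would show whether ai_oa.db is the only empty file or whether the other classifier files are empty too.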

jimlindstrom commented 9 years ago

Hi Liang - I'm pretty swamped by other obligations right now and am not sure I'm going to be able to fix this for you quickly. But if you're motivated, I can potentially help you fix it, and would definitely accept any pull requests.

Here's my initial take:

This gem uses Bayesian classifiers to classify financial statement items (e.g., as financial assets vs. operating assets on the balance sheet). Bayesian classifiers require training. I forget whether I pre-trained the classifiers and committed the parameters to the repo, but I kind of doubt it. My guess is there's a rakefile with a task that lets you run the trainer, and that the code saves those results somewhere in your home directory. I'm guessing the code you're running blindly assumes that the trained data exists and tries to read it in.
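
If that guess is right, one possible fix is to treat a missing or empty cache file as "not trained yet" instead of passing it straight to Marshal.load. Here is a minimal sketch of that idea. The load_or_train helper and its block-based trainer are hypothetical, and it does not use the gem's or naive_bayes's actual API, so it would need adapting to wherever the classifier is really loaded.

```ruby
require "fileutils"

# Hypothetical helper: return the object cached at `path`, or build it with
# the given block and cache it when the file is missing, empty, or corrupt.
def load_or_train(path)
  if File.file?(path) && !File.zero?(path)
    begin
      return Marshal.load(File.binread(path))
    rescue ArgumentError, TypeError
      # Truncated or corrupt cache: fall through and rebuild it below.
    end
  end

  model = yield # run the (assumed) training step
  FileUtils.mkdir_p(File.dirname(path))
  File.binwrite(path, Marshal.dump(model))
  model
end
```

A caller would pass the training step as the block, for example load_or_train(path) { train_classifier }, where train_classifier stands in for whatever the trainer task does. An empty ai_oa.db would then trigger retraining rather than aborting rake.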

... Let me know if that helps, and if you can figure out any way of improving the codebase so others don't run into this in the future. I'd love to merge in any PRs you can come up with to help address this.