lynaghk / clj-liblinear

A Clojure wrapper for LIBLINEAR, a linear support vector machine library
http://keminglabs.com/clj-liblinear/
Eclipse Public License 1.0

Fix for bias, cross-fold option, unseen feature support. #4

Closed: akhudek closed 11 years ago

akhudek commented 11 years ago

Hey Kevin,

Any interest in these changes? The unseen feature support reserves feature id 1 for unseen features; during prediction, any feature that was not seen in training is assigned this id. The cross-fold option works, though I haven't added any code to compute accuracy values, as my own code only handles two classes (i.e. it is not sufficiently general).

The bias support was missing the adjustment to n (the feature count) when the bias is active. You might also want to consider changing the default bias to 1, even though that differs from liblinear's default. Apparently it is rare to need no bias in practice, and disabling it might harm accuracy.

lynaghk commented 11 years ago

Hi @akhudek; a few questions:

akhudek commented 11 years ago

You're right, it's better just to filter unseen data in this case. I had too much "word vector representation" on the brain, where there is a dedicated vector for unseen words. I'll change this.

For bias, it turns out you also have to manually add a bias feature to every instance, with index (inc (count dimensions)). Bias wasn't working at all, though I've verified that it now works.
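To make the bias handling concrete, here is a minimal sketch in plain Clojure. It is not the code from this pull request: `add-bias-feature` and `problem-size` are hypothetical helpers, instances are assumed to be maps from feature index to value, and `n` stands for the number of real dimensions.

```clojure
;; Hypothetical helpers for illustration only; not clj-liblinear's actual code.

(defn add-bias-feature
  "Returns `features` (a map of feature index -> value) with an extra
  bias feature at index n+1 when bias is non-negative. A negative bias
  means 'no bias term', so the instance is returned unchanged."
  [features n bias]
  (if (>= bias 0)
    (assoc features (inc n) bias)
    features))

(defn problem-size
  "The feature count to report for the problem: n+1 when a bias term
  is active, otherwise n."
  [n bias]
  (if (>= bias 0) (inc n) n))

;; Example with three real dimensions and a bias of 1.0
(def dimensions {:a 1 :b 2 :c 3})

(add-bias-feature {1 0.5 3 2.0} (count dimensions) 1.0)
(problem-size (count dimensions) 1.0)
```

The same helper would have to be applied both when training and when predicting, so that every instance has the bias feature at the same index.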

On Thursday, 23 May, 2013 at 1:54 PM, Kevin Lynagh wrote:

Hi @akhudek (https://github.com/akhudek); a few questions:

What's the benefit of keeping an extra dimension around for model evaluation time? If you're going to classify or regress with a given model, you should clean your data to work with the assumptions of that model, no?

Re: bias, does liblinear require that you set the dimension to d+1 when there is a bias? What happens when you don't? Damn...


akhudek commented 11 years ago

Ok, I've reverted feature-nodes to simply discard features that are not in the feature map (sets already worked this way). I've also updated the predict function to add bias features to instances.

Finally, I've added a simple accuracy output for cross-fold that is identical to what liblinear's command line code returns. The target array is still returned for implementing problem-specific accuracy measures.
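As a rough illustration of the "discard unseen features" behaviour, the sketch below maps an instance onto the indices learned at training time and drops anything unknown. `known-features` is a hypothetical name, and the shape of `dimensions` (feature key -> index) is an assumption rather than the library's confirmed internals.

```clojure
;; Hypothetical sketch; not the actual feature-nodes implementation.

(defn known-features
  "Translates `instance` (feature key -> value) into a sorted map of
  feature index -> value, silently dropping any key that is absent
  from `dimensions` (feature key -> index, built during training)."
  [dimensions instance]
  (into (sorted-map)
        (for [[k v] instance
              :let [idx (get dimensions k)]
              :when idx]
          [idx v])))

;; :z was never seen in training, so it is discarded.
(known-features {:a 1 :b 2} {:a 1.0 :z 5.0 :b 2.0})
```

A sorted map is used here so the resulting features come out in ascending index order.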
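For reference, a simple classification accuracy of this kind can be computed from the true labels and the target array roughly as follows. This is a sketch under assumptions: `cross-validation-accuracy` is a hypothetical name, both arguments are assumed to be non-empty, equal-length sequences of numeric labels, and the result is the percentage of exact matches.

```clojure
;; Hypothetical helper; assumes non-empty, equal-length numeric label sequences.

(defn cross-validation-accuracy
  "Percentage of instances whose cross-validated prediction in
  `targets` equals the true label in `labels`."
  [labels targets]
  (let [correct (count (filter true? (map == labels targets)))]
    (* 100.0 (/ correct (count labels)))))

;; Three of the four predictions match the true labels.
(cross-validation-accuracy [1 -1 1 1] [1 -1 -1 1])
```

Because the target array is returned alongside, other measures (precision, recall, per-class accuracy) can be built from the same two sequences.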

akhudek commented 11 years ago

Hey Kevin, I understand you're probably busy. Is there anything I can help with regarding these changes? It would be wonderful to have them in the main lib.

lynaghk commented 11 years ago

Pinging me was all the help I needed. I just lost track of this pull request, sorry about that = ) Tested and merged now. I'll push a 0.1.0 release to Clojars too.