Multi-Label Passive-Aggressive

Hello D.

I've started to work on the multi-label branch. I have made the following 
changes:

- Parse comma-separated list of labels.

- Add a MultiplePassOuterLoop routine: it shuffles the dataset and makes 
several passes over it. Specifying a number of passes is more intuitive than 
specifying a number of iterations, and results can be more stable on some 
datasets.

- Add a MultiLabelWeightVector. It is compatible with other weight classes 
(both API-wise and file-wise). It also has a bunch of additional methods such 
as "SelectLabel".

- Add Multi-Label Passive-Aggressive. Strictly speaking, the learner optimizes 
a label ranking (relevant labels should be ranked higher than irrelevant 
labels). On the 20 Newsgroups dataset, it gives 82% accuracy (liblinear gave 
85%). I didn't optimize the hyperparameters, though.

- Add a "--prediction_type multi-label" option.

- Infer the number of dimensions from the training dataset when --dimensionality 
is set to 0.
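
To make the changes above concrete, here is a rough Python sketch of label parsing, the shuffled multi-pass loop, and a passive-aggressive ranking update. It is not the code from the branch: the function names, the sparse-dict weight layout, and the PA-I step size min(C, loss / (2 * ||x||^2)) are all assumptions made for illustration.

```python
import random


def parse_labels(field):
    """Parse a comma-separated label field such as "3,7,12" into a set of ints."""
    return {int(tok) for tok in field.split(",") if tok.strip()}


def dot(w, x):
    """Dot product of two sparse vectors stored as {index: value} dicts."""
    return sum(w.get(i, 0.0) * v for i, v in x.items())


def pa_multilabel_update(weights, x, relevant, c=1.0):
    """One PA-I ranking step (hypothetical helper, not the sofia-ml API).

    Takes the lowest-scoring relevant label r and the highest-scoring
    irrelevant label s, and pushes them apart if the margin is below 1.
    """
    irrelevant = [k for k in range(len(weights)) if k not in relevant]
    if not relevant or not irrelevant:
        return 0.0
    scores = [dot(w, x) for w in weights]
    r = min(relevant, key=lambda k: scores[k])
    s = max(irrelevant, key=lambda k: scores[k])
    loss = max(0.0, 1.0 - (scores[r] - scores[s]))
    sq_norm = sum(v * v for v in x.values())
    if loss > 0.0 and sq_norm > 0.0:
        tau = min(c, loss / (2.0 * sq_norm))
        for i, v in x.items():
            weights[r][i] = weights[r].get(i, 0.0) + tau * v
            weights[s][i] = weights[s].get(i, 0.0) - tau * v
    return loss


def train(data, num_labels, passes=5, seed=0):
    """Shuffle the dataset and make `passes` full passes over it."""
    rng = random.Random(seed)
    weights = [dict() for _ in range(num_labels)]
    data = list(data)
    for _ in range(passes):
        rng.shuffle(data)
        for x, relevant in data:
            pa_multilabel_update(weights, x, relevant)
    return weights
```

Each example is a pair (sparse vector, set of relevant labels), so a line whose label field is "1,2" would be passed in as `({1: 1.0}, parse_labels("1,2"))`.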

I wanted to add one-vs-all but unfortunately, the fact that the labels are 
attached to the vectors makes it hard (or inefficient): I need to be able to 
pass +1 or -1 instead of the real label to the update function.
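
As an illustration of what one-vs-all would need, here is a minimal Python sketch in which the update function takes the target as a separate argument instead of reading it from the vector. The names are hypothetical and the binary PA-I update is only a stand-in for whatever learner would actually be used.

```python
def binary_target(relevant, k):
    """Map a label set to the +1 / -1 target seen by the k-th binary learner."""
    return 1.0 if k in relevant else -1.0


def pa_binary_update(w, x, y, c=1.0):
    """Binary PA-I step; y is passed explicitly rather than stored on x."""
    score = sum(w.get(i, 0.0) * v for i, v in x.items())
    loss = max(0.0, 1.0 - y * score)
    sq_norm = sum(v * v for v in x.values())
    if loss > 0.0 and sq_norm > 0.0:
        tau = min(c, loss / sq_norm)
        for i, v in x.items():
            w[i] = w.get(i, 0.0) + tau * y * v
    return loss


def one_vs_all_step(weights, x, relevant):
    """Update every per-label binary learner on the same example."""
    for k, w in enumerate(weights):
        pa_binary_update(w, x, binary_target(relevant, k))
```

With the target decoupled from the vector, the same example can be reused for all learners without copying or relabeling it.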

Possible short-term plans could include optimizing the multi-class hinge loss 
and the multinomial logistic loss by SGD.
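
For reference, a rough Python sketch of a single SGD step for each of the two losses. It assumes dense weight vectors, a fixed learning rate `eta`, no regularization, and the Crammer-Singer form of the multi-class hinge loss; all names are made up for the example.

```python
import math


def class_scores(weights, x):
    """Score of each class: one dense weight vector per class."""
    return [sum(wi * xi for wi, xi in zip(w, x)) for w in weights]


def sgd_multiclass_hinge(weights, x, y, eta=0.1):
    """Subgradient step on max(0, 1 + max_{k != y} s_k - s_y)."""
    s = class_scores(weights, x)
    rival = max((k for k in range(len(weights)) if k != y), key=lambda k: s[k])
    if 1.0 + s[rival] - s[y] > 0.0:
        for i, xi in enumerate(x):
            weights[y][i] += eta * xi
            weights[rival][i] -= eta * xi


def sgd_multinomial_logistic(weights, x, y, eta=0.1):
    """Gradient step on -log softmax(s)[y]; the gradient is (p_k - [k == y]) * x."""
    s = class_scores(weights, x)
    m = max(s)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in s]
    z = sum(exps)
    for k, w in enumerate(weights):
        g = exps[k] / z - (1.0 if k == y else 0.0)
        for i, xi in enumerate(x):
            w[i] -= eta * g * xi
```

The hinge step touches only two weight vectors per example, while the logistic step updates all of them.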

Original issue reported on code.google.com by mblon...@gmail.com on 28 Apr 2011 at 8:38
