Hello D.
I've started to work on the multi-label branch. I have made the following
changes:
- Parse comma-separated list of labels.
- Add a MultiplePassOuterLoop routine: it shuffles the dataset and makes
several passes over it. Choosing a number of passes is more intuitive, and
results can be more stable on some datasets.
- Add a MultiLabelWeightVector. It is compatible with other weight classes
(both API-wise and file-wise). It also has a bunch of additional methods such
as "SelectLabel".
- Add Multi-Label Passive-Aggressive. Strictly speaking, the learner optimizes
a label ranking (relevant labels should be ranked higher than irrelevant
labels). On the 20 Newsgroups dataset, it gives 82% accuracy (liblinear gave
85%), although I didn't optimize the hyperparameters.
- Add a "--prediction_type multi-label" option.
- Infer the number of dimensions from the training dataset when --dimensionality
is set to 0.
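
To make the first two items concrete, here is a minimal Python sketch of the idea (not the actual C++ code in the branch). The names `parse_labels` and `multiple_pass_outer_loop`, the `update` callback, and the `seed` parameter are all hypothetical; I'm also assuming the dataset is reshuffled before every pass, and labels are kept as strings to avoid assuming a label type.

```python
import random


def parse_labels(field):
    """Split a comma-separated label field such as "1,3,7" into a list.

    Labels are kept as stripped strings; empty tokens are ignored.
    """
    return [tok.strip() for tok in field.split(",") if tok.strip()]


def multiple_pass_outer_loop(examples, update, num_passes, seed=0):
    """Call update(example) on every example, num_passes times.

    The visiting order is reshuffled before each pass (an assumption),
    so each pass sees every example exactly once in random order.
    """
    rng = random.Random(seed)
    order = list(range(len(examples)))
    for _ in range(num_passes):
        rng.shuffle(order)
        for i in order:
            update(examples[i])
```

With this shape, the total number of updates is simply `num_passes * len(examples)`, which is what makes a pass count easy to reason about.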
I wanted to add one-vs-all but unfortunately, the fact that the labels are
attached to the vectors makes it hard (or inefficient): I need to be able to
pass +1 or -1 instead of the real label to the update function.
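
For illustration, this is roughly what one-vs-all needs from the update function, as a Python sketch with hypothetical names (`binary_target`, `one_vs_all_pass`, and an `update(label, features, target)` callback that accepts the target separately from the example). It is not how the current code is structured, which is exactly the problem.

```python
def binary_target(labels, target_label):
    """Return +1.0 if target_label is among the example's labels, else -1.0."""
    return 1.0 if target_label in labels else -1.0


def one_vs_all_pass(examples, all_labels, update):
    """Run one pass of one-vs-all over (features, labels) pairs.

    For each example and each known label k, the binary learner for k is
    updated with a +1/-1 target instead of the example's real label set.
    """
    for features, labels in examples:
        for k in all_labels:
            update(k, features, binary_target(labels, k))
```

The key point is that the +1/-1 target is computed on the fly and passed in, so no copy of the vector with a rewritten label is needed.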
Possible short-term plans could include optimizing the multi-class hinge loss
and the multinomial logistic loss by SGD.
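
For reference, these are the standard forms of the two losses for a single-label example $(x, y)$ with one weight vector $w_k$ per class. The notation is my own assumption, and the hinge loss shown is the Crammer-Singer variant, which is one common definition among several.

```latex
% Multi-class hinge loss (Crammer-Singer form)
\ell_{\text{hinge}}(W; x, y)
  = \max\Bigl(0,\; 1 + \max_{k \neq y} w_k^\top x - w_y^\top x\Bigr)

% Multinomial logistic loss (negative log-likelihood of the softmax)
\ell_{\text{log}}(W; x, y)
  = \log \sum_{k} \exp\bigl(w_k^\top x\bigr) - w_y^\top x
```

Both are convex in the weights, so an SGD step only needs the (sub)gradient with respect to each $w_k$ on the current example.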
Original issue reported on code.google.com by mblon...@gmail.com on 28 Apr 2011 at 8:38