MLDB-2176 Multilabel classification

mathieumb commented 7 years ago

The goal is to handle classification problems where each example has a set of labels instead of a single one (for example, tagging content).

It offers three options: two trivial ones, for comparison purposes, and one-vs-all, which trains a probabilized binary classifier for each possible label.
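A minimal sketch of the one-vs-all idea, for readers unfamiliar with it. It is written in Python rather than MLDB's C++ and is not the code in this PR. It assumes a plain logistic regression (batch gradient descent on log-loss) as the probabilized binary classifier and a fixed 0.5 threshold per label to turn probabilities into a label set. `OneVsAll`, `train_logistic`, and their parameters are hypothetical names, not MLDB's API.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_logistic(xs: List[Vector], ys: List[float],
                   epochs: int = 1000, lr: float = 1.0) -> Tuple[List[float], float]:
    """Hypothetical helper: binary logistic regression fitted by batch gradient descent."""
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(dot(w, x) + b) - y  # derivative of log-loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


class OneVsAll:
    """One independent probabilistic binary classifier per label (hypothetical API)."""

    def fit(self, xs: List[Vector], label_sets: List[Set[str]]) -> "OneVsAll":
        self.labels = sorted(set().union(*label_sets))
        self.models: Dict[str, Tuple[List[float], float]] = {}
        for label in self.labels:
            # Relabel the dataset as "has this label" (1) vs "does not" (0).
            ys = [1.0 if label in s else 0.0 for s in label_sets]
            self.models[label] = train_logistic(xs, ys)
        return self

    def predict_proba(self, x: Vector) -> Dict[str, float]:
        # Each label gets its own independent probability; they need not sum to 1.
        return {label: sigmoid(dot(w, x) + b) for label, (w, b) in self.models.items()}

    def predict(self, x: Vector, threshold: float = 0.5) -> Set[str]:
        # Assumption: a fixed per-label threshold turns probabilities into a label set.
        return {label for label, p in self.predict_proba(x).items() if p >= threshold}


if __name__ == "__main__":
    xs = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
    label_sets = [{"a"}, {"a", "b"}, {"b"}, set()]
    model = OneVsAll().fit(xs, label_sets)
    for x in xs:
        print(x, sorted(model.predict(x)), model.predict_proba(x))
```

Because each label's model is trained independently, K labels means K binary models, and the threshold could be tuned per label. A label that never appears in the training data gets no model and can never be predicted.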

mathieumb commented 7 years ago

@jeremybarnes @mailletf

mathieumb commented 7 years ago

@jeremybarnes

mathieumb commented 7 years ago

@jeremybarnes ping

jeremybarnes commented 7 years ago

Lots of small-ish comments, and a couple of more serious issues. Also, be careful when copying vectors: the accuracy code is part of the basic workflow, and it is a productivity sink if it's slow.

In general, the approach looks good; it just needs some whipping into shape. I would also consider splitting this into a few smaller PRs: first the refactoring, second the basic machinery, and third the changes to MLDB that incorporate it. That way the impact and intent of each change would be much clearer.

mathieumb commented 7 years ago

Verbal +1 from @jeremybarnes

mldbai / mldb

MLDB-2176 Multilabel classification #875