mlr-org / mlr

Machine Learning in R
https://mlr.mlr-org.com

LiblineaRLogReg / weights #123

Closed berndbischl closed 9 years ago

berndbischl commented 10 years ago

Hi Eric,

why do you do this:

-  LiblineaR(data = d$data, labels = d$target, ...)
 +  if (!is.null(.weights)) {
 +    .weights = .weights[unique(names(.weights))]
 +  }
 +  LiblineaR(data = d$data, labels = d$target, wi = .weights, ...)
  }

In particular, this part:

.weights = .weights[unique(names(.weights))]

studerus commented 10 years ago

LiblineaR requires a named vector with the weights for each class. It seems that each class should occur only once in the vector. When I tried to pass the full unchanged .weights vector to LiblineaR, it threw an error.

berndbischl commented 10 years ago

Ok, then you are using the mechanism in the wrong way. I will explain and fix.

berndbischl commented 10 years ago

Explanation:

The "weights" argument in mlr::train and mlr::trainLearner is for OBSERVATION weights. The property "weights" says whether a learner supports such. This is documented here:

https://github.com/berndbischl/mlr/blob/master/R/trainLearner.R

#' @param weights [\code{numeric}]\cr
#'   Optional, non-negative case weight vector to be used during fitting.
#'   If given, must be of same length as \code{subset} and in corresponding order.
#'   By default \code{NULL} which means no weights are used unless specified in the task ([\code{\link{SupervisedTask}}]).
#'   Weights from the task will be overwritten.

But LiblineaR does not support those. So we cannot set the property or use that arg in trainLearner!
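
For a learner that does support observation weights (e.g. classif.rpart), usage would look roughly like this — a minimal sketch with made-up weight values:

library(mlr)

# one non-negative case weight per observation; classif.rpart has the
# "weights" property, so train() forwards them to the fitting routine
n = getTaskSize(sonar.task)
w = runif(n, min = 0.5, max = 1.5)
mod = train(makeLearner("classif.rpart"), sonar.task, weights = w)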

berndbischl commented 10 years ago

What you want is weighting classes. Probably because you have either non-standard costs on the classes or an imbalanced problem. (I would be curious which one you have....)

Obviously there is a connection between class weights and case weights: IF we can use the latter, we have a mechanism for the former: simply set each observation of a certain class i in training to a weight c_i. This is actually now supported out-of-the-box by mlr:

https://github.com/berndbischl/mlr/blob/master/R/WeightedClassesWrapper.R
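
A minimal sketch of that mechanism (assuming a base learner with observation-weight support, e.g. classif.rpart, and that for a binary task wcw.weight is taken as the weight of the positive class):

# internally, each training observation of the positive class gets
# weight 10, all others weight 1
lrn = makeWeightedClassesWrapper(makeLearner("classif.rpart"), wcw.weight = 10)
mod = train(lrn, sonar.task)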

berndbischl commented 10 years ago

But unfortunately we already saw we can't do that for LiblineaR.

So we would either need to set the "wi" hyperparam manually in the learner. This should have been possible before.

Or we would like to tune such a param. If you want the tuning, I (just by chance!) improved mlr a bit here, because I need the same in a current project about imbalanced classes.

studerus commented 10 years ago

Sorry, I'm still very confused. My goal is to use the LiblineaR learner for a classification task with unbalanced classes. According to the help file of the LiblineaR function, the wi argument can be used for this. It says:

wi = a named vector of weights for the different classes, used for asymmetric class sizes. Not all factor levels have to be supplied (default weight: 1). All components have to be named according to the corresponding class label.

I thought that I had to use makeWeightedClassesWrapper for treating unbalanced classes, but I cannot use it with the LiblineaR learner as the weights property is not set. I also tried to set the wi argument manually in makeLearner, but I receive an error message saying that it matches several arguments. So, how am I supposed to use LiblineaR with unbalanced classes?

The changes that I made worked for me. I could make a weightedClassesWrapper around liblineaR and the balance between true positive and true negative rates improved.

berndbischl commented 10 years ago

My goal is to use the liblineaR learner for a classification task with unbalanced classes

mlr now supports quite a lot for this

Do you have Skype or Hangout? I am currently running a large benchmark on this.

berndbischl commented 10 years ago

Let me try to explain again here:

The "weights" in train, the property "weights" refer to observation weights. One individual weight per observation. The makeWeightedClassesWrapper exploits this and sets weights according to the label of the class of the observation in training. Does not work fpr LibLineaR, as you cannot set individual weights for observations.

berndbischl commented 10 years ago

But you can simply use the "wi" parameter of LibLineaR, that is what you want.

task = sonar.task
lrn = makeLearner("classif.LiblineaRLogReg")
res = holdout(lrn, task)
print(getConfMatrix(res$pred))

# upweight class M heavily
lrn = makeLearner("classif.LiblineaRLogReg", wi = c(M = 1000, R = 1))
res = holdout(lrn, task)
print(getConfMatrix(res$pred))

# upweight class R heavily
lrn = makeLearner("classif.LiblineaRLogReg", wi = c(M = 1, R = 10000))
res = holdout(lrn, task)
print(getConfMatrix(res$pred))

[Resample] holdout iter: 1
[Resample] Result: mmce.test.mean= 0.3
       predicted
true     M  R -SUM-
  M     28 10    10
  R     11 21    11
  -SUM- 11 10    21
[Resample] holdout iter: 1
[Resample] Result: mmce.test.mean=0.386
       predicted
true     M R -SUM-
  M     43 0     0
  R     27 0    27
  -SUM- 27 0    27
[Resample] holdout iter: 1
[Resample] Result: mmce.test.mean=0.529
       predicted
true    M  R -SUM-
  M     0 37    37
  R     0 33     0
  -SUM- 0 37    37

berndbischl commented 10 years ago

I am currently improving some things to make tuning of such "named" vector parameters better; it will be finished soon. And mlr now supports many other possibilities for imbalanced classes as well.

studerus commented 10 years ago

I know that mlr supports other possibilities for imbalanced classes, such as SMOTE. I tried both the SMOTE and weighted classes approach with liblineaR. They had about the same performance in terms of discrimination and balance between tpr and tnr but the weighted classes approach is computationally more efficient. That's why I'd prefer to use weighted classes.

berndbischl commented 10 years ago

Sure, I completely understand.

Does it help what I posted? Is there still an open question?

studerus commented 10 years ago

It now works for me to set wi directly in makeLearner (don't know why it didn't before). Still, I (and possibly others) will find it a bit confusing that one cannot apply makeWeightedClassesWrapper to the LiblineaR learner when the goal is to treat class imbalance and LiblineaR has support for class weights. Btw, the same seems to apply to ksvm.

berndbischl commented 10 years ago

Still, I (and possibly others) will find it a bit confusing that one cannot apply makeWeightedClassesWrapper to the LiblineaR learner when the goal is to treat class imbalance and LiblineaR has support for class weights

Again, the WeightedClassesWrapper is NOT for USING class weights. It is there for cases where OBSERVATION weights are available, but class weights are NOT. It gives you a chance to CREATE class weights when they are not there!

What do you propose to improve the situation?

Document this better?

Maybe this: I could extend the WeightedClassesWrapper so that it dispatches to a hyperparameter (wi in this case)?

berndbischl commented 10 years ago

So:

makeWeightedClassesWrapper("classif.LiblineaRLogReg", param = "wi", class.names = c("M", "R"))

?

berndbischl commented 10 years ago

The thing is:

This does exactly zero. It just passes the weights down and creates a slight overhead....

And makes the interface more complicated.

berndbischl commented 10 years ago

I guess these are obvious, correct extensions.

1) a) Create a property "class.weights" that says whether a learner supports class weights. Then you can search for those with listLearners (as sketched below). b) Allow asking the learner what that hyperparameter is called.

2) Improve the docs page of WeightedClassesWrapper. Show the two alternatives we discussed in an example.
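
A sketch of what 1a would enable (the "class.weights" property name is the proposal here, not existing API at this point):

# find all classification learners that support class weights
listLearners("classif", properties = "class.weights")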

studerus commented 10 years ago

1) I think that's a good idea, because I already wanted to use listLearners in that way before. 2) Why is it so important that makeWeightedClassesWrapper is only used for cases where observation weights are available and class weights are not? Why can't we use it for any case that allows class weights, regardless of the mechanism?

studerus commented 10 years ago

Using makeWeightedClassesWrapper for any case would be more convenient, because then I wouldn't have to look up the argument of the specific learner and the class names every time I want to weight the classes.

berndbischl commented 10 years ago

Eric, would you be willing to help to make that happen?

studerus commented 10 years ago

sure

berndbischl commented 10 years ago

Great, how about compiling a list of classif learners where we have some kind of class.weights param?

We would need notes on whether they are all of the same "form".

Then the property should be set.

We also need a way / function to retrieve that param name. (I will look into that).

studerus commented 10 years ago

Ok, I will compile a list of the classif learners, their argument names, and the way they need to be specified.

berndbischl commented 10 years ago

@studerus

a) I have greatly improved the R docs of makeWeightedClassesWrapper

b) I have introduced wcw.param, which does what you want. Actually, I can exploit this now too in a current project, so thanks for insisting on this :) If you finish this list of params, we can even set this param name by default!
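
For reference, a minimal sketch of what b) enables (assuming the wcw.param mechanism described here): the wrapper dispatches the class weights straight to LiblineaR's own wi hyperparameter instead of going through observation weights.

lrn = makeWeightedClassesWrapper(makeLearner("classif.LiblineaRLogReg"),
  wcw.param = "wi", wcw.weight = 10)  # positive class weighted 10:1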

Can you please check both?

studerus commented 10 years ago

Thanks!

The following classification functions support direct class weights:

It would be nice if we had a mechanism so that we don't have to look up the argument name of each function.

SchroederFabian commented 10 years ago

Bernd, you write:

What you want is weighting classes. Probably because you have either non standard costs on the classes or an imbalanced problem. (I would be curious which one you have....)

Obviously there is now a connection between class weights and case weights: IF we can use the latter, we have a mechanism for the former: Simply set each observation of a certain class i in training to a weight c_i.

My question is: let's consider a multinomial classification task with a cost matrix. Can I (or how can I) use observation weighting to implement cost-sensitive classification with a given cost structure? I don't think there is a trivial answer to this.

Let's say I have C classes; then via observation weights I can only attribute C different cost values, one per class. Using a cost matrix, however, I can attribute C-1 different cost values to every class. Thus the number of cost values I can attribute grows quadratically with the number of classes (C(C-1) off-diagonal entries in total), not just linearly.

The only case that I can think of is when the misclassification cost of an object is equal for all incorrect predictions: C(j|i) = constant for all j ≠ i. Then we could give every observation the cost of its "class misclassification".
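
A small R illustration of that special case (the costs are made up): if misclassifying class i always costs c_i regardless of the predicted class, every observation can simply inherit the cost of its own class as its weight.

# assumed per-class misclassification costs
costs = c(setosa = 1, versicolor = 5, virginica = 2)
# one weight per observation, taken from its class
w = costs[as.character(iris$Species)]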

berndbischl commented 10 years ago

Can I (or how can I) use observation weighting to implement cost-sensitive classification with a given cost structure?

That is a somewhat different question. Can we open up a new thread to discuss this?

I opened #129

SchroederFabian commented 10 years ago

Well, you mentioned that we could use observation weighting when we have non-standard costs, if I am not mistaken. Or what do you mean by non-standard costs?

berndbischl commented 10 years ago

Please have a look at the other thread, so we can see whether I defined your problem / question correctly, ok?

The problem is, studerus is doing something a bit different, because he does not have neatly defined costs as you have (I guess).

berndbischl commented 9 years ago

@kerschke Can you please close this soon?

kerschke commented 9 years ago

After having read your exhaustive dialogue, I'm not quite sure whether I got the task right. So let me try to summarize it; correct me if I'm wrong.

1) mlr allows the usage of observation weights but not class weights (at least in general).
2) The former (observation weights) is precisely what is defined in the properties of a learner (and also highlighted in the learner summary table within the tutorial).
3) Some learners (which @studerus already listed) are able to handle class weights on their own.

And based on that, we now need a property (e.g. classweights) for those learners, so we can easily select them (e.g. using listLearners("classif", properties = "classweights"))? However, in that case, one would still need to look into the original learner and try to find the correct name of the class weight argument. So, instead of (or in addition to) the property, we actually might need an additional list element for the learner, which states the name of the class weight parameter. In the case of ksvm, it might look like the following:

      ...
      makeNumericVectorLearnerParam(id = "class.weights", len = NA_integer_, lower = 0),
      makeLogicalLearnerParam(id = "fit", default = TRUE)
    ),
    par.vals = list(fit = FALSE),
    ## add classweights property:
    properties = c("twoclass", "multiclass", "numerics", "factors", "prob", "classweights"),
    ## add id of the classweights-parameter, here "class.weights":
    classweights = "class.weights",
    ...

Or am I misunderstanding the task?

berndbischl commented 9 years ago

1) mlr allows the usage of observation weights but not class weights (at least in general).

Yes, in principle. But note that you can use obs weights to generate class weights with the WeightedClassesWrapper. And this wrapper also allows you to directly use a class-weighting parameter for this purpose. But you have to enter it manually. Read that wrapper's docs completely now if you haven't.

2)

Yes. We might have called the property "obs.weights" to be precise.

Some learners (which @studerus already listed) are able to handle class weights on their own.

Yes. If I know that they have such a property, have a way to ask for their parameter, and can set it in a standardized way, I can do this by default in the WeightedClassesWrapper, without further user input.
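
That lookup could be as simple as this sketch (the "class.weights" property and the class.weights.param field are the proposals from this thread, not existing API at this point):

getClassWeightParam = function(learner) {
  if (!hasLearnerProperties(learner, "class.weights"))
    stop(sprintf("Learner %s does not support class weights", learner$id))
  learner$class.weights.param
}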

we actually might need an additional list element for the learner, which states the name of the class weight parameter.

Exactly. And we need to document this. And we might need a "setter" for the value of the param, if it sometimes has a slightly different data format for the learner. We don't need it if it is always a named numeric vector whose names are class names.

kerschke commented 9 years ago

If it sometimes has a slightly different data format for the learner.

You mean like in mda? Apparently, this function takes weights in the style computed by mda.start, i.e. a probability for each observation and class, which are then again somehow combined in certain blocks:

> set.seed(123)
> mda.start(iris[1:30, 1:3], iris[1:30, 4])
$`0.1`
     s1
[1,]  1
[2,]  1
[3,]  1

$`0.2`
      s1 s2 s3
 [1,]  1  0  0
 [2,]  0  1  0
 [3,]  0  0  1
 [4,]  0  0  1
 [5,]  1  0  0
 [6,]  0  1  0
 [7,]  0  0  1
 [8,]  1  0  0
 [9,]  0  1  0
[10,]  1  0  0
[11,]  1  0  0
[12,]  0  0  1
[13,]  0  1  0
[14,]  0  1  0
[15,]  1  0  0
[16,]  1  0  0
[17,]  0  1  0

$`0.3`
     s1 s2 s3
[1,]  0  0  1
[2,]  1  0  0
[3,]  0  1  0
[4,]  1  0  0

$`0.4`
     s1 s2 s3
[1,]  1  0  0
[2,]  0  0  1
[3,]  1  0  0
[4,]  0  1  0
[5,]  0  1  0

$`0.5`
     s1
[1,]  1

attr(,"criterion")
[1] 0.3333333
attr(,"name.criterion")
[1] "Misclassification Error"

All the other learners (which were mentioned before) are able to just use a numeric input vector. So we need to find a way to deal with this strange list of matrices in order to tackle this issue.

berndbischl commented 9 years ago

need to find a way to deal with this strange list of matrices in order to tackle this issue..

I guess we don't. Just disregard this in mda. We cannot do anything useful with it anyway?

berndbischl commented 9 years ago

For the other methods, you might want to check whether they accept named vectors. Or more precisely: how do they determine which weight entry belongs to which class?

EDIT: because if they rely on position, it is always a hassle to figure out what that means exactly.

kerschke commented 9 years ago

Ok, will do that.

schiffner commented 9 years ago

Maybe to clarify about mda:
Short answer: I agree with Bernd. The weights here can't be used like ordinary class or observation weights.
Long answer: mda.start by default does k-means within single classes and returns a list of index matrices showing the clustering result for all classes. For example mda.start(iris[, 1:4], iris[, 5]) gives a list of 3 index matrices. The weights are updated in each E step of the EM algorithm to reflect the current clustering.

kerschke commented 9 years ago

The way it looks to me, all of the methods from above (except for mda) use a numeric vector for the class weights. In that case, my favorite option for solving this problem is:
Add the specific parameter to the parameter set using makeNumericVectorLearnerParam and then look for those parameters when training the model (including a check whether the vector is named based on the class levels; a sketch of such a check follows the code below). In addition, I suggest we add the property class.weights to each of those learners (which would allow looking for learners having that property).

makeRLearner.classif.LiblineaRBinary = function() {
  makeRLearnerClassif(
    cl = "classif.LiblineaRBinary",
    package = "LiblineaR",
    par.set = makeParamSet(
      ...
      # class weights
      makeNumericVectorLearnerParam(id = "wi", len = NA_integer_),
      ...
    ),
    # adding 'class.weights' to the properties
    properties = c("twoclass", "numerics", "class.weights"),
    ...
  )
}
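
The naming check mentioned above could look roughly like this (a sketch; the helper name is made up, and I assume the class levels can be read from the task description via getTaskDesc):

checkClassWeightVector = function(wi, task) {
  levs = getTaskDesc(task)$class.levels
  if (is.null(names(wi)) || !all(names(wi) %in% levs))
    stop("class weight vector must be named with the task's class levels")
  invisible(wi)
}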

By the way, is it actually necessary that mlr tells the user which parameter is responsible for the class weights? In that case, we would either have to set a flag on the parameter, e.g. makeNumericVectorLearnerParam(id = "wi", len = NA_integer_, isWeight = TRUE) (and therefore extend the existing interface of make...Param), or add an argument to the learner, e.g. class.weights = "wi" (changing the interface of makeRLearner...).

What do you think of these approaches? Or do you suggest to do it somehow completely different?

kerschke commented 9 years ago

Just double-checking something regarding the class weights: since I'm going to add the class weights information to the learner, do you agree that we no longer need the wcw.param argument in makeWeightedClassesWrapper? I would define it via wcw.param = learner$class.weights.param within the function. Or is there any case where we want to set it manually to a different value than the one in learner$class.weights.param?

berndbischl commented 9 years ago

Sorry. I did not see this question. So you did ask :)

kerschke commented 9 years ago

Solved with PR #385.