LiblineaR requires a named vector with the weights for each class. It seems that each class should occur only once in the vector. When I tried to pass the full unchanged .weights vector to LiblineaR, it threw an error.
Ok, then you are using the mechanism in the wrong way. I will explain and fix it.
Explanation:
The "weights" argument in mlr::train and mlr::trainLearner is for OBSERVATION weights. The property "weights" says whether a learner supports them. This is documented here:
https://github.com/berndbischl/mlr/blob/master/R/trainLearner.R
#' @param weights [\code{numeric}]\cr
#' Optional, non-negative case weight vector to be used during fitting.
#' If given, must be of same length as \code{subset} and in corresponding order.
#' By default \code{NULL} which means no weights are used unless specified in the task ([\code{\link{SupervisedTask}}]).
#' Weights from the task will be overwritten.
But LiblineaR does not support those. So we cannot set the property or use that arg in trainLearner!
What you want is to weight classes, probably because you have either non-standard costs on the classes or an imbalanced problem. (I would be curious which one you have....)
Obviously there is a connection between class weights and case weights: IF we can use the latter, we have a mechanism for the former: simply set each observation of class i in training to a weight c_i. This is actually now supported out-of-the-box by mlr:
https://github.com/berndbischl/mlr/blob/master/R/WeightedClassesWrapper.R
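For a learner that does support observation weights, a minimal sketch of this mechanism (rpart and the weight value are chosen purely for illustration):
library(mlr)
# wrap a weight-supporting learner; the wrapper derives per-observation
# weights from each training observation's class label
lrn = makeWeightedClassesWrapper(makeLearner("classif.rpart"), wcw.weight = 0.01)
mod = train(lrn, sonar.task)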
But unfortunately we already saw that we can't do that for LiblineaR.
So we either need to set the "wi" hyperparameter manually in the learner (this should have been possible before), or we want to tune such a parameter. If you want the tuning: I (just by chance!) improved mlr a bit here, because I need the same in a current project about imbalanced classes.
Sorry, I'm still very confused. My goal is to use the liblineaR learner for a classification task with unbalanced classes. According to the help file of the liblineaR function, the wi argument can be used for this. It says:
wi = a named vector of weights for the different classes, used for asymmetric class sizes. Not all factor levels have to be supplied (default weight: 1). All components have to be named according to the corresponding class label.
I thought that I had to use makeWeightedClassesWrapper for treating unbalanced classes, but I cannot use it with the liblineaR learner as the weights property is not set. I also tried to set the wi argument manually in makeLearner, but I receive an error message saying that it matches several arguments. So, how am I supposed to use liblineaR with unbalanced classes?
The changes that I made worked for me. I could put a weightedClassesWrapper around liblineaR, and the balance between true positive and true negative rates improved.
My goal is to use the liblineaR learner for a classification task with unbalanced classes
mlr now supports quite a lot for this
Do you have Skype or Hangouts? I am currently running a large benchmark on this.
Let me try to explain again here:
The "weights" argument in train and the property "weights" refer to observation weights: one individual weight per observation. The makeWeightedClassesWrapper exploits this and sets weights according to the class label of each observation in training. This does not work for LiblineaR, as you cannot set individual weights for observations there.
But you can simply use the "wi" parameter of LiblineaR; that is what you want.
task = sonar.task
# baseline: no class weights
lrn = makeLearner("classif.LiblineaRLogReg")
res = holdout(lrn, task)
print(getConfMatrix(res$pred))
# heavily up-weight class M via LiblineaR's "wi" parameter
lrn = makeLearner("classif.LiblineaRLogReg", wi = c(M = 1000, R = 1))
res = holdout(lrn, task)
print(getConfMatrix(res$pred))
# heavily up-weight class R
lrn = makeLearner("classif.LiblineaRLogReg", wi = c(M = 1, R = 10000))
res = holdout(lrn, task)
print(getConfMatrix(res$pred))
[Resample] holdout iter: 1
[Resample] Result: mmce.test.mean= 0.3
predicted
true M R -SUM-
M 28 10 10
R 11 21 11
-SUM- 11 10 21
[Resample] holdout iter: 1
[Resample] Result: mmce.test.mean=0.386
predicted
true M R -SUM-
M 43 0 0
R 27 0 27
-SUM- 27 0 27
[Resample] holdout iter: 1
[Resample] Result: mmce.test.mean=0.529
predicted
true M R -SUM-
M 0 37 37
R 0 33 0
-SUM- 0 37 37
I am currently improving some things to make tuning for such "named" vector parameters better; it will be finished soon. And mlr now supports many other possibilities for imbalanced classes as well.
I know that mlr supports other possibilities for imbalanced classes, such as SMOTE. I tried both the SMOTE and the weighted classes approach with liblineaR. They had about the same performance in terms of discrimination and balance between tpr and tnr, but the weighted classes approach is computationally more efficient. That's why I'd prefer to use weighted classes.
Sure, I completely understand.
Does it help what I posted? Is there still an open question?
It works now for me to set wi directly in makeLearner (don't know why it didn't before). Still, I (and possibly others) find it a bit confusing that one cannot apply makeWeightedClassesWrapper to the liblineaR learner when the goal is to treat class imbalance and liblineaR has support for class weights. Btw, the same seems to apply to ksvm.
Still, I (and possibly others) find it a bit confusing that one cannot apply makeWeightedClassesWrapper to the liblineaR learner when the goal is to treat class imbalance and liblineaR has support for class weights
Again, the WeightedClassesWrapper is NOT for USING class weights. It is there for cases where OBSERVATION weights are available, but class weights are NOT. It gives you a chance to CREATE class weights when they are not there!
What do you propose to improve the situation?
Document this better?
Maybe this: I could extend the WeightedClassesWrapper so that it dispatches to a hyperparameter (wi in this case)?
So:
makeWeightedClassesWrapper("classif.LiblineaRLogReg", param = "wi", class.names = c("M", "R"))
?
The thing is:
This does exactly nothing. It just passes the weights down and creates a slight overhead....
And makes the interface more complicated.
I guess these are the obvious, correct extensions (see the sketch after this list):
1) a) Create a property "class.weights" that says whether a learner supports class weights. Then you can search for those with listLearners. b) Allow asking the learner what that hyperparameter is called.
2) Improve the docs page of WeightedClassesWrapper. Show in an example the two alternatives we discussed.
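A hypothetical sketch of 1b) (the function name getClassWeightParamName and the class.weights.param slot are assumptions, not existing mlr code):
# hypothetical getter for the name of a learner's class-weight hyperparameter
getClassWeightParamName = function(learner) {
  if (!"class.weights" %in% learner$properties)
    stop("learner does not support class weights")
  learner$class.weights.param
}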
1) I think that's a good idea, because I already wanted to use listLearners in that way before. 2) Why is it so important that makeWeightedClassesWrapper is only used for cases where observation weights are available and class weights are not? Why can't we use it for any case that allows class weights, regardless of the mechanism?
Using makeWeightedClassesWrapper for any such case would be more convenient, because then I wouldn't have to look up the argument of the specific learner and the class names every time I want to weight the classes.
Eric, would you be willing to help to make that happen?
sure
Great, how about compiling a list of classif learners where we have some kind of class.weights param?
We would need notes on whether they are all of the same "form".
Then the property should be set.
We also need a way / function to retrieve that param name. (I will look into that).
Ok, I will compile a list of the classif learners, their argument names, and the way they need to be specified.
@studerus
a) I have greatly improved the R docs of makeWeightedClassesWrapper.
b) I have introduced wcw.param, which does what you want. Actually, I can exploit this now too in a current project, so thanks for insisting on this :) If you finish this list of params, we can even set this param name by default!
Can you please check both?
Thanks!
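A sketch of how the new wcw.param might be used (the syntax assumes the argument names discussed above; the weight value is purely illustrative):
# dispatch the class weights to LiblineaR's "wi" hyperparameter instead of
# observation weights
lrn = makeWeightedClassesWrapper(makeLearner("classif.LiblineaRLogReg"),
  wcw.param = "wi", wcw.weight = 5)
res = holdout(lrn, sonar.task)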
The following classification functions support direct class weights:
It would be nice if we had a mechanism with which we don't have to look up the argument name of each function.
Bernd, you write:
What you want is to weight classes, probably because you have either non-standard costs on the classes or an imbalanced problem. (I would be curious which one you have....)
Obviously there is a connection between class weights and case weights: IF we can use the latter, we have a mechanism for the former: simply set each observation of class i in training to a weight c_i.
My question is: let's consider a multinomial classification task with a cost matrix. Can I (or how can I) use observation weighting to implement cost-sensitive classification with a given cost structure? I don't think there is a trivial answer to this.
Let's say I have C classes; then via observation weights I can only attribute C different cost values, one per class. Using a cost matrix, however, I can attribute C-1 different cost values to every class. Thus the number of cost values I can attribute grows quadratically (C(C-1) off-diagonal entries) with the number of classes in the classification task.
The only case that I can think of is when the misclassification costs of an observation are equal for all incorrect predictions, i.e. C(j|i) = const for all j != i. Then we could give every observation the cost of its "class misclassification".
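A tiny numeric illustration of that special case (toy cost matrix, all values made up):
# 3-class cost matrix in which every row has a constant off-diagonal cost,
# i.e. C(j|i) = c_i for all j != i -- the only structure reducible to weights
costs = matrix(c(0, 2, 2,
                 5, 0, 5,
                 1, 1, 0), nrow = 3, byrow = TRUE,
               dimnames = list(true = c("a", "b", "c"), pred = c("a", "b", "c")))
class.weights = apply(costs, 1, max)   # a = 2, b = 5, c = 1
# each training observation then receives the weight of its true class:
# obs.weights = class.weights[as.character(y)]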
Can I (or how can I) use observation weighting to implement cost-sensitive classification with a given cost structure?
That is a somewhat different question. Can we open up a new thread to discuss this?
I opened #129
Well, you mentioned that we could use observation weighting when we have non-standard costs, if I am not mistaken. Or what do you mean by non-standard costs?
Please have a look at the other thread, so we can see whether I defined your problem / question correctly, ok?
The problem is, studerus is doing something a bit different, because he does not have neatly defined costs as you have (I guess).
@kerschke Can you please close this soon?
After having read your exhaustive dialogue, I'm not quite sure whether I understand the task correctly. So let me try to summarize it, and correct me if I'm wrong:
1) mlr allows the usage of observation weights but not class weights (at least in general).
2) The former (observation weights) is precisely what is defined in the properties of a learner (and also highlighted in the learner summary table within the tutorial).
3) Some learners (which @studerus already listed) are able to handle class weights on their own.
And based on that, we now need a property (e.g. classweights) for those learners, so we can easily select them (e.g. using listLearners("classif", properties = "classweights"))? However, in that case, one would still need to look into the original learner and try to find the correct name of the class weight argument. So, instead of (or in addition to) the property, we actually might need an additional list element for the learner, which states the name of the class weight parameter. So in the case of ksvm, it might look like the following:
...
makeNumericVectorLearnerParam(id = "class.weights", len = NA_integer_, lower = 0),
makeLogicalLearnerParam(id = "fit", default = TRUE)
),
par.vals = list(fit = FALSE),
## add classweights property:
properties = c("twoclass", "multiclass", "numerics", "factors", "prob", "classweights"),
## add id of the classweights-parameter, here "class.weights":
classweights = "class.weights",
...
Or am I misunderstanding the task?
1) mlr allows the usage of observation weights but not class weights (at least in general).
Yes, in principle. But note that you can use obs weights to generate class weights with the WeightedClassesWrapper, and that this wrapper also allows you to directly use a class-weighting parameter for this purpose. But you have to enter it manually. Read that wrapper's docs completely now if you haven't.
2)
Yes. We might have called the property "obs.weights" to be precise.
Some learners (which @studerus already listed) are able to handle class weights on their own.
Yes. If I know that they have such a property, have a way to ask for their parameter, and can set it in a standardized way, I can do this by default in the WeightedClassesWrapper, without further user input.
we actually might need an additional list element for the learner, which states the name of the class weight parameter.
Exactly. And we need to document this. And we might need a "setter" for the value of the param, if it sometimes has a slightly different data format for the learner. We don't need it if it is always a named numeric vector where the names are class names.
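For that simple case, a sketch of how such a parameter can already be set today (classif.ksvm as an example; the weight values are made up):
# no special setter needed when the learner accepts a named numeric vector
# keyed by class labels, as ksvm's class.weights does
lrn = makeLearner("classif.ksvm")
lrn = setHyperPars(lrn, class.weights = c(setosa = 1, versicolor = 2, virginica = 5))
mod = train(lrn, iris.task)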
If it sometimes has a slightly different data format for the learner.
You mean like in mda? Apparently, this function takes weights in the style computed by mda.start, i.e. a probability for each observation and class, which are then again somehow combined in certain blocks:
> set.seed(123)
> mda.start(iris[1:30, 1:3], iris[1:30, 4])
$`0.1`
s1
[1,] 1
[2,] 1
[3,] 1
$`0.2`
s1 s2 s3
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
[4,] 0 0 1
[5,] 1 0 0
[6,] 0 1 0
[7,] 0 0 1
[8,] 1 0 0
[9,] 0 1 0
[10,] 1 0 0
[11,] 1 0 0
[12,] 0 0 1
[13,] 0 1 0
[14,] 0 1 0
[15,] 1 0 0
[16,] 1 0 0
[17,] 0 1 0
$`0.3`
s1 s2 s3
[1,] 0 0 1
[2,] 1 0 0
[3,] 0 1 0
[4,] 1 0 0
$`0.4`
s1 s2 s3
[1,] 1 0 0
[2,] 0 0 1
[3,] 1 0 0
[4,] 0 1 0
[5,] 0 1 0
$`0.5`
s1
[1,] 1
attr(,"criterion")
[1] 0.3333333
attr(,"name.criterion")
[1] "Misclassification Error"
All the other learners (which were mentioned before) are able to just use a numeric input vector. So, we need to find a way to deal with this strange list of matrices in order to tackle this issue.
need to find a way to deal with this strange list of matrices in order to tackle this issue.
I guess we don't. Just disregard this in mda. We cannot do anything useful with it anyway?
For the other methods, you might want to check whether they accept named vectors. Or more precisely: how do they determine which weight entry belongs to which class?
EDIT: because if they rely on position, it is always a hassle to figure out what that means exactly.
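One way to probe this, sketched with kernlab::ksvm (a deterministic kernel is used so the two fits are comparable):
library(kernlab)
# permute the names: if the fitted models predict identically, the mapping
# is decided by the names, not by the positions
w1 = c(setosa = 1, versicolor = 2, virginica = 5)
w2 = w1[c(3, 1, 2)]   # same weights, different order
m1 = ksvm(Species ~ ., data = iris, kernel = "vanilladot", class.weights = w1)
m2 = ksvm(Species ~ ., data = iris, kernel = "vanilladot", class.weights = w2)
all.equal(predict(m1, iris), predict(m2, iris))   # TRUE if names decide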
Ok, will do that.
Maybe to clarify about mda:
Short answer: I agree with Bernd. The weights here can't be used like ordinary class or observation weights.
Long answer: mda.start by default does k-means within single classes and returns a list of index matrices showing the clustering result for all classes. For example, mda.start(iris[, 1:4], iris[, 5]) gives a list of 3 index matrices. The weights are updated in each E step of the EM algorithm to reflect the current clustering.
The way it looks to me, all of the methods from above (except for mda) use a numeric vector for the class weights. In that case, my favorite option for solving this problem is:
Add that specific parameter to the parameter set using makeNumericVectorLearnerParam, and then look for those parameters when training the model (including a check whether the vector is named based on the class levels). In addition, I suggest we add the property class.weights to each of those learners (which would allow looking for learners having that property).
makeRLearner.classif.LiblineaRBinary = function() {
makeRLearnerClassif(
cl = "classif.LiblineaRBinary",
package = "LiblineaR",
par.set = makeParamSet(
...
# class weights
makeNumericVectorLearnerParam(id = "wi", len = NA_integer_),
...
),
# adding 'class.weights' to the properties
properties = c("twoclass", "numerics", "class.weights"),
...
)
}
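The named-vector check mentioned above could look roughly like this (checkClassWeights is a hypothetical helper, not existing mlr code):
# hypothetical training-time check for the named weight vector
checkClassWeights = function(wi, class.levels) {
  if (is.null(names(wi)) || !all(names(wi) %in% class.levels))
    stop("class weights must be a named vector using the task's class levels")
  wi
}
# with the property in place, supporting learners can then be listed:
listLearners("classif", properties = "class.weights")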
By the way, is it actually necessary that mlr tells the user which parameter is responsible for the class weights?
In that case, we would either have to set a flag on the parameter, e.g. makeNumericVectorLearnerParam(id = "wi", len = NA_integer_, isWeight = TRUE) (and therefore extend the existing interface of make...Param), or add an argument to the learner, e.g. class.weights = "wi" (changing the interface of makeRLearner...).
What do you think of these approaches? Or do you suggest doing it in a completely different way?
Just double-checking something regarding the class weights:
Since I'm going to add the class weights information to the learner, do you agree that we no longer need the wcw.param argument in makeWeightedClassesWrapper? I would define it via wcw.param = learner$class.weights.param within the function. Or is there any case where we want to set it manually to a different value than the one in learner$class.weights.param?
Sorry. I did not see this question. So you did ask :)
Solved with PR #385.
Hi Eric,
why do you do this? The part:
= .weights[unique(names(.weights))]
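For context, an illustration of what that expression does (values made up): indexing a named vector by unique(names(...)) keeps only the first entry per name, collapsing a per-observation weight vector to one weight per class, which matches the remark at the top of this thread that each class should occur only once in the vector.
.weights = c(M = 2, M = 2, R = 1, R = 1)
.weights[unique(names(.weights))]
#> M R
#> 2 1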