mlr-org / mlr

Machine Learning in R
https://mlr.mlr-org.com

Proposition: Add methods from package 'DiscriMiner' #21

Closed mareichhoff closed 10 years ago

mareichhoff commented 10 years ago

Methods such as plsDA and geoDA. Perhaps check whether the package is interesting in general. Its last update was in November 2013.

linDA and quaDA are also available, but I don't know how they differ from MASS lda and qda, which already exist in mlr.

berndbischl commented 10 years ago

Jakob, can you please check whether the methods in the package are worthwhile and, if so, integrate them?

jakob-r commented 10 years ago

geoDA added: 063f7a5ac584135ce274fde881b22abb7623af3c It also returns scores which are unused so far.

jakob-r commented 10 years ago

plsDA added: b7745c4a520b33390878b748e6b98954fe5c9700 How deep is my research on the similarities to MASS supposed to be? I think the plain implementation will be faster ;)

berndbischl commented 10 years ago

How deep is my research on the similarities to MASS supposed to be?

I don't get that question.

berndbischl commented 10 years ago

Ok I get it now.

Well it is up to the user to decide what classifier to use. We just offer them.

So as long as it is not an exact code-copy, integrate it and do not waste time on comparing differences. This is not our job.

jakob-r commented 10 years ago

Everything added e8a19f5f6f2ee088f4fc3070374463d0a9c82bbb Waiting for an answer on this issue to get probabilities working.

berndbischl commented 10 years ago

Jakob, I don't get your question.

quaDA of course simply returns classes and probs.

So everything is OK

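library(DiscriMiner)
# Fit quadratic discriminant analysis on iris, then classify the same data
m = quaDA(iris[, 1:4], group = iris$Species)
z = classify(m, newdata = iris[, 1:4])

# str() prints the structure itself; wrapping it in print() would also print NULL
str(z)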

List of 2
 $ scores    : num [1:150, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:150] "1" "2" "3" "4" ...
  .. ..$ : chr [1:3] "setosa" "versicolor" "virginica"
 $ pred_class: Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

It is pretty obvious that the scores are the probs. I mean it is DA!

berndbischl commented 10 years ago

Hmm, but I see the prob argument in quaDA. If your question was "what is that good for?", I don't know either :)

berndbischl commented 10 years ago

OK, I got it. Just look at my_quaDA. That is pretty readable.

berndbischl commented 10 years ago

Actually, there seems to be something strange. I need a few minutes later, when I can concentrate, to really look at the code and formulas. I will get back to this later and do something else for now.

And don't forget to update the NEWS file.

berndbischl commented 10 years ago

OK, I read the code. I am not sure whether they really intend this, but "prob" is only used during training and not during prediction, which seems a bit strange.

So I would suggest this:

a) Spend 5 minutes checking against the usual qda probs that the scores of quaDA are the same and really form a prob matrix.

b) Then assume that the predicted scores are the probabilities. Maybe set prob = TRUE during training for good measure if predict.type = "prob" in mlr.
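A minimal sketch of the check in a), in base R only. The helper names is_prob_matrix and max_abs_diff are hypothetical (they are not part of mlr, MASS, or DiscriMiner), and the matrices are made up for illustration; in practice one would pass in the scores from classify() and the posterior from the usual qda prediction.

```r
# Hypothetical helper: TRUE if every entry lies in [0, 1] and every row sums to 1
# (both up to a small tolerance).
is_prob_matrix = function(scores, tol = 1e-6) {
  scores = as.matrix(scores)
  in_range = all(scores >= -tol & scores <= 1 + tol)
  sums_to_one = all(abs(rowSums(scores) - 1) <= tol)
  in_range && sums_to_one
}

# Hypothetical helper: largest absolute difference between two score matrices,
# e.g. the scores under test versus reference probabilities.
max_abs_diff = function(a, b) {
  max(abs(as.matrix(a) - as.matrix(b)))
}

# Made-up matrices for illustration only
good = matrix(c(0.7, 0.2, 0.1,
                0.1, 0.3, 0.6), nrow = 2, byrow = TRUE)
bad = matrix(c(-0.1, 0.1, 1.0,
                0.6, 0.5, 0.3), nrow = 2, byrow = TRUE)

is_prob_matrix(good)     # TRUE
is_prob_matrix(bad)      # FALSE
max_abs_diff(good, bad)  # how far apart the two matrices are
```

The tolerance is a judgment call: a loose value hides small deviations in the row sums, a tight one flags rounding noise.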

jakob-r commented 10 years ago

I would not call the score matrix a prob matrix

set.seed(1)
library(DiscriMiner)
# mlr test helpers; they provide multiclass.train, multiclass.test and multiclass.class.col
source("inst/tests/helper_objects.R")
m = plsDA(multiclass.train[, -multiclass.class.col], group = multiclass.train[, multiclass.class.col])
p = classify(m, newdata = multiclass.test[, -multiclass.class.col])
tail(p$scores)
        setosa  versicolor virginica
55 -0.10550240  0.06513910 1.0403633
56 -0.11184709  0.28333880 0.8285083
57 -0.14319277  0.57721383 0.5659789
58 -0.02948543  0.30791923 0.7215662
59  0.02487976 -0.04780971 1.0229299
60  0.05509692  0.23102495 0.7138781

In comparison, with caret:

library(caret)
m = plsda(multiclass.train[, -multiclass.class.col], multiclass.train[, multiclass.class.col])
p = predict(m, multiclass.test[, -multiclass.class.col], type = "prob")
tail(p[, , "2 comps"])
       setosa versicolor virginica
145 0.2227438  0.2874472 0.4898090
146 0.2327600  0.3186686 0.4485713
147 0.2124043  0.3973980 0.3901977
148 0.2354908  0.3317147 0.4327944
149 0.2493715  0.2924654 0.4581631
150 0.2430837  0.3536987 0.4032176
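
For context, one common way to turn raw scores like the DiscriMiner ones into something that looks like probabilities is a row-wise softmax. As far as I recall, caret's plsda has a probMethod argument that defaults to "softmax", but treat that as an assumption and check the caret documentation. The sketch below applies a softmax to two of the plsDA score rows printed above (rows 55 and 57); softmax_rows is a hypothetical helper, and the result is not expected to match the caret output, which comes from a different model fit and different row labels.

```r
# Hypothetical helper: row-wise softmax of a score matrix
softmax_rows = function(scores) {
  scores = as.matrix(scores)
  # Subtract each row's maximum before exponentiating for numerical stability
  shifted = scores - apply(scores, 1, max)
  e = exp(shifted)
  e / rowSums(e)
}

# Two score rows copied from the plsDA output above
scores = matrix(c(-0.10550240, 0.06513910, 1.0403633,
                  -0.14319277, 0.57721383, 0.5659789),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("55", "57"),
                                c("setosa", "versicolor", "virginica")))

probs = softmax_rows(scores)
print(probs)
print(rowSums(probs))  # each row now sums to 1

# Softmax is monotone, so the predicted class (row-wise argmax) is unchanged
colnames(probs)[apply(probs, 1, which.max)]
```

Whether softmax-transformed scores are meaningful as posterior probabilities is a separate question; the transformation only guarantees values in (0, 1) and rows that sum to 1.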

berndbischl commented 10 years ago

What happens if you set prob = TRUE in the plsDA call?

jakob-r commented 10 years ago

Only the following methods support the prob argument:

set.seed(1); library(DiscriMiner); library(mlr)
source("inst/tests/helper_objects.R")
m = quaDA(multiclass.train[, -multiclass.class.col], group = multiclass.train[, multiclass.class.col], prob = TRUE)
p = classify(m, newdata = multiclass.test[, -multiclass.class.col])
tail(rowSums(p$scores))
      55       56       57       58       59       60 
1.000000 1.000012 1.003536 1.010579 1.000045 1.094329 

m = linDA(multiclass.train[, -multiclass.class.col], group = multiclass.train[, multiclass.class.col], prob = TRUE)
p = classify(m, newdata = multiclass.test[, -multiclass.class.col])
tail(rowSums(p$scores))
      55       56       57       58       59       60 
204.3638 192.6636 150.5844 177.8404 181.8244 146.4177 
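
The row sums above show that neither score matrix is an exact probability matrix. One possible workaround, sketched below, is to rescale each row so that it sums to 1. This is only sensible under the assumption that the scores are nonnegative and proportional to the posterior probabilities, which nothing in this thread confirms for quaDA or linDA. normalize_rows is a hypothetical helper and the score values are made up.

```r
# Hypothetical helper: rescale each row of a nonnegative score matrix to sum to 1
normalize_rows = function(scores) {
  scores = as.matrix(scores)
  # Fail early if the assumption (nonnegative scores, nonzero row sums) is violated
  stopifnot(all(scores >= 0), all(rowSums(scores) > 0))
  scores / rowSums(scores)
}

# Made-up scores: the first row sums to slightly more than 1, the second row is far off
s = matrix(c(0.90, 0.08, 0.05,
             120,  60,   20), nrow = 2, byrow = TRUE)

p_norm = normalize_rows(s)
print(p_norm)
print(rowSums(p_norm))  # each row now sums to 1
```

Rescaling keeps the ranking of the classes within each row, so the predicted class does not change.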

berndbischl commented 10 years ago

Can we please close this very soon?

Please report what the current problems are.

Currently test_learners generates an error because you do not return probs for linDA.