topepo / caret

caret (Classification And Regression Training) is an R package that contains miscellaneous functions for training and plotting classification and regression models
http://topepo.github.io/caret/index.html
1.62k stars, 632 forks

svmRadial and rfe example is not working #354

Closed ypriverol closed 8 years ago

ypriverol commented 8 years ago

@topepo

I ran

svmProfile <- rfe(x, logBBB, sizes = c(2, 5, 10, 20),
                  rfeControl = rfeControl(functions = caretFuncs, number = 3, verbose = TRUE),
                  method = "svmRadial")

and it fails with the same error:

Error in { : task 1 failed - "undefined columns selected"
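The `Error in { : task 1 failed` wrapper appears to come from foreach, which rfe uses to loop over its resamples; the inner message, "undefined columns selected", is base R's error for asking a data frame for a column that does not exist. The sketch below reproduces only that inner message with a made-up data frame; it does not show where inside caret the failing lookup happens.

```r
# Made-up data frame; "prediction" is a hypothetical column name that does not exist
df <- data.frame(logBBB = c(0.1, -0.3))

# Capture the error message instead of stopping
msg <- tryCatch(df[, "prediction"], error = function(e) conditionMessage(e))
print(msg)
```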

cafernandezlo commented 8 years ago

Please check your caret version:

packageVersion('caret')

If it is the affected version, remove it (and the bug with it):

remove.packages(c('caret'))

Then load devtools and install caret version 6.0-52, which does not have the bug:

library(devtools)

install_version(package='caret',version='6.0-52')

Run it again and report back ;)
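The version check above can be turned into a condition. A minimal sketch, assuming (as the comment above says) that 6.0-52 is the last caret release without this bug; needs_downgrade() is a hypothetical helper, not part of caret or devtools. It compares with package_version() because plain string comparison orders "6.0-100" before "6.0-52".

```r
# Assumption from the comment above: 6.0-52 is the last release without the bug
known_good <- "6.0-52"

# Hypothetical helper: TRUE when the installed version is newer than the known-good one.
# package_version() compares component by component (6, 0, 52), not character by character.
needs_downgrade <- function(installed, threshold = known_good) {
  package_version(as.character(installed)) > package_version(threshold)
}

# Usage with the installed package (requires caret to be installed):
# needs_downgrade(packageVersion("caret"))
```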

ypriverol commented 8 years ago

@cafernandezlo I fixed the error by using:

mod <- getModelInfo("svmRadial", regex = FALSE)[[1]]

mod$predict <- function(modelFit, newdata, submodels = NULL) {
    svmPred <- function(obj, x) {
        hasPM <- !is.null(unlist(obj@prob.model))
        if(hasPM) {
            pred <- lev(obj)[apply(predict(obj, x, type = "probabilities"), 1, which.max)]
        } else pred <- predict(obj, x)
        pred
    }
    out <- try(svmPred(modelFit, newdata), silent = TRUE)
    if(is.character(lev(modelFit))) {
        if(class(out)[1] == "try-error") {
            warning("kernlab class prediction calculations failed; returning NAs")
            out <- rep("", nrow(newdata))
            out[seq(along = out)] <- NA
        }
    } else {
        if(class(out)[1] == "try-error") {
            warning("kernlab prediction calculations failed; returning NAs")
            out <- rep(NA, nrow(newdata))
        }
    }
    if(is.matrix(out)) out <- out[,1]  # flatten a one-column prediction matrix to a vector
    out
}

# Support Vector Machine object
svmProfileValue <- rfe(trainDescr, trainClass, sizes = 1:4,
                       rfeControl = rfeControl(functions = caretFuncs, number = numberIter, verbose = TRUE),
                       method = mod)

But I can try your solution.

cafernandezlo commented 8 years ago

@ypriverol it is not a real solution; it's simply a downgrade to avoid the problem. We found a similar problem with our RRegrs package (https://github.com/enanomapper/RRegrs) and the latest version of the caret package.

ypriverol commented 8 years ago

Thanks @cafernandezlo I will try.

topepo commented 8 years ago

The solution from @ypriverol was recommended by me.

In previous versions of caret, extractPrediction took care of it; in newer versions, predict.train avoids that function. Interestingly, extractPrediction was not designed to convert the matrix down to a vector. That change in output type (kernlab returning a matrix) appears to have happened in kernlab after the original version of the function was written. Fortunately, the code

pred <- c(pred, tempUnkPred)

unknowingly solved the issue. Something similar happened with the earth package last year.

I think the best solution is to modify the model objects so that they return a vector (since that is what we want). Eventually, train will be modified to return vector-valued predictions, and the change to predictionFunction that fixed this bug will be undone.
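The matrix-versus-vector difference described above is easy to see in base R. The sketch below uses a made-up one-column matrix standing in for kernlab's regression predictions and shows both fixes: taking the first column, as the patched mod$predict does with out[,1], and c(), which drops the dim attribute and is why pred <- c(pred, tempUnkPred) happened to work.

```r
# Made-up one-column prediction matrix, standing in for kernlab's regression output
pred_mat <- matrix(c(6.41, 6.52, 6.60), ncol = 1)

# Fix 1: what the patched mod$predict does
v1 <- if (is.matrix(pred_mat)) pred_mat[, 1] else pred_mat

# Fix 2: c() drops the dim attribute, mirroring pred <- c(pred, tempUnkPred)
pred <- numeric(0)
v2 <- c(pred, pred_mat)

is.matrix(pred_mat)  # TRUE
is.matrix(v2)        # FALSE
```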

ypriverol commented 8 years ago

@topepo @cafernandezlo @enriquea Hi guys, I have been facing a problem: when I finish the training and get my final model using my current hack in caret, my predict function always returns one value. See the following code:

newData <- data.frame(bjell=4, calibrated=4.9, expasy=4.5)
svmModel <- svmModel
predict(svmModel, newdata=newData)

topepo commented 8 years ago

Well, that is what I would expect:

> newData<- data.frame(bjell=4, calibrated=4.9, expasy=4.5)
> nrow(newData)
[1] 1

ypriverol commented 8 years ago

I don't really understand.

ypriverol commented 8 years ago

@topepo sorry, I didn't express myself properly. If I change:

newData <- data.frame(bjell=4, calibrated=4.9, expasy=4.5)
svmModel <- svmModel
predict(svmModel, newdata=newData)

to

newData <- data.frame(bjell=4, calibrated=10, expasy=4.5)
svmModel <- svmModel
predict(svmModel, newdata=newData)

it gives me the same value even though the variables changed.

enriquea commented 8 years ago

Hi all, running the code below, I expected a vector with different values in pIs, but I get the same value for every row. Any ideas?

> dframe <- data.frame(calibrated, bjell, expasy)
> dframe
  calibrated bjell expasy
1        4.5   5.4    6.8
2        4.9   5.6    6.0
3        5.1   5.9    7.1
> pIs <- predict(object = svmModel, newdata=dframe)
> pIs
[1] 6.417835 6.417835 6.417835

enriquea commented 8 years ago

I am using the predict function from the kernlab package.

topepo commented 8 years ago

> I am using the "predict" function from Kernlab package.

I think that there might be things happening that you are not showing (like how svmModel was created). That's why we always want a small, reproducible example.

It is rare that you should generate a model using train (or rfe or others) and then use the underlying package's predict code on it. train does things that the underlying model object may not know about (e.g. pre-processing), so you should not expect to get the same (or right) answer by doing so.
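To make that point concrete outside caret: a model object only knows the numbers it was fit on. The sketch below uses base R's lm() as a stand-in for any underlying model (it is not caret or kernlab code) with made-up data. The model is fit on standardized x, so calling its predict() with raw x silently returns a wrong value.

```r
# Made-up one-predictor data; the model is fit on standardized x
x <- c(4.5, 4.9, 5.1, 5.8, 6.2)
y <- 2 * x + 1
xs <- scale(x)
center <- attr(xs, "scaled:center")
spread <- attr(xs, "scaled:scale")

fit <- lm(y ~ z, data = data.frame(z = as.numeric(xs), y = y))

new_x <- 5
# Right: transform the new value exactly as the training data was transformed
good <- predict(fit, newdata = data.frame(z = (new_x - center) / spread))
# Wrong: the lm object has no record of the standardization, so raw x is misread
bad <- predict(fit, newdata = data.frame(z = new_x))
```

This is the same mismatch train avoids when the pre-processing is done through its own arguments, since it then reapplies the stored transformation inside predict.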

enriquea commented 8 years ago

Here an example training the svm classifier:

    load("C:/Users/Enrique/Git/pIR-master/data/svmPeptideData.rda")

    peptides_properties <- subset(data, select=c("bjell", "expasy", "calibrated","aaindex"))

    peptides_experimental <- subset(data, select=c("pIExp"))

    svmModel <- svmProfile(dfExp = peptides_experimental, dfProp = peptides_properties, method = method, numberIter = numberIter)

The svmProfile function looks like this:

svmProfile <- function(dfExp, dfProp, method = "svmRadial", numberIter = 2){

#load Data
# This is the data file with the descriptors:
peptides_desc <- as.matrix(dfProp);

# This is the Data File with the Experimental Isoelectric Point
peptides_class <- as.matrix(dfExp);

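#Scale and center data (note: the center/scale values are not saved or returned,
# so new data passed to predict() later is not transformed the same way)
peptides_desc <- scale(peptides_desc,center=TRUE,scale=TRUE);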

#Divide the dataset in train and test sets

# Create an index of the number to train
inTrain <- createDataPartition(peptides_class, p = 3/4, list = FALSE)[,1];

#Create the Training Dataset for Descriptors
trainDescr <- peptides_desc[inTrain,];

# Create the Testing dataset for Descriptors
testDescr <- peptides_desc[-inTrain,];

trainClass <- peptides_class[inTrain];
testClass <- peptides_class[-inTrain];

mod <- getModelInfo("svmRadial", regex = FALSE)[[1]]

mod$predict <- function(modelFit, newdata, submodels = NULL) {
    svmPred <- function(obj, x) {
        hasPM <- !is.null(unlist(obj@prob.model))
        if(hasPM) {
            pred <- lev(obj)[apply(predict(obj, x, type = "probabilities"), 1, which.max)]
        } else pred <- predict(obj, x)
        pred
    }
    out <- try(svmPred(modelFit, newdata), silent = TRUE)
    if(is.character(lev(modelFit))) {
        if(class(out)[1] == "try-error") {
            warning("kernlab class prediction calculations failed; returning NAs")
            out <- rep("", nrow(newdata))
            out[seq(along = out)] <- NA
        }
    } else {
        if(class(out)[1] == "try-error") {
            warning("kernlab prediction calculations failed; returning NAs")
            out <- rep(NA, nrow(newdata))
        }
    }
    if(is.matrix(out)) out <- out[,1]  # flatten a one-column prediction matrix to a vector
    out
}

#Support Vector Machine Object
svmProfileValue <- rfe(trainDescr, trainClass, sizes = 1:4,
                       rfeControl = rfeControl(functions = caretFuncs, number = numberIter, verbose = TRUE),
                       method = mod)

return(svmProfileValue)
}

topepo commented 8 years ago

svmModel looks like it should be of class train. Why did you say that you were using the kernlab predict function?

Also, I don't know anything about your data. If the predictors are on different metrics, you really should be centering and scaling. Otherwise the predictor with the largest values will dominate the dot product and this could very well be why you are getting the same value predicted.
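A quick way to see the dominance effect: the svmRadial kernel, exp(-sigma * ||x - x'||^2), depends on the squared Euclidean distance between samples, so each predictor's share of that distance is what matters. The sketch below uses two made-up predictors (not the issue's data), base R's scale(), and a hypothetical helper sq_contrib().

```r
# Two made-up predictors on very different scales (not the issue's data)
X <- cbind(small = c(4.5, 4.9, 5.1), large = c(5400, 5600, 5900))

# Hypothetical helper: per-predictor contribution to the squared
# Euclidean distance between rows 1 and 2
sq_contrib <- function(M) (M[1, ] - M[2, ])^2

raw    <- sq_contrib(X)
scaled <- sq_contrib(scale(X))  # center and scale each column first

share_large_raw    <- raw[["large"]] / sum(raw)
share_large_scaled <- scaled[["large"]] / sum(scaled)

round(c(raw = share_large_raw, scaled = share_large_scaled), 3)
```

Here the large-unit predictor accounts for essentially all of the raw distance, but only about 27% of it after centering and scaling.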

enriquea commented 8 years ago

To predict new values using a new dataset I am using the following code:

svmModel <- svmProfile()

pIs <- predict(object = svmModel, newdata=dframe)

Is that correct?

ypriverol commented 8 years ago

@topepo it looks like the problem may be related to the scaling; @enriquea and I will test that. BTW, if we use the same scale function to scale the new values, what will happen?

topepo commented 8 years ago

So if you use the preProc argument to train, it will:

Does that address your question?

ypriverol commented 8 years ago

OK. Can you point me to an example where preProc is used?

topepo commented 8 years ago

It is discussed here and a simple example is here

enriquea commented 8 years ago

Hi @ypriverol and @topepo:

The problem was fixed by applying the same transformation to the new data as to the training data set, using the preProcess function. Thank you very much for your collaboration and time.
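In the svmProfile function posted earlier, the descriptors are standardized with scale() before rfe() is called and the centering and scaling values are never kept, so raw new data reaches the model untransformed. caret's preProcess() stores the training estimates so they can be reapplied. The sketch below is a base R equivalent of that idea, not caret's implementation; the training values are made up (only the column names come from the thread) and scale_like_training() is a hypothetical helper. Note that calling scale() on the new data alone would use the new data's own mean and standard deviation, which is not the same transformation.

```r
# Made-up training descriptors; only the column names come from the thread
train <- cbind(bjell      = c(5.4, 5.6, 5.9, 6.3),
               calibrated = c(4.5, 4.9, 5.1, 5.8),
               expasy     = c(6.8, 6.0, 7.1, 6.5))

# Standardize the training data and keep the statistics scale() attaches
train_scaled <- scale(train)
centers <- attr(train_scaled, "scaled:center")
scales  <- attr(train_scaled, "scaled:scale")

# Hypothetical helper: apply the *training* center/scale values to new rows
scale_like_training <- function(newdata, centers, scales) {
  m <- as.matrix(newdata)[, names(centers), drop = FALSE]  # same column order
  scale(m, center = centers, scale = scales)
}

new_data   <- data.frame(bjell = 4, calibrated = 10, expasy = 4.5)
new_scaled <- scale_like_training(new_data, centers, scales)
new_scaled  # these values, not the raw ones, are what the model should see
```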

ypriverol commented 8 years ago

@enriquea thanks, I will close the issue. +1