topepo / caret

caret (Classification And Regression Training) R package that contains misc functions for training and plotting classification and regression models
http://topepo.github.io/caret/index.html

Caret varImp error with SVM model #1292

Open elpidiofilho opened 2 years ago

elpidiofilho commented 2 years ago

The caret package throws an error when I try to get variable importance with varImp() for an svmRadial model trained on the diabetes dataset. The error message displayed is: "invalid type (list) for variable 'y'"

library(caret)
#> Loading required package: ggplot2
#> Loading required package: lattice
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(MLDataR)

df = MLDataR::diabetes_data |>
  mutate(across(where(is.character), as.factor)) |>
  rename(Class = DiabeticClass)

set.seed(313)
intrain = createDataPartition(df$Class, p = 0.75, list = FALSE)
train = df[intrain,]
test = df[-intrain,]

ctrl <- trainControl(method = 'repeatedcv',
                     number = 10,
                     repeats = 3)

set.seed(313)
model_svm <- train(Class ~.,
                  data = train,
                  method = 'svmRadial',
                  verbose = FALSE,
                  metric = 'Accuracy',
                  trControl = ctrl,
                  tuneLength = 10)
#model_svm
caret::varImp(model_svm)
#> Warning in mean.default(y, rm.na = TRUE): argument is not numeric or logical:
#> returning NA
#> Warning in Ops.factor(left, right): '-' not meaningful for factors
#> Error in model.frame.default(formula = y ~ x, na.action = na.omit, drop.unused.levels = TRUE): invalid type (list) for variable 'y'

Created on 2022-06-14 by the reprex package (v2.0.1)


sessionInfo()
#> R version 4.1.3 (2022-03-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 22000)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Portuguese_Brazil.1252  LC_CTYPE=Portuguese_Brazil.1252   
#> [3] LC_MONETARY=Portuguese_Brazil.1252 LC_NUMERIC=C                      
#> [5] LC_TIME=Portuguese_Brazil.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] rstudioapi_0.13   knitr_1.39        magrittr_2.0.3    R.cache_0.15.0   
#>  [5] rlang_1.0.2       fastmap_1.1.0     fansi_1.0.3       stringr_1.4.0    
#>  [9] styler_1.7.0      highr_0.9         tools_4.1.3       xfun_0.31        
#> [13] R.oo_1.25.0       utf8_1.2.2        cli_3.3.0         withr_2.5.0      
#> [17] htmltools_0.5.2   ellipsis_0.3.2    yaml_2.3.5        digest_0.6.29    
#> [21] tibble_3.1.7      lifecycle_1.0.1   crayon_1.5.1      purrr_0.3.4      
#> [25] R.utils_2.11.0    vctrs_0.4.1       fs_1.5.2          glue_1.6.2       
#> [29] evaluate_0.15     rmarkdown_2.14    reprex_2.0.1      stringi_1.7.6    
#> [33] compiler_4.1.3    pillar_1.7.0      R.methodsS3_1.8.1 pkgconfig_2.0.3


luciewoellenstein44 commented 1 year ago

Did you ever find an answer to this?

twest820 commented 10 months ago

+1 in R 4.3.1 with caret 6.0-94.

> varImp(svmFitLinear)
Error in model.frame.default(formula = y ~ x, na.action = na.omit, drop.unused.levels = TRUE) : 
  invalid type (list) for variable 'y'
In addition: Warning messages:
1: In mean.default(y, rm.na = TRUE) :
  argument is not numeric or logical: returning NA
2: In Ops.factor(left, right) : ‘-’ not meaningful for factors
> filterVarImp(svmFitLinear)
Error in data[, fc] : incorrect number of dimensions

Same with radial SVMs.

> Did you ever find an answer to this?

The only factor I have is the output, so my working assumption is that varImp() assumes a continuous response variable and thus fails to handle classifiers.
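One possibly related detail: the data in the original reprex may be a tibble rather than a plain data frame (an assumption, not checked here), and selecting a single column behaves differently for the two. The sketch below uses made-up toy data and only illustrates that indexing behavior; whether it is what actually produces the "invalid type (list)" error inside varImp() is a guess.

```r
library(tibble)

# Toy data (hypothetical): one numeric predictor and a two-level factor outcome.
plain <- data.frame(x = c(1.2, 3.4, 2.2, 4.1),
                    y = factor(c("a", "b", "a", "b")))
tb <- as_tibble(plain)

# A plain data frame drops to a vector when a single column is selected ...
class(plain[, "y"])

# ... while a tibble keeps returning a tibble, which is a list underneath.
class(tb[, "y"])
is.list(tb[, "y"])

# Converting back to a data frame restores the vector-returning behavior.
class(as.data.frame(tb)[, "y"])
```

If this is the cause, passing `data = as.data.frame(train)` to `train()` might avoid the error; that has not been tested here.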

serkor1 commented 2 months ago

According to the documentation, you should use filterVarImp:

https://github.com/topepo/caret/blob/5f4bd2069bf486ae92240979f9d65b5c138ca8d4/pkg/caret/R/varImp.R#L6-L7

The support vector machine family of models doesn't appear to have a model-specific variable importance method. This can easily be verified by changing the method argument to 'nnet', which runs without any issues.
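For reference, filterVarImp() takes the predictors and the outcome (`x`, `y`) rather than a fitted model, which would explain the "incorrect number of dimensions" error reported earlier in the thread when a train object is passed. A minimal sketch on a two-class subset of the built-in iris data, chosen only for illustration and not the dataset from this issue:

```r
library(caret)

# Two-class toy problem from the built-in iris data (illustrative only).
dat <- droplevels(subset(iris, Species != "setosa"))

x <- dat[, setdiff(names(dat), "Species")]  # plain data frame of numeric predictors
y <- dat$Species                            # factor outcome

# Model-free importance: with a factor outcome, each predictor is scored
# on its own through an ROC curve analysis, independent of any fitted model.
imp <- filterVarImp(x = x, y = y)
imp
```

Because this score ignores the fitted SVM entirely, it ranks predictors by their individual association with the outcome, not by how the model uses them.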