RFE - Can't use with Logistic Regression: Error in {: task 1 failed - "undefined columns selected"

rnmourao commented 5 years ago

Hi,

I can't use RFE for Logistic Regression (lrFuncs or caretFuncs + glm):

lrRFE <- rfe(df[,features], df[,label],
             sizes=c(1:10, 15, 30), rfeControl=rfeControl(functions = lrFuncs, method = "cv"))

plot(lrRFE, type = c("o", "g"))

Output:

Error in {: task 1 failed - "undefined columns selected"

The same code using ldaFuncs works well:

ldaRFE <- rfe(df[,features], df[,label],
             sizes=c(1:10, 15, 30), rfeControl=rfeControl(functions = ldaFuncs, method = "cv"))

plot(ldaRFE, type = c("o", "g"))

My sessionInfo():

R version 3.6.1 (2019-07-05)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS/LAPACK: /opt/conda/lib/libopenblasp-r0.3.7.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] pROC_1.15.3     caret_6.0-84    ggplot2_3.2.1   lattice_0.20-38

loaded via a namespace (and not attached):
 [1] pbdZMQ_0.3-3       tidyselect_0.2.5   repr_1.0.1         purrr_0.3.2       
 [5] reshape2_1.4.3     splines_3.6.1      vctrs_0.2.0        colorspace_1.4-1  
 [9] generics_0.0.2     stats4_3.6.1       htmltools_0.3.6    base64enc_0.1-3   
[13] survival_2.44-1.1  prodlim_2018.04.18 rlang_0.4.0        e1071_1.7-2       
[17] ModelMetrics_1.2.2 pillar_1.4.2       glue_1.3.1         withr_2.1.2       
[21] uuid_0.1-2         foreach_1.4.7      plyr_1.8.4         lava_1.6.6        
[25] stringr_1.4.0      timeDate_3043.102  munsell_0.5.0      gtable_0.3.0      
[29] recipes_0.1.6      codetools_0.2-16   evaluate_0.14      labeling_0.3      
[33] class_7.3-15       IRdisplay_0.7.0    Rcpp_1.0.2         backports_1.1.4   
[37] scales_1.0.0       IRkernel_1.0.2     ipred_0.9-9        jsonlite_1.6      
[41] digest_0.6.20      stringi_1.4.3      dplyr_0.8.3        grid_3.6.1        
[45] tools_3.6.1        magrittr_1.5       lazyeval_0.2.2     tibble_2.1.3      
[49] zeallot_0.1.0      crayon_1.3.4       pkgconfig_2.0.2    MASS_7.3-51.4     
[53] Matrix_1.2-17      data.table_1.12.2  lubridate_1.7.4    gower_0.2.1       
[57] assertthat_0.2.1   iterators_1.0.12   R6_2.4.0           rpart_4.1-15      
[61] nnet_7.3-12        nlme_3.1-141       compiler_3.6.1

A sample dataset follows. Some warning messages about collinearity appear because of the small sample size. To reproduce the error without these warnings, please use the entire set at https://github.com/rnmourao/r_3.6.1-caret-classificacao/blob/master/dados/train.csv:

structure(list(`textoSaldoContaCorrente == (-∞, 0)` = c(0, 
0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1), `textoSaldoContaCorrente == [0, 200)` = c(1, 
1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0), `textoSaldoContaCorrente == [200, ∞)` = c(1, 
0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0), quantidadeMesContaCorrente = c(0.117647058823529, 
0.558823529411765, 0.294117647058824, 0.470588235294118, 0.161764705882353, 
0.382352941176471, 0.0735294117647059, 0.0294117647058824, 0.0882352941176471, 
0.205882352941176, 0.647058823529412, 0.102941176470588, 0.338235294117647, 
0.0588235294117647, 0.735294117647059, 0.235294117647059), `textoHistoricoCredito == conta critica / outros creditos existentes (nao neste banco)` = c(1, 
1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0), `textoHistoricoCredito == historico de atrasos` = c(1, 
1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1), `textoHistoricoCredito == sem emprestimos anteriores / todos os creditos anteriores pagos em dia ` = c(1, 
0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1), `textoHistoricoCredito == todos os creditos neste banco pagos em dia` = c(1, 
0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0), `textoFinalidadeCredito == carro (novo)` = c(1, 
0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1), `textoFinalidadeCredito == carro (usado)` = c(0, 
1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1), `textoFinalidadeCredito == educacao` = c(0, 
1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1), `textoFinalidadeCredito == eletrodomesticos` = c(0, 
0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0), `textoFinalidadeCredito == moveis/equipamento` = c(1, 
1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1), `textoFinalidadeCredito == negocios` = c(0, 
0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1), `textoFinalidadeCredito == reciclagem educacional` = c(0, 
1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0), `textoFinalidadeCredito == reforma` = c(1, 
1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1), valorSolicitadoCredito = c(0.101573676680973, 
0.419940574447012, 0.142236161549466, 0.368548475844613, 0.154561461428414, 
0.0634422801804776, 0.430395069880048, 0.174975239352922, 0.10366457576758, 
0.131891713436778, 0.109552107406185, 0.0855067679102014, 0.100088037856278, 
0.0618465940354352, 0.00968416419060196, 0.0915043468691537), 
    `textoInvestimento == [100, 500)` = c(1, 0, 0, 0, 0, 0, 1, 
    0, 0, 0, 1, 0, 0, 1, 1, 1), `textoInvestimento == [1000, ∞)` = c(0, 
    0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0), `textoInvestimento == [500, 1000)` = c(0, 
    1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1), `textoInvestimento == 0` = c(0, 
    0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1), `textoAnoEmprego == (0, 1)` = c(1, 
    0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1), `textoAnoEmprego == [4, 7)` = c(0, 
    0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1), `textoAnoEmprego == [7, ∞)` = c(1, 
    0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0), `textoAnoEmprego == 0` = c(0, 
    1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1), valorTaxaComprometimentoRenda = c(0, 
    0.666666666666667, 0.666666666666667, 1, 0, 0, 0.666666666666667, 
    1, 0, 0.666666666666667, 0, 0.333333333333333, 0, 0.333333333333333, 
    1, 0.666666666666667), `textoSexoEstadoCivil == homem: casado/viuvo` = c(0, 
    0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1), `textoSexoEstadoCivil == homem: divorciado/separado` = c(0, 
    0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1), `textoSexoEstadoCivil == mulher: divorciado/separado/casado` = c(1, 
    1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1), `indicadorAvalista == cofiador` = c(1, 
    0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1), `indicadorAvalista == fiador` = c(0, 
    1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1), textoAnoResidencia = c(0.333333333333333, 
    0, 0, 0.333333333333333, 0.666666666666667, 0.666666666666667, 
    1, 0, 0, 1, 1, 1, 0.666666666666667, 0, 0, 0.666666666666667
    ), `textoGarantia == ` = c(0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 
    0, 1, 1, 1, 1, 1), `textoGarantia == imovel` = c(0, 0, 0, 
    0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0), `textoGarantia == investimentos / seguro de vida` = c(0, 
    1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1), numeroIdade = c(0.535714285714286, 
    0.464285714285714, 0.607142857142857, 0.285714285714286, 
    0.75, 0.160714285714286, 0.107142857142857, 0.214285714285714, 
    0.517857142857143, 0.446428571428571, 0.125, 0.303571428571429, 
    0.357142857142857, 0.142857142857143, 0.678571428571429, 
    0.25), `textoOutroCredito == banco` = c(1, 1, 1, 0, 1, 0, 
    0, 0, 1, 0, 1, 0, 0, 0, 0, 1), `textoOutroCredito == lojas` = c(0, 
    1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1), `textoNaturezaResidencia == alugado` = c(0, 
    0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0), `textoNaturezaResidencia == de favor` = c(0, 
    0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0), quantidadeCreditoAnterior = c(0, 
    1, 1, 0, 0.666666666666667, 0.333333333333333, 0, 0.666666666666667, 
    1, 0.333333333333333, 0, 1, 0.666666666666667, 0.666666666666667, 
    1, 0), `textoEmprego == desempregado/empregado nao especializado - nao-residente` = c(0, 
    0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0), `textoEmprego == empregado nao especializado - residente` = c(1, 
    0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1), `textoEmprego == gerente/autonomo/empregado altamente especializado/forcas armadas` = c(0, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1), quantidadeAvalista = c(0, 
    0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0), indicadorPosseTelefone = c(1, 
    0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1), indicadorTrabalhadorEstrangeiro = c(0, 
    1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1), indicadorInadimplente = structure(c(1L, 
    1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L
    ), .Label = c("0", "1"), class = "factor")), row.names = c(NA, 
-16L), class = "data.frame")

topepo commented 4 years ago

What are the values of features and label?

topepo commented 4 years ago

Also, for the data in that link, the columns are all numeric. It would help to see the code that takes you from that file to the data frame you pass to rfe().

df <- read.csv(url("https://github.com/rnmourao/r_3.6.1-caret-classificacao/raw/master/dados/train.csv"))
str(df)
#> 'data.frame':    980 obs. of  48 variables:
#>  $ quantidadeMesContaCorrente                                                                   : num  0.118 0.559 0.294 0.471 0.118 ...
#>  $ valorSolicitadoCredito                                                                       : num  0.102 0.42 0.142 0.369 0.155 ...
#>  $ valorTaxaComprometimentoRenda                                                                : num  0.333 0.333 0.667 0.333 0.333 ...
#>  $ textoAnoResidencia                                                                           : num  0.667 1 1 0.333 1 ...
#>  $ numeroIdade                                                                                  : num  0.536 0.464 0.607 0.286 0.75 ...
#>  $ quantidadeCreditoAnterior                                                                    : num  0 0 0 0 0 ...
#>  $ quantidadeAvalista                                                                           : int  1 1 0 0 0 0 0 1 0 1 ...
#>  $ indicadorPosseTelefone                                                                       : int  0 0 0 1 0 0 0 1 1 0 ...
#>  $ indicadorTrabalhadorEstrangeiro                                                              : int  1 1 1 1 1 1 1 1 1 1 ...
#>  $ indicadorInadimplente                                                                        : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ textoSaldoContaCorrente_X.....0.                                                             : int  0 1 0 0 0 1 1 0 0 1 ...
#>  $ textoSaldoContaCorrente_X.0..200.                                                            : int  0 0 0 1 0 0 0 0 0 0 ...
#>  $ textoSaldoContaCorrente_X.200....                                                            : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ textoHistoricoCredito_conta.critica...outros.creditos.existentes..nao.neste.banco.           : int  1 0 0 0 0 0 0 0 1 0 ...
#>  $ textoHistoricoCredito_historico.de.atrasos                                                   : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ textoHistoricoCredito_sem.emprestimos.anteriores...todos.os.creditos.anteriores.pagos.em.dia.: int  0 0 0 0 0 0 1 0 0 0 ...
#>  $ textoHistoricoCredito_todos.os.creditos.neste.banco.pagos.em.dia                             : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ textoFinalidadeCredito_carro..novo.                                                          : int  0 0 0 0 0 1 0 0 1 0 ...
#>  $ textoFinalidadeCredito_carro..usado.                                                         : int  0 0 0 1 0 0 0 0 0 0 ...
#>  $ textoFinalidadeCredito_educacao                                                              : int  1 0 0 0 0 0 0 0 0 0 ...
#>  $ textoFinalidadeCredito_eletrodomesticos                                                      : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ textoFinalidadeCredito_moveis.equipamento                                                    : int  0 1 1 0 0 0 0 0 0 0 ...
#>  $ textoFinalidadeCredito_negocios                                                              : int  0 0 0 0 0 0 1 0 0 0 ...
#>  $ textoFinalidadeCredito_reciclagem.educacional                                                : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ textoFinalidadeCredito_reforma                                                               : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ textoInvestimento_X.100..500.                                                                : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ textoInvestimento_X.1000....                                                                 : int  0 0 0 0 1 0 0 0 0 0 ...
#>  $ textoInvestimento_X.500..1000.                                                               : int  0 0 1 0 0 0 0 1 0 1 ...
#>  $ textoInvestimento_X0                                                                         : int  0 0 0 0 0 0 1 0 0 0 ...
#>  $ textoAnoEmprego_X.0..1.                                                                      : int  0 0 0 0 0 0 1 0 0 0 ...
#>  $ textoAnoEmprego_X.4..7.                                                                      : int  1 1 0 0 1 0 0 0 0 0 ...
#>  $ textoAnoEmprego_X.7....                                                                      : int  0 0 1 0 0 0 0 1 0 0 ...
#>  $ textoAnoEmprego_X0                                                                           : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ textoSexoEstadoCivil_homem..casado.viuvo                                                     : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ textoSexoEstadoCivil_homem..divorciado.separado                                              : int  0 0 0 0 1 0 0 0 0 0 ...
#>  $ textoSexoEstadoCivil_mulher..divorciado.separado.casado                                      : int  0 0 0 0 0 1 0 0 0 0 ...
#>  $ indicadorAvalista_cofiador                                                                   : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ indicadorAvalista_fiador                                                                     : int  0 1 0 0 0 0 0 0 0 0 ...
#>  $ textoGarantia_X                                                                              : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ textoGarantia_imovel                                                                         : int  1 0 0 0 1 0 0 0 0 1 ...
#>  $ textoGarantia_investimentos...seguro.de.vida                                                 : int  0 1 1 0 0 0 0 0 0 0 ...
#>  $ textoOutroCredito_banco                                                                      : int  0 0 0 0 0 0 1 0 0 0 ...
#>  $ textoOutroCredito_lojas                                                                      : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ textoNaturezaResidencia_alugado                                                              : int  0 0 0 1 0 1 0 0 0 1 ...
#>  $ textoNaturezaResidencia_de.favor                                                             : int  0 1 0 0 0 0 0 0 0 0 ...
#>  $ textoEmprego_desempregado.empregado.nao.especializado...nao.residente                        : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ textoEmprego_empregado.nao.especializado...residente                                         : int  1 0 0 0 1 0 0 0 0 0 ...
#>  $ textoEmprego_gerente.autonomo.empregado.altamente.especializado.forcas.armadas               : int  0 0 0 1 0 0 0 0 0 0 ...

Created on 2020-01-02 by the reprex package (v0.3.0)
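One visible difference between the dput() sample in the first comment and the read.csv() output above is the column names: the sample uses non-syntactic names such as `textoSaldoContaCorrente == [0, 200)`, while read.csv() returns sanitized names by default. Whether that difference triggers the error is an assumption and is not confirmed in this thread. A minimal sketch of checking for and sanitizing such names with base R's make.names(), on a hypothetical two-column data frame (`df_raw` is an illustrative name):

```r
# Hypothetical toy data; the first column name is copied from the sample in this issue.
df_raw <- data.frame(
  `textoSaldoContaCorrente == [0, 200)` = c(1, 0, 1),
  quantidadeMesContaCorrente = c(0.1, 0.5, 0.3),
  check.names = FALSE  # keep the names exactly as written
)

# Flag names that are not syntactically valid R names
bad <- names(df_raw) != make.names(names(df_raw))
names(df_raw)[bad]

# Replace invalid characters with "." and keep the names unique
names(df_raw) <- make.names(names(df_raw), unique = TRUE)
names(df_raw)
```

After this step every column name survives a round trip through a formula or a coefficient table unchanged, so selecting columns by name cannot fail because of backticks or renaming.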

rnmourao commented 4 years ago

Hi Max,

The label is indicadorInadimplente. I used all other attributes as features.

This commit has the error.

_01preparacao.ipynb has data preparation. _02modelagem.ipynb has modeling and RFE.

This job required me to write all the explanations and data in Portuguese. However, I believe the flow of the notebooks is quite straightforward. If you have any questions, please let me know.
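The label/feature split described above can be sketched as follows. The names `features` and `label` come from the original rfe() call; the toy data frame and `missing_cols` are illustrative assumptions, not the author's actual notebook code. The last lines show that indexing a data frame by a column name it does not contain produces the same message as in the issue title.

```r
# Toy stand-in for the real data; only the label column name comes from the thread.
df <- data.frame(
  quantidadeMesContaCorrente = c(0.12, 0.56, 0.29, 0.47),
  numeroIdade = c(0.54, 0.46, 0.61, 0.29),
  indicadorInadimplente = factor(c("0", "0", "1", "1"))
)

label <- "indicadorInadimplente"
features <- setdiff(names(df), label)  # every other column is a feature

# Any requested column that is absent from df would make df[, features] fail
missing_cols <- setdiff(c(features, label), names(df))
stopifnot(length(missing_cols) == 0)

x <- df[, features]
y <- df[, label]

# Selecting a column that does not exist raises the error seen in the issue
msg <- tryCatch(df[, "no_such_column"], error = function(e) conditionMessage(e))
msg
```

Running the `missing_cols` check right before calling rfe() is a cheap way to rule out a mismatch between `features` and the actual column names.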

topepo commented 4 years ago

I can't reproduce it:

library(caret)
#> Loading required package: lattice
#> Loading required package: ggplot2
df <-
  read.csv(
    url(
      "https://github.com/rnmourao/r_3.6.1-caret-classificacao/raw/master/dados/train.csv"
    )
  )

df$indicadorInadimplente <- factor(df$indicadorInadimplente)

set.seed(364525)
lrRFE <- rfe(
  x = df[, names(df) != "indicadorInadimplente"],
  y = df$indicadorInadimplente,
  sizes = c(1:10, 15, 30),
  rfeControl = rfeControl(functions = lrFuncs, method = "cv")
)
lrRFE
#> 
#> Recursive feature selection
#> 
#> Outer resampling method: Cross-Validated (10 fold) 
#> 
#> Resampling performance over subset size:
#> 
#>  Variables Accuracy  Kappa AccuracySD KappaSD Selected
#>          1   0.6143 0.2286    0.03395 0.06789         
#>          2   0.6765 0.3531    0.03504 0.07007         
#>          3   0.6745 0.3490    0.03549 0.07099         
#>          4   0.6673 0.3347    0.03270 0.06539         
#>          5   0.6531 0.3061    0.05158 0.10317         
#>          6   0.6541 0.3082    0.05861 0.11722         
#>          7   0.6480 0.2959    0.05777 0.11555         
#>          8   0.6469 0.2939    0.06145 0.12290         
#>          9   0.6582 0.3163    0.06221 0.12442         
#>         10   0.6684 0.3367    0.04595 0.09190         
#>         15   0.6837 0.3673    0.04614 0.09228         
#>         30   0.7194 0.4388    0.03826 0.07651         
#>         47   0.7255 0.4510    0.05323 0.10646        *
#> 
#> The top 5 variables (out of 47):
#>    textoSaldoContaCorrente_X.....0., textoSaldoContaCorrente_X.0..200., valorSolicitadoCredito, textoSaldoContaCorrente_X.200...., textoInvestimento_X.1000....

# or 

lrRFE <- rfe(
  indicadorInadimplente ~ .,
  data = df,
  sizes = c(1:10, 15, 30),
  rfeControl = rfeControl(functions = lrFuncs, method = "cv")
)
lrRFE
#> 
#> Recursive feature selection
#> 
#> Outer resampling method: Cross-Validated (10 fold) 
#> 
#> Resampling performance over subset size:
#> 
#>  Variables Accuracy  Kappa AccuracySD KappaSD Selected
#>          1   0.6143 0.2286    0.04634 0.09268         
#>          2   0.6765 0.3531    0.04083 0.08166         
#>          3   0.6735 0.3469    0.03967 0.07933         
#>          4   0.6663 0.3327    0.04139 0.08279         
#>          5   0.6663 0.3327    0.03820 0.07639         
#>          6   0.6561 0.3122    0.04462 0.08924         
#>          7   0.6622 0.3245    0.04178 0.08357         
#>          8   0.6561 0.3122    0.04930 0.09860         
#>          9   0.6694 0.3388    0.04171 0.08343         
#>         10   0.6765 0.3531    0.04410 0.08820         
#>         15   0.6724 0.3449    0.04625 0.09250         
#>         30   0.7296 0.4592    0.05573 0.11147         
#>         47   0.7316 0.4633    0.05590 0.11180        *
#> 
#> The top 5 variables (out of 47):
#>    textoSaldoContaCorrente_X.....0., textoSaldoContaCorrente_X.0..200., valorSolicitadoCredito, textoSaldoContaCorrente_X.200...., textoInvestimento_X.1000....

set.seed(364525)
ldaRFE <- rfe(
  x = df[, names(df) != "indicadorInadimplente"],
  y = df$indicadorInadimplente,
  sizes = c(1:10, 15, 30),
  rfeControl = rfeControl(functions = ldaFuncs, method = "cv")
)
ldaRFE
#> 
#> Recursive feature selection
#> 
#> Outer resampling method: Cross-Validated (10 fold) 
#> 
#> Resampling performance over subset size:
#> 
#>  Variables Accuracy  Kappa AccuracySD KappaSD Selected
#>          1   0.6143 0.2286    0.03395 0.06789         
#>          2   0.6061 0.2122    0.04144 0.08287         
#>          3   0.6337 0.2673    0.03709 0.07418         
#>          4   0.6245 0.2490    0.04584 0.09167         
#>          5   0.6347 0.2694    0.04609 0.09218         
#>          6   0.6531 0.3061    0.03695 0.07390         
#>          7   0.6724 0.3449    0.04261 0.08521         
#>          8   0.6878 0.3755    0.03794 0.07587         
#>          9   0.6969 0.3939    0.03968 0.07936         
#>         10   0.7000 0.4000    0.03670 0.07339         
#>         15   0.7010 0.4020    0.03006 0.06012         
#>         30   0.7347 0.4694    0.03535 0.07070        *
#>         47   0.7265 0.4531    0.03495 0.06991         
#> 
#> The top 5 variables (out of 30):
#>    textoSaldoContaCorrente_X.....0., quantidadeMesContaCorrente, textoHistoricoCredito_conta.critica...outros.creditos.existentes..nao.neste.banco., textoSaldoContaCorrente_X.0..200., numeroIdade

# or 

ldaRFE <- rfe(
  indicadorInadimplente ~ .,
  data = df,
  sizes = c(1:10, 15, 30),
  rfeControl = rfeControl(functions = ldaFuncs, method = "cv")
)
ldaRFE
#> 
#> Recursive feature selection
#> 
#> Outer resampling method: Cross-Validated (10 fold) 
#> 
#> Resampling performance over subset size:
#> 
#>  Variables Accuracy  Kappa AccuracySD KappaSD Selected
#>          1   0.6143 0.2286    0.04634 0.09268         
#>          2   0.6071 0.2143    0.03915 0.07831         
#>          3   0.6429 0.2857    0.03818 0.07636         
#>          4   0.6286 0.2571    0.04594 0.09187         
#>          5   0.6439 0.2878    0.05452 0.10903         
#>          6   0.6612 0.3224    0.04659 0.09317         
#>          7   0.6714 0.3429    0.03593 0.07186         
#>          8   0.6786 0.3571    0.03341 0.06683         
#>          9   0.6949 0.3898    0.04288 0.08575         
#>         10   0.6837 0.3673    0.05022 0.10044         
#>         15   0.6969 0.3939    0.03880 0.07759         
#>         30   0.7306 0.4612    0.04388 0.08775         
#>         47   0.7316 0.4633    0.04787 0.09575        *
#> 
#> The top 5 variables (out of 47):
#>    textoSaldoContaCorrente_X.....0., quantidadeMesContaCorrente, textoHistoricoCredito_conta.critica...outros.creditos.existentes..nao.neste.banco., numeroIdade, textoGarantia_X

Created on 2020-01-02 by the reprex package (v0.3.0)

Session info

``` r
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 3.6.1 (2019-07-05)
#>  os       macOS Mojave 10.14.6
#>  system   x86_64, darwin15.6.0
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2020-01-02
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version    date       lib source
#>  assertthat     0.2.1      2019-03-21 [1] CRAN (R 3.6.0)
#>  backports      1.1.5      2019-10-02 [1] CRAN (R 3.6.0)
#>  callr          3.4.0      2019-12-09 [1] CRAN (R 3.6.0)
#>  caret        * 6.0-84     2019-04-27 [1] CRAN (R 3.6.0)
#>  class          7.3-15     2019-01-01 [1] CRAN (R 3.6.0)
#>  cli            2.0.0      2019-12-09 [1] CRAN (R 3.6.0)
#>  codetools      0.2-16     2018-12-24 [1] CRAN (R 3.6.1)
#>  colorspace     1.4-1      2019-03-18 [1] CRAN (R 3.6.0)
#>  crayon         1.3.4      2017-09-16 [1] CRAN (R 3.6.0)
#>  data.table     1.12.6     2019-10-18 [1] CRAN (R 3.6.0)
#>  desc           1.2.0      2018-05-01 [1] CRAN (R 3.6.0)
#>  devtools       2.2.1      2019-09-24 [1] CRAN (R 3.6.0)
#>  dials          0.0.4      2019-12-02 [1] CRAN (R 3.6.1)
#>  DiceDesign     1.8-1      2019-07-31 [1] CRAN (R 3.6.0)
#>  digest         0.6.23     2019-11-23 [1] CRAN (R 3.6.1)
#>  dplyr          0.8.3      2019-07-04 [1] CRAN (R 3.6.0)
#>  e1071          1.7-3      2019-11-26 [1] CRAN (R 3.6.0)
#>  ellipsis       0.3.0      2019-09-20 [1] CRAN (R 3.6.0)
#>  evaluate       0.14       2019-05-28 [1] CRAN (R 3.6.0)
#>  fansi          0.4.0      2018-10-05 [1] CRAN (R 3.6.0)
#>  foreach        1.4.7      2019-07-27 [1] CRAN (R 3.6.0)
#>  fs             1.3.1      2019-05-06 [1] CRAN (R 3.6.0)
#>  generics       0.0.2      2018-11-29 [1] CRAN (R 3.6.0)
#>  ggplot2      * 3.2.1      2019-08-10 [1] CRAN (R 3.6.0)
#>  glue           1.3.1      2019-03-12 [1] CRAN (R 3.6.0)
#>  gower          0.2.1      2019-05-14 [1] CRAN (R 3.6.0)
#>  GPfit          1.0-8      2019-02-08 [1] CRAN (R 3.6.0)
#>  gtable         0.3.0      2019-03-25 [1] CRAN (R 3.6.0)
#>  highr          0.8        2019-03-20 [1] CRAN (R 3.6.0)
#>  htmltools      0.4.0      2019-10-04 [1] CRAN (R 3.6.0)
#>  ipred          0.9-9      2019-04-28 [1] CRAN (R 3.6.0)
#>  iterators      1.0.12     2019-07-26 [1] CRAN (R 3.6.0)
#>  knitr          1.26       2019-11-12 [1] CRAN (R 3.6.0)
#>  lattice      * 0.20-38    2018-11-04 [1] CRAN (R 3.6.1)
#>  lava           1.6.6      2019-08-01 [1] CRAN (R 3.6.0)
#>  lazyeval       0.2.2      2019-03-15 [1] CRAN (R 3.6.0)
#>  lhs            1.0.1      2019-02-03 [1] CRAN (R 3.6.0)
#>  lifecycle      0.1.0      2019-08-01 [1] CRAN (R 3.6.0)
#>  lubridate      1.7.4      2018-04-11 [1] CRAN (R 3.6.0)
#>  magrittr       1.5        2014-11-22 [1] CRAN (R 3.6.0)
#>  MASS           7.3-51.5   2019-12-20 [1] CRAN (R 3.6.0)
#>  Matrix         1.2-18     2019-11-27 [1] CRAN (R 3.6.0)
#>  memoise        1.1.0      2017-04-21 [1] CRAN (R 3.6.0)
#>  ModelMetrics   1.2.2      2018-11-03 [1] CRAN (R 3.6.0)
#>  munsell        0.5.0      2018-06-12 [1] CRAN (R 3.6.0)
#>  nlme           3.1-140    2019-05-12 [1] CRAN (R 3.6.1)
#>  nnet           7.3-12     2016-02-02 [1] CRAN (R 3.6.0)
#>  parsnip        0.0.4.9000 2019-12-14 [1] local
#>  pillar         1.4.3      2019-12-20 [1] Github (r-lib/pillar@e2e7926)
#>  pkgbuild       1.0.6      2019-10-09 [1] CRAN (R 3.6.0)
#>  pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 3.6.0)
#>  pkgload        1.0.2      2018-10-29 [1] CRAN (R 3.6.0)
#>  plyr           1.8.5      2019-12-10 [1] CRAN (R 3.6.0)
#>  prettyunits    1.0.2      2015-07-13 [1] CRAN (R 3.6.0)
#>  processx       3.4.1      2019-07-18 [1] CRAN (R 3.6.0)
#>  prodlim        2019.11.13 2019-11-17 [1] CRAN (R 3.6.0)
#>  ps             1.3.0      2018-12-21 [1] CRAN (R 3.6.0)
#>  purrr          0.3.3      2019-10-18 [1] CRAN (R 3.6.0)
#>  R6             2.4.1      2019-11-12 [1] CRAN (R 3.6.0)
#>  Rcpp           1.0.3      2019-11-08 [1] CRAN (R 3.6.0)
#>  recipes        0.1.8      2019-12-18 [1] local
#>  remotes        2.1.0      2019-06-24 [1] CRAN (R 3.6.0)
#>  reshape2       1.4.3      2017-12-11 [1] CRAN (R 3.6.0)
#>  rlang          0.4.2      2019-11-23 [1] CRAN (R 3.6.0)
#>  rmarkdown      2.0        2019-12-12 [1] CRAN (R 3.6.0)
#>  rpart          4.1-15     2019-04-12 [1] CRAN (R 3.6.0)
#>  rprojroot      1.3-2      2018-01-03 [1] CRAN (R 3.6.0)
#>  scales         1.1.0      2019-11-18 [1] CRAN (R 3.6.0)
#>  sessioninfo    1.1.1      2018-11-05 [1] CRAN (R 3.6.0)
#>  stringi        1.4.3      2019-03-12 [1] CRAN (R 3.6.0)
#>  stringr        1.4.0      2019-02-10 [1] CRAN (R 3.6.0)
#>  survival       2.40-1     2016-10-30 [1] CRAN (R 3.6.1)
#>  testthat       2.3.1      2019-12-01 [1] CRAN (R 3.6.0)
#>  tibble         2.1.3      2019-06-06 [1] CRAN (R 3.6.0)
#>  tidyr          1.0.0      2019-09-11 [1] CRAN (R 3.6.0)
#>  tidyselect     0.2.5      2018-10-11 [1] CRAN (R 3.6.0)
#>  timeDate       3043.102   2018-02-21 [1] CRAN (R 3.6.0)
#>  usethis        1.5.1.9000 2019-12-18 [1] Github (r-lib/usethis@b2e894e)
#>  vctrs          0.2.1      2019-12-17 [1] CRAN (R 3.6.1)
#>  withr          2.1.2      2018-03-15 [1] CRAN (R 3.6.0)
#>  workflows      0.1.0      2019-12-30 [1] CRAN (R 3.6.1)
#>  xfun           0.11       2019-11-12 [1] CRAN (R 3.6.0)
#>  yaml           2.2.0      2018-07-25 [1] CRAN (R 3.6.0)
#>  zeallot        0.1.0      2018-01-28 [1] CRAN (R 3.6.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
```

rnmourao commented 4 years ago

It worked! Thanks! I'll check my code again.

topepo / caret

RFE - Can't use with Logistic Regression: Error in {: task 1 failed - "undefined columns selected" #1091