robingenuer / VSURF

Variable Selection Using Random Forests

Error: mtry can not be larger than the number of variables in data #10

Status: Open. mickeycampbell opened this issue 4 months ago.

mickeycampbell commented 4 months ago

Hello,

Big fan of VSURF here! Thanks for your excellent work.

However, I am running into an issue that I have not previously encountered in my many uses of your function:

Error: mtry can not be larger than number of variables in data. Ranger will EXIT now.
Error in ranger::ranger(dependent.variable.name = "y", data = dat, num.threads = ifelse(parallel,  : 
User interrupt or internal error.

The input dataset (linked below) has 61 rows and 157 columns: one response column and 156 candidate predictor columns.

vsurf_problem.csv

Here's my code:

library(VSURF)
library(ranger)

# read in the data
df <- read.csv("vsurf_problem.csv")

# define response (y) and predictor (x) columns
x.cols <- 2:ncol(df)
y.col <- 1

# run vsurf
v <- VSURF(
  x = df[,x.cols],
  y = df[,y.col],
  RFimplem = "ranger",
  parallel = TRUE,
  ncores = 18
)

It makes it through the thresholding and interpretation steps, but appears to crash on the prediction step:

Thresholding step
Estimated computational time (on one core): 3 sec.

Interpretation step (on 133 variables)
Estimated computational time (on one core): between 0 sec. and  39.9 sec.

Prediction step (on 64 variables)
Maximum estimated computational time (on one core): 0 sec.
|============================================================================================     |  95%
Error: mtry can not be larger than number of variables in data. Ranger will EXIT now.
Error in ranger::ranger(dependent.variable.name = "y", data = dat, num.threads = ifelse(parallel,  : 
User interrupt or internal error.

Here's information on my R installation:

platform       x86_64-w64-mingw32               
arch           x86_64                           
os             mingw32                          
crt            ucrt                             
system         x86_64, mingw32                  
status                                          
major          4                                
minor          4.0                              
year           2024                             
month          04                               
day            24                               
svn rev        86474                            
language       R                                
version.string R version 4.4.0 (2024-04-24 ucrt)
nickname       Puppy Cup   

I am using VSURF version 1.2.0. Thanks in advance for any help you can provide!

mickeycampbell commented 4 months ago

I should add: when I switch from RFimplem = "ranger" to RFimplem = "randomForest", it works without issue. So perhaps some internal mtry default specific to the ranger implementation is causing the problem?

robingenuer commented 4 months ago

Hi @mickeycampbell, thanks a lot for using VSURF, and especially for detecting this bug. The value of mtry was indeed too high in some cases. It is now fixed in the GitHub development version, which you can test with:

remotes::install_github("robingenuer/VSURF")

As a matter of fact, the bug also appeared with RFimplem = "randomForest", but whether it occurred depended on the random seed. Thanks for helping improve the package! Robin
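The constraint behind the error is that ranger refuses an mtry (the number of variables to possibly split at in each node) larger than the number of predictors it is given. The following is a minimal sketch for anyone calling ranger directly, not VSURF's actual fix, whose details are not described in this thread. The helper safe_mtry() is hypothetical (it is not part of VSURF or ranger) and the toy data are simulated; the sketch only shows capping mtry at the predictor count before fitting.

```r
library(ranger)

# Hypothetical helper (not part of VSURF or ranger): cap a requested mtry
# at the number of predictors actually available, and keep it at least 1.
safe_mtry <- function(requested, n_predictors) {
  max(1L, min(as.integer(requested), as.integer(n_predictors)))
}

# Simulated toy data: a numeric response plus only two predictors
set.seed(42)
dat <- data.frame(y = rnorm(61), x1 = rnorm(61), x2 = rnorm(61))
n_pred <- ncol(dat) - 1  # all columns except the response "y"

# Passing mtry = 5 directly would fail here with the error reported above;
# the capped value (2) is accepted.
fit <- ranger::ranger(
  dependent.variable.name = "y",
  data = dat,
  mtry = safe_mtry(5, n_pred),
  num.trees = 50
)
fit$mtry  # mtry value actually used by the fitted forest
```

For VSURF users, installing the development version as described in the reply is the remedy; the cap above only matters when fitting ranger models yourself on small variable subsets.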