topepo / caret

caret (Classification And Regression Training) R package that contains misc functions for training and plotting classification and regression models
http://topepo.github.io/caret/index.html

ranger now accepts weights #414

Closed LluisRamon closed 8 years ago

LluisRamon commented 8 years ago

Hi Max,

I have seen that the ranger package now accepts weights through the case.weights parameter. This is included in CRAN version 0.4.

If I pass case.weights through the dots (...) of train, I get an error. I created a custom model like the one below, and it seems to work fine. When no weights are supplied, ranger expects NULL for case.weights, just as train uses NULL for wts, which is why I pass wts directly to ranger.

rangerWeight <- getModelInfo("ranger")$ranger

rangerWeight$fit <- function (x, y, wts, param, lev, last, classProbs, ...) 
{
  if (!is.data.frame(x)) 
    x <- as.data.frame(x)
  x$.outcome <- y
  out <- ranger(.outcome ~ ., data = x, mtry = param$mtry, 
                write.forest = TRUE, probability = classProbs, case.weights = wts, ...)
  if (!last) 
    out$y <- y
  out
}

I am not sure whether this is a feature request for weights in the ranger method or a bug that occurs when I pass them through the dots.

If you need a reproducible example of the error or a pull request for the ranger method, I'll be happy to provide either.

Thank you very much,

topepo commented 8 years ago

Just passing weights through the dots won't work, since that isn't the resampled version of the values (and the dimensions don't match the data). I've checked in a change that allows it:

> library(caret)
> 
> set.seed(1)
> dat <- twoClassSim(100)
> 
> set.seed(2)
> with_weights <- train(Class ~ ., data = dat, method = modelInfo, weights = (1:100)/100)
> set.seed(2)
> no_weights <- train(Class ~ ., data = dat, method = modelInfo)
> 
> with_weights
Random Forest 

100 samples
 15 predictor
  2 classes: 'Class1', 'Class2' 

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 100, 100, 100, 100, 100, 100, ... 
Resampling results across tuning parameters:

  mtry  Accuracy   Kappa    
   2    0.6906904  0.1830998
   8    0.7085037  0.2774630
  15    0.7004598  0.2775892

Accuracy was used to select the optimal model using  the largest value.
The final value used for the model was mtry = 8. 
> no_weights
Random Forest 

100 samples
 15 predictor
  2 classes: 'Class1', 'Class2' 

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 100, 100, 100, 100, 100, 100, ... 
Resampling results across tuning parameters:

  mtry  Accuracy   Kappa    
   2    0.6870326  0.1888561
   8    0.7105888  0.2957754
  15    0.7121800  0.3166938

Accuracy was used to select the optimal model using  the largest value.
The final value used for the model was mtry = 15. 
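
The point about resampled weights can be illustrated without caret. The following is a minimal base R sketch, not caret's actual implementation: the data frame, the 75-row subset, and all variable names are hypothetical. It shows that a weight vector built for the full data no longer lines up with the rows used in a single resample unless it is subset with the same row index.

```r
set.seed(1)
dat <- data.frame(x = rnorm(100), y = rnorm(100))
wts <- (1:100) / 100                  # one weight per row of the full data

# Rows used to fit the model in one resample (a hypothetical 75-row subset)
idx <- sample(nrow(dat), size = 75)
dat_resampled <- dat[idx, ]

# The full weight vector no longer matches the resampled data ...
length(wts) == nrow(dat_resampled)            # FALSE: 100 weights, 75 rows

# ... but weights subset with the same row index do, and stay aligned row by row
wts_resampled <- wts[idx]
length(wts_resampled) == nrow(dat_resampled)  # TRUE
```

Weights forwarded unchanged through the dots correspond to `wts` here, whereas the `weights` argument of train gives the fitting function the per-resample equivalent of `wts_resampled`.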

LluisRamon commented 8 years ago

Hi Max,

Now I get why passing weights directly didn't work. Thanks for the explanation.

I have seen in commit ed14146f714c8339005770080e2772a2e05ae952 that the ranger method now accepts case weights, so I am closing the issue.

Thank you very much.

njain007 commented 5 years ago

Hi, sorry, I didn't follow the previous explanation. I am very new to R. Could you tell me how to fix my code below? The weights range from 1 to 4000 and are not normalised. The data come from a survey. I am getting this error: "Error in rangerCpp(treetype, dependent.variable.name, data.final, variable.names, : Not compatible with requested type: [type=character; target=double]." Thanks.

hyper_grid <- expand.grid(
  mtry        = seq(10, 310, by = 50),
  node_size   = seq(3, 9, by = 2),
  sample_size = c(0.55, 0.632, 0.70, 0.80),
  OOB_RMSE    = 0
)

for (i in 1:nrow(hyper_grid)) {

  # train model
  model <- ranger(
    formula       = CS4_pvt ~ . - WT,
    case.weights  = "WT",
    data          = traindata1,
    num.trees     = 1491,
    mtry          = hyper_grid$mtry[i],
    min.node.size = hyper_grid$node_size[i],
    importance    = "impurity",
    seed          = 123456
  )

  # add OOB error to grid
  hyper_grid$OOB_RMSE[i] <- sqrt(model$prediction.error)
}
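
A likely cause of the error above is that `case.weights = "WT"` passes a character string, whereas ranger's `case.weights` argument takes a numeric vector with one weight per training row. That would be consistent with the "type=character; target=double" message. The sketch below is an assumption-laden illustration rather than a confirmed fix: it uses `mtcars` with a made-up `WT` column as a stand-in for `traindata1`, and it drops the weight column by subsetting the data instead of writing `- WT` in the formula.

```r
library(ranger)

# Hypothetical stand-in for traindata1: mtcars plus a made-up weight column WT
set.seed(123456)
dat <- mtcars
dat$WT <- runif(nrow(dat), min = 1, max = 4000)

# Pass the weights as a numeric vector (one value per row), not as the string "WT",
# and keep WT out of the predictors by removing the column from the data
fit <- ranger(
  formula      = mpg ~ .,
  data         = dat[, setdiff(names(dat), "WT")],
  case.weights = dat$WT,
  num.trees    = 100,
  seed         = 123456
)

# Out-of-bag RMSE (prediction.error is the OOB mean squared error for regression)
oob_rmse <- sqrt(fit$prediction.error)
```

In the original loop, the equivalent change would be `case.weights = traindata1$WT`, assuming `WT` is a numeric column of `traindata1`.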

njain007 commented 5 years ago

Hi, could you please explain how you solved the issue with using case weights? Sorry, I didn't understand the explanation. Could you please help me resolve the error below? WT2 is in decimals.

random_forest_govt2 <- ranger(CS4_govt ~ CS22 + CS23 + TA10A + Nchild_adult + Income_person

I get an error - Error in rangerCpp(treetype, dependent.variable.name, data.final, variable.names, : Not compatible with requested type: [type=character; target=double].