swager / randomForestCI

This package is DEPRECATED. Please use the packages `grf` or `ranger` instead, which have built-in confidence intervals.
https://github.com/swager/grf
MIT License

NAs produced when converting classification predictions to numeric #3

Open mattmills49 opened 8 years ago

mattmills49 commented 8 years ago

In `infinitesimalJackknife.R`, the following code is supposed to convert classification predictions to numeric values:

predictions = predict(rf, newdata, predict.all = TRUE)
pred = predictions$individual
# in case of classification, convert character labels to numeric (!)
class(pred) = "numeric"

However, this produces NAs when I try it locally:

class_matrix <- matrix(sample(c("Yes", "No"), size = 30, replace = TRUE, prob = c(.3, .7)), nrow = 10)
head(class_matrix)
#>      [,1]  [,2]  [,3] 
#> [1,] "No"  "No"  "No" 
#> [2,] "No"  "No"  "Yes"
#> [3,] "No"  "No"  "No" 
#> [4,] "No"  "No"  "No" 
#> [5,] "Yes" "Yes" "Yes"
#> [6,] "Yes" "Yes" "Yes"
class(class_matrix) = "numeric"
#> Warning in class(class_matrix) = "numeric": NAs introduced by coercion
head(class_matrix)
#>      [,1] [,2] [,3]
#> [1,]   NA   NA   NA
#> [2,]   NA   NA   NA
#> [3,]   NA   NA   NA
#> [4,]   NA   NA   NA
#> [5,]   NA   NA   NA
#> [6,]   NA   NA   NA
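
For context, this kind of coercion only succeeds when the class labels are themselves strings that parse as numbers. The following is a minimal base R sketch of that difference; the matrices `digit_labels` and `word_labels` are made up for illustration and are not taken from the package.

```r
# Labels that look like numbers survive the coercion and keep their dimensions.
digit_labels <- matrix(c("0", "1", "1", "0"), nrow = 2)
class(digit_labels) <- "numeric"
digit_labels

# Arbitrary text labels cannot be parsed as numbers, so every entry becomes NA.
# suppressWarnings() hides the "NAs introduced by coercion" warning.
word_labels <- matrix(c("Yes", "No", "No", "Yes"), nrow = 2)
suppressWarnings(class(word_labels) <- "numeric")
all(is.na(word_labels))
```

So the line in question would presumably work for a response coded as "0"/"1" but not for labels such as "Yes"/"No".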

Am I missing something about what this is supposed to do? Here is a quick and dirty way to convert a character matrix to numeric without producing NAs:

class_matrix <- matrix(sample(c("Yes", "No"), size = 30, replace = TRUE, prob = c(.3, .7)), nrow = 10)
numeric_matrix <- 1 * (class_matrix == class_matrix[1,1])
head(numeric_matrix)
#>      [,1] [,2] [,3]
#> [1,]    1    1    0
#> [2,]    0    1    0
#> [3,]    1    1    0
#> [4,]    1    1    0
#> [5,]    0    0    0
#> [6,]    0    0    0
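
One caveat with comparing against `class_matrix[1,1]` is that the class mapped to 1 depends on whichever label happens to sit in the first cell. A slightly more explicit variant fixes the label order up front. This is only a sketch: `labels_to_codes` is a hypothetical helper, not a function from the package, and the level order is assumed to be supplied by the caller.

```r
# Hypothetical helper: map class labels to integer codes using a fixed level order.
labels_to_codes <- function(pred, levels = sort(unique(as.vector(pred)))) {
  codes <- match(pred, levels)  # position of each label within `levels`
  dim(codes) <- dim(pred)       # match() returns a plain vector; restore the matrix shape
  codes
}

pred <- matrix(c("No", "Yes", "No", "No", "Yes", "Yes"), nrow = 2)

# Integer codes: "No" -> 1, "Yes" -> 2, independent of which label comes first.
codes <- labels_to_codes(pred, levels = c("No", "Yes"))

# 0/1 indicator for one explicitly chosen class.
is_yes <- 1 * (pred == "Yes")
```

Because it relies on `match()`, the same helper also handles more than two classes, and any label missing from `levels` shows up as `NA` instead of being silently assigned to a class.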

swager commented 7 years ago

Sorry for taking so long to get back to you on this; it had fallen off my radar last year. Yes, this definitely seems like a real bug, and needs to be fixed.

I think the multi-class function currently being added by @alionaBER also fixes this issue, so maybe it would make sense to have a single, separate function for classification (one that also handles the multi-class case) rather than trying to combine it with regression?