Hi, thanks for writing this! Huge fan of the package.
I would like to create the minimal object necessary for prediction, so that I can share an rfsrc object that someone else can use to make predictions. However, I am working with sensitive data that cannot be shared, so I need to strip out all of the data values that were used to train the model. Is this possible?
```r
library(randomForestSRC)
library(dplyr)

data(pbc, package = "randomForestSRC")

# fit on 70% of the rows, using all information
train <- sample(1:nrow(pbc), round(nrow(pbc) * .7))
obj <- rfsrc(Surv(days, status) ~ .,
             data = pbc[train, ])

# predict on the held-out rows with the outcome columns present
pred.hasy <- predict(obj, pbc[-train, ])
head(pred.hasy$yvar)
head(pred.hasy$survival)

# predict on the held-out rows with the outcome columns removed
pbc.test <- pbc[-train, ]
pbc.test <- pbc.test %>% select(-days, -status)
pred.noy <- predict(obj, pbc.test)
head(pred.noy$yvar)
head(pred.noy$survival)

# this works, but the training covariates still seem to be stored in obj.trim$forest
obj.trim <- obj
obj.trim$xvar <- NULL
pred.trim <- predict(obj.trim, pbc.test)
head(pred.trim$survival)

# this does not work: predict() errors once obj.trim2$forest$xvar is removed
obj.trim2 <- obj.trim
obj.trim2$forest$xvar <- NULL
pred.trim2 <- predict(obj.trim2, pbc.test)
```
The error message seems to indicate that forest$xvar is used to get information on the factor levels. Is there some way to pass a single (synthetic) row of data there that contains the relevant levels? Are training data values stored anywhere else in the object? Is there a more minimal version of the object that still contains enough information for prediction?
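One way to check empirically whether training values are stored anywhere else is to walk the fitted object and flag every data.frame or matrix it contains. The helper below, find_data_components(), is a hypothetical base-R sketch, not part of randomForestSRC. It only checks types, so it can miss data kept as plain vectors and may also flag matrices that are not training data; treat its output as a list of places to inspect by hand.

```r
# Hypothetical helper (not part of randomForestSRC): recursively walk a
# nested list, such as an rfsrc object, and return the path of every
# component that is a data.frame or matrix, i.e. a likely home for raw data.
find_data_components <- function(x, path = "obj") {
  if (is.data.frame(x) || is.matrix(x)) {
    return(path)
  }
  hits <- character(0)
  if (is.list(x)) {
    nms <- names(x)
    for (i in seq_along(x)) {
      nm <- if (!is.null(nms)) nms[i] else NA_character_
      # use $name for named elements, [[i]] for unnamed ones
      step <- if (!is.na(nm) && nzchar(nm)) paste0("$", nm) else paste0("[[", i, "]]")
      hits <- c(hits, find_data_components(x[[i]], paste0(path, step)))
    }
  }
  hits
}

# Toy stand-in for a fitted object (assumed structure, for illustration only)
toy <- list(
  xvar = data.frame(a = 1:3),
  forest = list(xvar = data.frame(a = 1:3), ntree = 500),
  family = "surv"
)
find_data_components(toy, "toy")
# returns c("toy$xvar", "toy$forest$xvar")
```

On the real model, find_data_components(obj.trim, "obj.trim") would list the remaining candidate components to review before sharing.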
EDIT: I also want to do this to create as small a file as possible for sharing purposes. Ideally I would like to share the object via GitHub, but the resulting file is currently far too large.
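To see which components dominate the file size, here is a rough sketch using only base R (utils::object.size(), saveRDS() with compress = "xz", and file.size()). The names component_sizes() and saved_size() are hypothetical helpers, not randomForestSRC functions, and object.size() only estimates memory use, so treat the numbers as approximate.

```r
# Hypothetical helpers (not part of randomForestSRC), base R only.

# In-memory size in bytes of each top-level component, largest first.
component_sizes <- function(x) {
  sizes <- vapply(x, function(el) as.numeric(utils::object.size(el)), numeric(1))
  sort(sizes, decreasing = TRUE)
}

# Save with xz compression and return the on-disk size in bytes.
saved_size <- function(x, path = tempfile(fileext = ".rds")) {
  saveRDS(x, path, compress = "xz")
  file.size(path)
}

# Toy stand-in for a fitted model object
toy <- list(big = runif(1e5), small = 1:3)
component_sizes(toy)
saved_size(toy)
```

Applied to the model, component_sizes(obj.trim) and component_sizes(obj.trim$forest) would show where most of the bytes sit, and saved_size() gives the size of the file that would actually be committed.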
Really grateful for your time. Thank you for any help!