refunders / refund

Regression with functional data

pfr(): predict not working for fpc terms #63

Open fabian-s opened 8 years ago

fabian-s commented 8 years ago
m <- pfr(pasat ~ fpc(rcst), data=DTI[complete.cases(DTI),][1:100,])
predict(m, newdata = DTI[complete.cases(DTI),][-(1:100),])
# Error in eval(expr, envir, enclos) : object 'X.tmat' not found
# In addition: Warning message:
# In (function (object, newdata, type = "link", se.fit = FALSE, terms = NULL,  :
#  not all required variables have been supplied in  newdata!

@jgellar: sorry to keep filing bugs against your code, but not being able to generate predictions really sucks... Something like this may help.

sbrockhaus commented 7 years ago

The same problem occurs for lf.vd() terms. I want to use a model with a variable-domain covariate for a binary response. To assess prediction accuracy, out-of-bag prediction is essential.

library(refund)
data(sofa)
fit.vd1 <- pfr(death ~ lf.vd(SOFA) + age + los, family="binomial", data=sofa)
pred <- predict(fit.vd1, newdata = sofa)
# Error in eval(expr, envir, enclos) : object 'SOFA.arg' not found

A workaround is to use weights:

## fit the model using weights 
train_ind <- sample(0:1, size = nrow(sofa), replace=TRUE)
fit_train <- pfr(death ~ lf.vd(SOFA) + age + los, family="binomial", data=sofa, 
                 weights = train_ind)
## only keep the predictions with weight 0
pred_oob <- predict(fit_train, type = "response")[train_ind == 0]

But this is rather tedious, and I am not sure how the observations with weight 0 enter the model anyway. Consider the following fit, which uses only the training data instead of weights. The models fit_train and fit_train_data should therefore be equivalent.

## compare the model fit with weights to the model fit on the training data only
train_data <- sofa[train_ind == 1, ]
fit_train_data <- pfr(death ~ lf.vd(SOFA) + age + los, family="binomial", data=train_data) 
## the two models should be equivalent, but e.g. the means differ 
fit_train$pfr$datameans
fit_train_data$pfr$datameans
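
For comparison, here is a minimal sketch of the same weights-versus-subset check using plain glm() on simulated data. It is not pfr(): the data frame `dat`, the variables `x` and `y`, and the fit names `fit_w` and `fit_s` are made up for illustration. In glm(), zero-weight rows do not contribute to estimation, so both fits should agree. Whether pfr()'s preprocessing (e.g. the stored datameans) also ignores zero-weight rows is the open question above.

```r
# Simulated data; all names are hypothetical and unrelated to sofa.
set.seed(1)
n <- 200
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, size = 1, prob = plogis(0.5 + dat$x))
train_ind <- sample(0:1, size = n, replace = TRUE)

# Fit 1: all rows, held-out rows get weight 0
fit_w <- glm(y ~ x, family = binomial, data = dat, weights = train_ind)
# Fit 2: training rows only
fit_s <- glm(y ~ x, family = binomial, data = dat[train_ind == 1, ])

# Compare coefficients of the weighted fit and the subset fit
all.equal(coef(fit_w), coef(fit_s))

# Compare out-of-bag predictions: fitted values of the weight-0 rows
# versus predict() on the held-out rows as new data
pred_w <- fitted(fit_w)[train_ind == 0]
pred_s <- predict(fit_s, newdata = dat[train_ind == 0, ], type = "response")
all.equal(unname(pred_w), unname(pred_s))
```

If the analogous comparison fails for fit_train and fit_train_data (e.g. on their `$coefficients`), that would suggest the zero-weight rows influence the pfr() fit somewhere before the weights are applied.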