smartdata-analysis-and-statistics / precmed

A doubly robust precision medicine approach to estimate and validate conditional average treatment effects
https://smartdata-analysis-and-statistics.github.io/precmed/
Apache License 2.0

Errors and warnings of main functions #19

Closed StanWijn closed 1 year ago

StanWijn commented 1 year ago

Two main errors/warnings still occur. Check with BioGen if we should resolve these.

1) Error in cv_surv() when estimating ATE in nested subgroups using "poisson", "randomForest".

Error(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest";
return NAs in the corresponding subgroup.
Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest"

The downstream functions (plot, abc, outcomes) do not seem to be affected by these errors. Is this really an error, and can we reduce or resolve it in any way?

Example:

library(precmed)
tau0 <- with(survivalExample, min(quantile(y[trt == "drug1"], 0.95), quantile(y[trt == "drug0"], 0.95)))
output_cv2 <- cv(response = "survival",
                 cate.model = survival::Surv(y, d) ~ age +
                                                     female +
                                                     previous_cost +
                                                     previous_number_relapses,
                 ps.model = trt ~ age + previous_treatment,
                 ipcw.model = ~ age + previous_cost + previous_treatment,
                 data = survivalExample,
                 score.method = c("poisson", "randomForest"),
                 followup.time = NULL,
                 tau0 = tau0,
                 surv.min = 0.025,
                 higher.y = TRUE,
                 cv.n = 5,
                 initial.predictor.method = "randomForest",
                 plot.gbmperf = FALSE,
                 seed = 999)

Output:
  |                                                                                                            |   0%
cv = 1 
  splitting the data..
  training..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
  validating..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
   Wed Aug 31 17:01:04 2022 
  |======================                                                                                      |  20%
cv = 2 
  splitting the data..
  training..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
  validating..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
   Wed Aug 31 17:01:23 2022 
  |===========================================                                                                 |  40%
cv = 3 
  splitting the data..
  training..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
  validating..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
   Wed Aug 31 17:01:42 2022 
  |=================================================================                                           |  60%
cv = 4 
  splitting the data..
  training..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
  validating..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
   Wed Aug 31 17:02:01 2022 
  |======================================================================================                      |  80%
cv = 5 
  splitting the data..
  training..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
  validating..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
   Wed Aug 31 17:02:21 2022 
  |============================================================================================================| 100%

2) Warnings regarding the conversion of drug0 and drug1 to 0/1

data.preproc converts the variable trt from drug0/drug1 to 0/1. This is probably done with interpretation in mind (the check and conversion are performed on line 314 of utility_count.R). Do we want to show this warning in the examples?

Example:

 output_cv <- cv(response = "count",
                 cate.model = y ~ age + female + previous_treatment +
                                      previous_cost + previous_number_relapses + offset(log(years)),
                 ps.model = trt ~ age + previous_treatment,
                 data = countExample,
                 higher.y = FALSE,
                 score.method = "poisson",
                 cv.n = 5,
                 plot.gbmperf = FALSE,
                 seed = 999)

Output:
cv = 1 
  splitting the data..
  training..
  validating..

cv = 2 
  splitting the data..
  training..
  validating..

cv = 3 
  splitting the data..
  training..
  validating..

cv = 4 
  splitting the data..
  training..
  validating..

cv = 5 
  splitting the data..
  training..
  validating..

Total runtime : 13.28 secs 
Warning message:
In data.preproc(fun = "cv", cate.model = cate.model, ps.model = ps.model,  :
  Variable trt was recoded to 0/1 with drug0->0 and drug1->1.

NightlordTW commented 1 year ago

Would it be possible to replace the "error" message with a warning that clarifies the problem? Do we need to worry about the output? @phoebejiang

NightlordTW commented 1 year ago

@StanWijn Let's update the examples so that the treatment coding is already in the preferred format.
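
A minimal sketch of such a recoding, assuming the warning is only raised when trt is not already coded as 0/1. The data frame `dat` is a toy stand-in for countExample (its values are made up for illustration); the mapping drug0 -> 0, drug1 -> 1 is the one reported in the warning message.

```r
# Toy data standing in for countExample; only the trt column matters here.
dat <- data.frame(
  trt = c("drug0", "drug1", "drug1", "drug0"),
  age = c(45, 52, 38, 61),
  stringsAsFactors = FALSE
)

# Map drug0 -> 0 and drug1 -> 1 before the data are passed to cv().
# The comparison works whether trt is stored as character or factor.
dat$trt <- ifelse(dat$trt == "drug1", 1, 0)

table(dat$trt)
```

Doing this once in the example setup keeps the recoding explicit for readers, rather than leaving it to data.preproc.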

gabriellesimoneau commented 1 year ago

Yes, it's confusing to have a message saying "Error(s) occurred..." when the function does not stop. We force the function to continue when errors occur inside the CV iterations (e.g., too few events in the Cox model) and want to tell the user that errors happened. Instead, we can output this as "Warning: Error(s) occurred...". The same goes for "Warning(s) occurred...", which can be written as "Warning: Warning(s) occurred...".
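
A rough sketch of that idea in base R, not the package's actual code: `safe_estimate()` and its arguments are hypothetical names. It runs one estimation step, turns any error into a proper R warning, and returns NA so the CV loop can continue.

```r
# Hypothetical helper: run one estimation step without stopping the CV loop.
# fit_fun is a zero-argument function that performs the estimation;
# label is the score method name used in the message (e.g. "poisson").
safe_estimate <- function(fit_fun, label) {
  tryCatch(
    fit_fun(),
    error = function(e) {
      # Report the error as a warning and return NA for this subgroup.
      warning("Error(s) occurred when estimating the ATEs using \"", label,
              "\"; returning NA. Original message: ", conditionMessage(e),
              call. = FALSE)
      NA_real_
    }
  )
}

# Usage with a toy estimator that always fails.
res <- suppressWarnings(
  safe_estimate(function() stop("too few events"), "poisson")
)
is.na(res)
```

Because the message is raised through warning(), R itself prints the "Warning message:" prefix and the user can inspect it with warnings().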

NightlordTW commented 1 year ago

TODO: Let's change the function so that a single warning is raised at the end of the procedure.

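n_failed <- 0  # number of CV folds in which the estimation failed
# Cross-validation (estimate(...) is a placeholder for the per-fold estimation call)
for (i in seq_len(cv.n)) {
  # Return the error condition instead of stopping when a fold fails
  fit <- tryCatch(estimate(...), error = function(e) e)
  if (inherits(fit, "error")) {
    n_failed <- n_failed + 1
  }
} # end cross-validation

# Raise one summary warning after all folds have been processed
if (n_failed > 0) {
  warning(n_failed, " of ", cv.n, " folds were not able to run")
}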