tagteam / prodlim

Product limit estimation for survival analysis

`predictList()` breaks with a `tibble` #8

Closed: hfrick closed this issue 7 months ago

hfrick commented 7 months ago

Over in https://github.com/tidymodels/tune/issues/824 we (👋 @joranE) ran into a curious bug: predict.prodlim() works with a data.frame but errors with a tibble. The reprex uses pec because I wasn't sure how to translate it quickly to prodlim, but I'm opening the report here since the issue seems to be in prodlim.

```r
library(pec)
#> Loading required package: prodlim
library(survival)

lung <- lung[sample(nrow(lung), size = 10000, replace = TRUE), ]

lung_train <- lung[-c(1:5000), ]
lung_test <- lung[1:5000, ]

tree_fit <- pec::pecRpart(Surv(time, status) ~ ., data = lung_train)
pred <- pec::predictSurvProb(tree_fit, newdata = lung_test, times = 100)

# changing the test data to a tibble causes an error
test_tibble <- tibble::as_tibble(lung_test)
pred <- pec::predictSurvProb(tree_fit, newdata = test_tibble, times = 100)
#> Error in do.call("rbind", psurv): variable names are limited to 10000 bytes

# traceback()
# 8: do.call("rbind", psurv)
# 7: predictSurv(object = object, times = times, newdata = newdata,
#                level.chaos = level.chaos, mode = mode, bytime = bytime)
# 6: predict.prodlim(object = object, type = "surv", newdata = newdata,
#                    times = times, mode = "matrix", level.chaos = 1)
# 5: predict(object = object, type = "surv", newdata = newdata, times = times,
#            mode = "matrix", level.chaos = 1)
# 4: predictSurvProb.prodlim(object$survfit, newdata = newdata, times = times)
# 3: predictSurvProb(object$survfit, newdata = newdata, times = times)
# 2: predictSurvProb.pecRpart(tree_fit, newdata = test_tibble, times = 100)
# 1: pec::predictSurvProb(tree_fit, newdata = test_tibble, times = 100)
```

Created on 2024-01-23 with reprex v2.0.2

It fails in predictSurv() because of the names, but the real issue seems to be that

https://github.com/tagteam/prodlim/blob/3919027bd9b3756cf99a04ef9708f788c56d3d40/R/predict.prodlim.R#L264-L268

collapses the names when it (probably) shouldn't. With the tibble, they look like

"rpartFactor=c(`1` = 26, `2` = 11, `3` = 7, `4` = 13, `5` = 23, `6` = 8, `7` = 36, `8` = 8, `9` = 10, `10` = 10, `11` = 9, `12` = 19, `13` = 13, `14` = 25, `15` = 10, `16` = 30, `17` = 18, `18` = 12, `19` = 8, `20` = 9, `21` = 7, `
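A minimal base-R sketch of the suspected mechanism, assuming the collapsing happens because tibble's `[` never drops a single selected column to a vector. Here `drop = FALSE` on a plain data.frame stands in for the tibble, and the `paste()` call and `rpartFactor` values are hypothetical illustrations, not the code at the linked lines of predict.prodlim.R.

```r
# Hypothetical illustration, not prodlim's actual code: build "variable=value"
# labels from one column of newdata, once as a vector and once as a
# one-column data frame (what tibble's `[` returns by default).
newdata <- data.frame(rpartFactor = c(26, 11, 7))

# data.frame subsetting with a single column drops to a plain vector,
# so paste() yields one label per row.
labels_df <- paste("rpartFactor", newdata[, "rpartFactor"], sep = "=")
print(labels_df)
#> [1] "rpartFactor=26" "rpartFactor=11" "rpartFactor=7"

# drop = FALSE mimics tibble behaviour: the result is a list with one
# element, so paste() deparses the whole column into a single label.
labels_tbl <- paste("rpartFactor", newdata[, "rpartFactor", drop = FALSE], sep = "=")
print(labels_tbl)
#> [1] "rpartFactor=c(26, 11, 7)"
```

With 5000 rows in `lung_test`, one such deparsed label per column would plausibly exceed the 10000-byte limit reported by `do.call("rbind", psurv)`, since the list names become argument names in the constructed call.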

tagteam commented 7 months ago

Thanks a lot for pointing this out. I have just pushed a patch.

hfrick commented 7 months ago

Thank you! 🙌

joranE commented 7 months ago

I just confirmed that the update you pushed fixes the example I filed in the issue at {censored}. Thanks to everyone for the quick turnaround!