Closed bcjaeger closed 2 years ago
Just wanted to share an update on the rOpenSci review. I did think it was about to be closed but yesterday I was assigned two more reviewers. My bad - it looks like the rOpenSci review will continue for at least another few weeks.
I am very excited to see what the new reviewers think and I will do everything I can to incorporate their feedback in a way that is compatible with the version of aorsf
on CRAN. I'll let you know asap if breaking changes come up and I'll update this PR accordingly if they do.
Thanks for the great review! The changes you described will definitely make aorsf
more user-friendly, and I'm happy to do that. I will get things updated in aorsf
first and then work on the comments above.
Happy to share updates in aorsf
based on your thoughts above. One follow-up question:
An option to allow NA values for out-of-range predictions instead of an error.
Were you referring to the case where times
in predict()
has at least one value over the maximum observed follow-up time? If so, great! That is allowed now with check_boundaries
set to FALSE
. The returned values are currently not NA
(explanation in reprex comments).
Thank you again for the thoughts you shared - the changes definitely make the API friendlier. I'm happy to make more modifications in aorsf
if that is helpful. Otherwise, I will work on updates in the censored
code.
suppressPackageStartupMessages({
library(aorsf)
library(tidyverse)
})
# Allowing the standard flexibility of Surv() wrt the values of the
# event indicator.
pbc_lgl <- mutate(pbc_orsf, status = as.logical(status))
pbc_12 <- mutate(pbc_orsf, status = status + 1)
set.seed(329)
fit_standard <- orsf(pbc_orsf, Surv(time, status) ~ . - id)
set.seed(329)
fit_lgl <- orsf(pbc_lgl, formula = Surv(time, status) ~ . - id)
set.seed(329)
fit_12 <- orsf(pbc_12, formula = Surv(time, status) ~ . - id)
all.equal(fit_standard$forest, fit_lgl$forest)
#> [1] TRUE
all.equal(fit_standard$forest, fit_12$forest)
#> [1] TRUE
# An option for how missing values are handled for predictions: For example,
# predict.lm() has an argument na.action which determines what happens with
# missing values. aorsf could still default to an error but in censored we
# could request that it pads the results with NA. See also ?na.fail.
pbc_first10 <- pbc_na <- slice(pbc_orsf, 1:10)
pbc_na$age[c(1,5,10)] <- NA
# consistent w/tidymodels
predict(fit_standard, pbc_na, na_action = 'pass')
#> [,1]
#> [1,] NA
#> [2,] 0.10615858
#> [3,] 0.77370799
#> [4,] 0.43724275
#> [5,] NA
#> [6,] 0.08677136
#> [7,] 0.06302206
#> [8,] 0.25354362
#> [9,] 0.96007034
#> [10,] NA
# consistent w/base R
predict(fit_standard, pbc_na, na_action = 'omit')
#> [,1]
#> [1,] 0.10615858
#> [2,] 0.77370799
#> [3,] 0.43724275
#> [4,] 0.08677136
#> [5,] 0.06302206
#> [6,] 0.25354362
#> [7,] 0.96007034
# An option to allow NA values for out-of-range predictions
# instead of an error.
## I allow out of range predictions but, instead of returning NA when
## boundary_checks is FALSE, return prediction at the max observed time
## (notice column 2 (t = 10,000) is the same as 3 (t = 100,000))
## I did this to be consistent with the way I implemented boundary_checks
## in the orsf_pd functions and also just in case a user really wants
## to make predictions this way.
predict(fit_standard,
new_data = pbc_first10,
boundary_checks = FALSE,
pred_horizon = c(1000, 10000, 100000))
#> [,1] [,2] [,3]
#> [1,] 0.91966955 0.9870598 0.9870598
#> [2,] 0.04281353 0.5127637 0.5127637
#> [3,] 0.40236198 0.9077745 0.9077745
#> [4,] 0.22127622 0.8731020 0.8731020
#> [5,] 0.09070765 0.7597322 0.7597322
#> [6,] 0.02118349 0.4610088 0.4610088
#> [7,] 0.02227554 0.5664215 0.5664215
#> [8,] 0.09007197 0.8394871 0.8394871
#> [9,] 0.88790002 0.9849913 0.9849913
#> [10,] 0.04862403 0.6593526 0.6593526
# Allowing any order of time points for prediction, and sorting those
# internally to orsf().
predict(fit_standard,
new_data = pbc_first10,
pred_horizon = c(1000, 500, 100))
#> [,1] [,2] [,3]
#> [1,] 0.91966955 0.785869003 0.142670246
#> [2,] 0.04281353 0.010758594 0.000000000
#> [3,] 0.40236198 0.135624711 0.016068543
#> [4,] 0.22127622 0.095856957 0.014200541
#> [5,] 0.09070765 0.027139137 0.001000000
#> [6,] 0.02118349 0.005302408 0.000000000
#> [7,] 0.02227554 0.010475093 0.000000000
#> [8,] 0.09007197 0.025564204 0.001844517
#> [9,] 0.88790002 0.744081557 0.270957115
#> [10,] 0.04862403 0.017911853 0.001200000
Created on 2022-09-04 by the reprex package (v2.0.1)
todo for myself: add tests on case weights
Thank you! I will review the examples you shared and open a PR on parsnip.
I noticed that my use of deparse1
causes our tests to fail on oldrel-3
since deparse1
was introduced in R 4.0.0. I think as.character
can be used instead of deparse1
, and will make sure this fix is included in 0.0.3.
Great, thanks!
re deparse1()
: if you'd like some inspiration, this is how it's being dealt with in testthat: https://github.com/r-lib/testthat/blob/426eb523f6772c76175d6e8d7ca04355b1ee7ad5/R/utils.R#L53
and there is also https://github.com/r-lib/backports
Howdy! I just want to make sure I have not forgotten anything on my list. I have
parsnip
aorsf
0.0.3 to CRANThere is no rush to complete this, I just want to make sure I'm not holding it up. If there is anything else I can do, I would be happy to help.
Awesome, thank you! I'll get to this soon and also resolve the merge conflicts π
Great =] Thank you!
Getting closer @bcjaeger ! Once the parsnip PR is good to go, this one is as well π
Howdy! Just wanted to let you know aorsf
0.0.4 is on its way to CRAN and should be available soon. (It includes support for Surv
objects in the formula.)
Thank you!
Thank you for the great PR! π
This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
Hello! Following up on #172,
aorsf
has made it to CRAN and its API is stable.1aorsf
that were based on thepartykit
templatesaorsf
One possible concern is that
aorsf
fit and predict functions are designed to throw hard errors when the input data have missing values. This may not be in line with the design ofparsnip
predict functions, which propagate rows with missing data into the returned object as a missing value for the prediction. Would you like me to write some additional code into theaorsf
wrappers for prediction to make it more consistent with theparsnip
design?A second possible concern is that
aorsf
predict functions don't let users compute predictions for times that are greater than the maximum observed time in the model's training data. E.g., for thelung
data, the max time is 1022, so trying to predict a survival probability at time of 1023 or higher would trigger an error. Would you like me to write some additional code to prevent that sort of error from triggering?1In #172, we agreed to consider this pull request when
aorsf
was on CRAN and the rOpenSci review was done. Technically, my review is still open at rOpenSci, but the review is finished in the sense that all updates toaorsf
based on the reviewer feedback are complete. The issue is still open but the editor has let me know that it will be wrapped up in a week or so.