Add flexsurvspline support for survival_reg model

mattwarkentin commented 2 years ago

Hi @hfrick,

This PR is a start for addressing https://github.com/tidymodels/censored/issues/115.

I have marked the PR as a draft because I think some changes to {parsnip} are required before this PR should be merged into {censored}.

I look forward to your feedback and hopefully getting this integrated into censored eventually.

hfrick commented 2 years ago

Hi @mattwarkentin , thanks for the PR!

Which changes in parsnip are you referring to? Let me know when you want feedback on the PR! (Now?)

mattwarkentin commented 2 years ago

I am somewhat new to contributing to parsnip-adjacent packages, but I thought that an update to the parsnip documentation would be important for showing the flexsurvspline engine and tunable arguments in the documentation (i.e., num_knots, survival_link).

Basically flexsurvspline versions of:

But maybe I'm wrong.

Sure, happy to receive feedback on the PR any time!

mattwarkentin commented 2 years ago

Sorry for dropping the ball on this. I was moving across the country for a new job and just totally got sidetracked. Want me to get this back on track, @hfrick?

hfrick commented 2 years ago

No worries, there is no rush! If you want, that'd be great! The main things are the unit tests and the documentation over in parsnip.

mattwarkentin commented 2 years ago

For me:

[x] add unit tests to censored
[x] PR to parsnip with documentation and making the engine args tunable (I left a more detailed comment on this on the code directly)
[x] Update vignette/articles/examples.Rmd
[x] Update README.Rmd
[x] update NEWS, including PR number and contributor github name

mattwarkentin commented 2 years ago

Okay @hfrick, we are getting close. The one thing I'm stumbling on is that there surely needs to be somewhere where we connect the fact that OUR parameter is called num_knots, while flexsurv::flexsurvspline() uses k.

I thought maybe I handled this when I made the PR to dials, but it doesn't look like it. Where do we make that mapping?

This seems like something handled by parsnip::set_model_arg(), but decided to remove that above...how do we do this??

hfrick commented 2 years ago

Awesome!

If the argument you want to make tunable is a main argument, ie an argument directly to survival_reg(), you use set_model_arg() like you initially tried. Since k is an engine-specific argument, it's more subtle. This happens in the tunable() method for survival_reg() (which sits in parsnip) - you already modified the right things, the last missing bit was just to use the name of the arg as it is used in flexsurv. Then you have your link to dials and the tidymodels machinery can work. The changes you made in dials were to create the parameter object (tidymodels uses that to get possible parameter values for tuning). With the change in parsnip you link that to the engine argument and the machinery is connected.

Thanks for adding the tests! Could you update them so that they make use of the spline functionality? Then we know that this aspect also works! For survival probability and hazard, we don't need to test against rms if we test against flexsurv.

Re predictions of the linear predictor: I noticed in the test example that flexsurv returns negative values and censored then tries to log those, resulting in NaN. For flexsurvreg(), predictions by flexsurv are exp(x * beta), which is why censored logs them before returning them as predictions of type linear_pred (so that it's x * beta). How is that with flexsurvspline()? What exactly does it return, does it make sense to log here? See reprex below.

The main branch has changed a lot since this branch got started but we are not seeing any merge conflicts - maybbe that's because it's a draft PR so just as a heads up that we might still encounter some.

library(flexsurv)
#> Loading required package: survival

spline_fit <- flexsurvspline(
  Surv(time, status) ~ age + sex,
  data = lung
)
predict(spline_fit, lung[1:5,], type = "linear")
#> # A tibble: 5 × 2
#>   .time .pred_link
#>   <dbl>      <dbl>
#> 1     0      -7.63
#> 2     0      -7.72
#> 3     0      -7.92
#> 4     0      -7.90
#> 5     0      -7.85

non_spline_fit <- flexsurvreg(
  Surv(time, status) ~ age + sex,
  data = lung, dist = "weibull"
)
predict(non_spline_fit, lung[1:5,], type = "linear")
#> # A tibble: 5 × 2
#>   .time .pred_link
#>   <dbl>      <dbl>
#> 1     0       314.
#> 2     0       338.
#> 3     0       392.
#> 4     0       387.
#> 5     0       373.

^{Created on 2022-10-28 with reprex v2.0.2}

mattwarkentin commented 2 years ago

Could you update them so that they make use of the spline functionality? Then we know that this aspect also works! For survival probability and hazard, we don't need to test against rms if we test against flexsurv.

Tests are updated and removed the rms comparison.

How is that with flexsurvspline()?

A flexsurvspline model is just a flexsurvreg model with a different distribution so the predictions are made the same way with predict.flexsurvreg() which ultimately relies on the machinery of summary.flexsurvreg(). So unless something is wrong, they should be returning comparable statistics, just on a different scale. Am I misunderstanding?

mattwarkentin commented 2 years ago

For flexsurvreg(), predictions by flexsurv are exp(x beta), which is why censored logs them before returning them as predictions of type linear_pred (so that it's x beta).

Are we sure this is the case? Maybe we should loop in @chjackson and get his input.

chjackson commented 2 years ago

Yes in flexsurv, predict(...,type="linear") returns the "fitted values of the location parameter" - understood as being on the natural scale, not logged. Perhaps I should disambiguate this doc.

hfrick commented 2 years ago

Thanks @chjackson! So just to be super clear: this applies to the predict methods for both the models from flexsurvreg() and flexsurvspline()?

Where I'm coming from with this question: For the existing engine, which uses flexsurvreg(), censored logs the prediction values returned from flexsurv because it standardizes across engines and most other engines return on that scale. So now I'm asking if censored needs to do the same for this new flexsurvspline engine?

chjackson commented 2 years ago

Both flexsurvspline and flexsurvreg return objects of class "flexsurvreg", so the same predict method will be used.

Is there a confusion here since flexsurv models can be based on different parametric survival distributions? The meaning of "location parameter" depends on the distribution. Sometimes this parameter is defined to be positive (such as rate and scale parameters in the exponential, Weibull or gamma), and sometimes it is unrestricted (such as meanlog in the log-normal, and the gamma0 parameter in flexsurvspline). So the "natural scale" of the parameter could either be positive or unrestricted. The "transformed scale" used for estimation is the log scale for positive parameters, and the same as the natural scale for unrestricted parameters.

I guess flexsurv conflicts here with user expectations that predict(...,type="linear") methods should always return quantities on the unrestricted scale?

chjackson commented 2 years ago

FYI if this is helpful, for a parameter named "parname" in a fitted flexsurv model x, the function x$dlist$transforms[["parname"]]() transforms a parameter value from its natural scale to the unrestricted scale. This is nearly always either log or the identity transformation, depending on the model. x$dlist$location contains the name of the location parameter.

mattwarkentin commented 2 years ago

@hfrick You may wish to use the functions in x$dlist$inv.transforms to get location parameters on the unrestricted scale. As Chris mentioned, it is often either identity() or log().

chjackson commented 2 years ago

No it's x$dlist$transforms to go from the natural scale to the unrestricted scale, and x$dlist$inv.transforms to go the other way.

mattwarkentin commented 2 years ago

Oops, my mistake. Please ignore my previous comment.

hfrick commented 2 years ago

@mattwarkentin @chjackson Thank you both; that's really helpful to know! I've now changed the transformation for the link/linear predictor in this commit https://github.com/tidymodels/censored/pull/213/commits/31fd6b9bad00317fc5077d30a0bbd66382e4730a - could you confirm that this is correct now?

Other than that, I think this PR is ready!

chjackson commented 2 years ago

Can't see anything wrong with the transformation procedure there

mattwarkentin commented 2 years ago

LGTM!

hfrick commented 2 years ago

Great, thanks so much both!

github-actions[bot] commented 2 years ago

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

tidymodels / censored

Add flexsurvspline support for survival_reg model #213