Closed mattwarkentin closed 2 years ago
Hi @mattwarkentin , thanks for the PR!
Which changes in parsnip are you referring to? Let me know when you want feedback on the PR! (Now?)
I am somewhat new to contributing to parsnip
-adjacent packages, but I thought that an update to the parsnip
documentation would be important for showing the flexsurvspline
engine and tunable arguments in the documentation (i.e., num_knots
, survival_link
).
Basically flexsurvspline
versions of:
But maybe I'm wrong.
Sure, happy to receive feedback on the PR any time!
Sorry for dropping the ball on this. I was moving across the country for a new job and just totally got sidetracked. Want me to get this back on track, @hfrick?
No worries, there is no rush! If you want, that'd be great! The main things are the unit tests and the documentation over in parsnip.
For me:
vignette/articles/examples.Rmd
README.Rmd
Okay @hfrick, we are getting close. The one thing I'm stumbling on is that there surely needs to be somewhere where we connect the fact that OUR parameter is called num_knots
, while flexsurv::flexsurvspline()
uses k
.
I thought maybe I handled this when I made the PR to dials
, but it doesn't look like it. Where do we make that mapping?
This seems like something handled by parsnip::set_model_arg()
, but decided to remove that above...how do we do this??
Awesome!
If the argument you want to make tunable is a main argument, ie an argument directly to survival_reg()
, you use set_model_arg()
like you initially tried. Since k
is an engine-specific argument, it's more subtle. This happens in the tunable()
method for survival_reg()
(which sits in parsnip) - you already modified the right things, the last missing bit was just to use the name of the arg as it is used in flexsurv. Then you have your link to dials and the tidymodels machinery can work. The changes you made in dials were to create the parameter object (tidymodels uses that to get possible parameter values for tuning). With the change in parsnip you link that to the engine argument and the machinery is connected.
Thanks for adding the tests! Could you update them so that they make use of the spline functionality? Then we know that this aspect also works! For survival probability and hazard, we don't need to test against rms if we test against flexsurv.
Re predictions of the linear predictor: I noticed in the test example that flexsurv returns negative values and censored then tries to log those, resulting in NaN
. For flexsurvreg()
, predictions by flexsurv are exp(x * beta)
, which is why censored logs them before returning them as predictions of type linear_pred
(so that it's x * beta
). How is that with flexsurvspline()
? What exactly does it return, does it make sense to log here? See reprex below.
The main
branch has changed a lot since this branch got started but we are not seeing any merge conflicts - maybbe that's because it's a draft PR so just as a heads up that we might still encounter some.
library(flexsurv)
#> Loading required package: survival
spline_fit <- flexsurvspline(
Surv(time, status) ~ age + sex,
data = lung
)
predict(spline_fit, lung[1:5,], type = "linear")
#> # A tibble: 5 × 2
#> .time .pred_link
#> <dbl> <dbl>
#> 1 0 -7.63
#> 2 0 -7.72
#> 3 0 -7.92
#> 4 0 -7.90
#> 5 0 -7.85
non_spline_fit <- flexsurvreg(
Surv(time, status) ~ age + sex,
data = lung, dist = "weibull"
)
predict(non_spline_fit, lung[1:5,], type = "linear")
#> # A tibble: 5 × 2
#> .time .pred_link
#> <dbl> <dbl>
#> 1 0 314.
#> 2 0 338.
#> 3 0 392.
#> 4 0 387.
#> 5 0 373.
Created on 2022-10-28 with reprex v2.0.2
Could you update them so that they make use of the spline functionality? Then we know that this aspect also works! For survival probability and hazard, we don't need to test against rms if we test against flexsurv.
Tests are updated and removed the rms comparison.
How is that with flexsurvspline()?
A flexsurvspline
model is just a flexsurvreg
model with a different distribution so the predictions are made the same way with predict.flexsurvreg()
which ultimately relies on the machinery of summary.flexsurvreg()
. So unless something is wrong, they should be returning comparable statistics, just on a different scale. Am I misunderstanding?
For flexsurvreg(), predictions by flexsurv are exp(x beta), which is why censored logs them before returning them as predictions of type linear_pred (so that it's x beta).
Are we sure this is the case? Maybe we should loop in @chjackson and get his input.
Yes in flexsurv, predict(...,type="linear")
returns the "fitted values of the location parameter" - understood as being on the natural scale, not logged. Perhaps I should disambiguate this doc.
Thanks @chjackson! So just to be super clear: this applies to the predict methods for both the models from flexsurvreg()
and flexsurvspline()
?
Where I'm coming from with this question: For the existing engine, which uses flexsurvreg()
, censored logs the prediction values returned from flexsurv because it standardizes across engines and most other engines return on that scale. So now I'm asking if censored needs to do the same for this new flexsurvspline
engine?
Both flexsurvspline
and flexsurvreg
return objects of class "flexsurvreg"
, so the same predict method will be used.
Is there a confusion here since flexsurv models can be based on different parametric survival distributions? The meaning of "location parameter" depends on the distribution. Sometimes this parameter is defined to be positive (such as rate and scale parameters in the exponential, Weibull or gamma), and sometimes it is unrestricted (such as meanlog
in the log-normal, and the gamma0
parameter in flexsurvspline
). So the "natural scale" of the parameter could either be positive or unrestricted. The "transformed scale" used for estimation is the log scale for positive parameters, and the same as the natural scale for unrestricted parameters.
I guess flexsurv conflicts here with user expectations that predict(...,type="linear")
methods should always return quantities on the unrestricted scale?
FYI if this is helpful, for a parameter named "parname"
in a fitted flexsurv model x
, the function x$dlist$transforms[["parname"]]()
transforms a parameter value from its natural scale to the unrestricted scale. This is nearly always either log or the identity transformation, depending on the model. x$dlist$location
contains the name of the location parameter.
@hfrick You may wish to use the functions in x$dlist$inv.transforms
to get location parameters on the unrestricted scale. As Chris mentioned, it is often either identity()
or log()
.
No it's x$dlist$transforms
to go from the natural scale to the unrestricted scale, and x$dlist$inv.transforms
to go the other way.
Oops, my mistake. Please ignore my previous comment.
@mattwarkentin @chjackson Thank you both; that's really helpful to know! I've now changed the transformation for the link/linear predictor in this commit https://github.com/tidymodels/censored/pull/213/commits/31fd6b9bad00317fc5077d30a0bbd66382e4730a - could you confirm that this is correct now?
Other than that, I think this PR is ready!
Can't see anything wrong with the transformation procedure there
LGTM!
Great, thanks so much both!
This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
Hi @hfrick,
This PR is a start for addressing https://github.com/tidymodels/censored/issues/115.
I have marked the PR as a draft because I think some changes to
{parsnip}
are required before this PR should be merged into{censored}
.I look forward to your feedback and hopefully getting this integrated into censored eventually.