jgellar / pcox

Penalized Cox regression models
1 stars 0 forks source link

makepredictcall #39

Closed jgellar closed 7 years ago

jgellar commented 8 years ago

I’m trying to figure out how we could make makepredictcall() work for pcox. Here is what I’ve learned about how it works:

  1. makepredictcall() is called within model.frame.default. The first time model.frame.default is called (i.e., when the model is fit), the predvars attribute of the formula will be NULL. In this case, the columns of the model frame (i.e., the “variables”) are created by evaluating the “terms” calls.
  2. After the variables are evaluated, makepredictcall() is called on those variables. The result is a new call, which is stored as the predvars attribute of the formula.
  3. When predict() is called on the model, model.frame.default is called again. This time predvars is not null. So instead of evaluating the “terms” call, it evaluates the predvars call.

Now for pcox

My initial assessment is that this will be a lot of work, and isn’t worth doing now because (a) we are pretty close to release as is, and (b) Therneau would need to implement some changes to the tt system in coxph before this could even work. Assuming he follows through and does that though, it might be worth it in the end for us to do this because it will make all the methods magically work, perhaps more reliably/robustly to changes in the coxph code.

Let’s first talk about the non time-varying terms, e.g. f(age), because these are the only ones we can even consider until (b) above is taken care of. What we want to happen is, the first time model.frame.default is called, the “terms” call has to return a coxph.penalty object. We also need to set predvars - this should be a call that ends up calling PredictMat() on the smooth object. However, PredictMat needs a data argument when it is executed, and model.frame.default won’t know to send the data in as data. So we need to create a wrapper function that sets up the data, and calls PredictMat with that argument. predvars will be a call to this function. Luckily, we have a system to do this already - these are our xt functions.

Right now, all of the processing of these terms occurs within pcox, before coxph is ever called. We convert the input data into a coxph.penalty object, and this is what coxph sees. So this won’t work with makepredictcall - the smooth object never even makes it into coxph, so predvars wouldn’t be able to be set up right.

What we would need to do is have all the processing occur within coxph. I think this means that the coxph call that we create in pcox will have to have terms in it that are function calls essentially do what our current xt functions do. e.g., if we call this function xt(), then the coxph call is something like

coxph(Surv(time,event) ~ xt(age, limits=NULL, linear=FALSE, etc.), data=dat)

Currently, we create the xt functions dynamically, so we don’t actually have a function xt() in the package - we use create.xt.func(). This was done so the only argument of the xt function we create is x, so that it mimics what we do for tt functions (which are restricted to only have arguments x and t). This would have to change for the new implementation. Instead of creating the xt function dynamically, in pcox we set up a call to our new xt(), which does exactly what the dynamically-created xt functions did before.

So the first time model.frame.default is called, that xt term will be executed and return a coxph.penalty term. That term will have to be of class xt (inheriting from coxph.penalty), so we can write a makepredictcall.xt method. But what happens to the smooth object, which we still need to keep track of? I guess we still need to handle this by passing xt() an environment (and an index) where the smooth object goes? If we do this, then within xt() we can just search the environment for smooth, and if it is already present, call PredictMat instead of smoothCon - similar to what we are doing now. In this case, I don’t think we even have to write a makepredictcall() method, right?

Let me know what you think (especially about this last part re: the smooth object).

fabian-s commented 8 years ago

Thanks for figuring this out, Jonathan, makes a lot of sense to me.

But what happens to the smooth object, which we still need to keep track of? I guess we still need to handle this by passing xt() an environment (and an index) where the smooth object goes?

That seems unavoidable as long as xt() only takes an x argument and nothing else.

If we do this, then within xt() we can just search the environment for smooth, and if it is already present, call PredictMat instead of smoothCon - similar to what we are doing now. In this case, I don’t think we even have to write a makepredictcall() method, right?

Seems so.

It's great that all of this stuff seems to work now, but I'm also starting to think it might have been a better design choice to do the entire formula parsing / model matrix setup in our own code and basically only use coxph.fit instead of calling coxph and handing over all these intricate closures to make sure the terms get evaluated correctly. That would have probably meshed well with using makepredictcall as well. Too late now.... I really should have seen this coming, after having lots of similar issues when developing pffr by wrapping around gam instead of gam.fit :frowning:

jgellar commented 8 years ago

That seems unavoidable as long as xt() only takes an x argument and nothing else.

In the new plan, I think xt would have to take other arguments (see the example above) - basically all the arguments that create.xt.func takes, plus x. btw, I'm not sold on xt() being the best name for this function - I named it this for our "internal" tracking of the functions that pre-process the covariates (without involving t), because it was analogous to what the tt function did. But now with this function becoming part of the actual coxph formula, we might want to rename it. Even though the user never has to touch this function.

I tend to agree with you regarding coxph.fit vs. coxph. Then again, part of the advantage of wrapping around coxph is that coxph supports a lot of stuff I don't want to have to worry about, e.g. strata/clustering, all sorts of different ways of calculating SE's and handling ties, etc. Plus in theory we should be able to use the coxph methods if Therneau implements this tt thing.

On that note - what do you think the next steps were if we were going to actively work with him? Should we try to write something to get the predictions working with tt and "submit" it to him? He briefly mentioned authorship - which would be pretty cool. Maybe one of us should write to him and clarify future steps?

FYI, I'll be out on vacation all next week.

fabian-s commented 8 years ago

Sorry, yes, of course.

On that note - what do you think the next steps were if we were going to actively work with him? Should we try to write something to get the predictions working with tt and "submit" it to him? He briefly mentioned authorship - which would be pretty cool. Maybe one of us should write to him and clarify future steps?

That's a good idea. Let's discuss this further after we get back from vacation -- mine starts tomorrow, I'll be back September 2.

jgellar commented 8 years ago

In the meantime, should I continue to push forward with what I was doing as if we never had the conversation with Therneau? The full "To Do" list is in #30, but I don't think all of this needs to be done for the initial release. Right now, the main things remaining that are necessary are to get something working to extract the baseline hazard function (survfit.pcox), run some more checks on the predict/cross-validation code, and clean up some documentation.

Note that the first two of those things are difficult only because of these tt/makepredictcall issues. If we got this working within coxph, then would probably be able to use predict.coxph and survfit.coxph directly. I just think it would take more time before Therneau is able to implement these changes.

fabian-s commented 8 years ago

I can't tell how hard it would be to write a makepredictcall.tt method that works for us, or whether that's all that's missing, so I don't know what's the best strategy here.

Aren't there other strategies as well, though:

Isn't it true though that once we have a working predict-method, we could use that for generating the quantities that survfit needs?

Or, on a lower level, if we had a working model.frame.pcox (or rather model.matrix.pcox...) method, that that would make both predict.coxph and survfit.coxph work for us?

Either would of course have to implement the functionality of a makepredictcall.tt, but we can tailor it more to our own implementation this way since we can do the computation based on the pcox-object instead of inside a call to a coxph-method.

I just think it would take more time before Therneau is able to implement these changes.

I agree.

jgellar commented 8 years ago

I can't tell how hard it would be to write a makepredictcall.tt method that works for us, or whether that's all that's missing, so I don't know what's the best strategy here.

Right now the tt function is executed after model.frame is called - meaning that the result of the tt call (i.e., an object that we can say is class "tt"), won't ever get to makepredictcall.tt as currently constructed. There would need to be changes to both coxph and predict.coxph to make this work. Therneau clearly had interest in doing this though.

Isn't it true though that once we have a working predict-method, we could use that for generating the quantities that survfit needs?

We have a working predict method, and yes I'm using it in survfit.pcox. The actual survival calculations are handled by survfitcoxph.fit - it's just a bit annoying to set up the input to this function, but I'm working through it.

Or, on a lower level, if we had a working model.frame.pcox (or rather model.matrix.pcox...) method, that that would make both predict.coxph and survfit.coxph work for us?

If/when we go to the model.frame system, I think we should just do it for coxph. That way we don't have to figure the same thing out twice. I'm pretty sure once coxph has this implemented, we can just use it. Also, doesn't this only work if we are calling model.frame within pcox? Since it's being called within coxph, I'm not sure if we'll have all the control we need.

fabian-s commented 8 years ago

We have a working predict method, and yes I'm using it in survfit.pcox.

Sorry, I got confused with that

if (missing(newdata)) {
    newdata <- something
  }

placeholder in survfit.pcox that gives errors. Of course we have a working predict, I'm stupid.

Also, doesn't this only work if we are calling model.frame within pcox?

I don't think so. As long as the object that the model.frame-generic is called on has "pcox" before "coxph"in its class-attribute, method dispatch will first try to find&use the model.frame.pcox method, regardless from where the generic is called.

jgellar commented 8 years ago

I'm stupid.

You said it, not me.

I don't think so. As long as the object that the model.frame-generic is called on has "pcox" before "coxph"in its class-attribute, method dispatch will first try to find the model.frame.pcox method, regardless from where the generic is called.

I think I understand - you are refering to when model.frame is called from within predict.coxph (or predict.pcox) - which doesn't have to be the same model.frame method that is called from within the coxph call. I guess that could work. I'll admit I am a bit new to the exact use of model.frame/model.matrix, and this may end up being more difficult than the plan of modifying survival functions directly.

fabian-s commented 8 years ago

this may end up being more difficult than the plan of modifying survival functions directly.

not sure. i'm off, gotta pack, enjoy your vacation!

fabian-s commented 7 years ago

Jonathan, is this issue still current? Do we need this for the CRAN release?

jgellar commented 7 years ago

No, I don't think we need this for the CRAN release. In particular because IIRC we would have to make some recommendations to Therneau so he can make his changes to the survival package, and we have no control over how long that would take.

fabian-s commented 7 years ago

So I'm closing this because it doesn't seem like something we'll work on for the foreseeable future.....