coef() method - any way to use gam.plot()?

jgellar commented 9 years ago

Hi Fabian,

I really liked what you did with the coef method for pfr, using gam.plot to extract the coefficient function. Do you think there would be any way to do this for pcox? It would be harder because a pcox object is not a gam object - maybe we can create a fake gam object based on the results of a pcox fit?

For the historical models (or any model that uses the limits option), we would have to then apply the limits function to the data frame to "filter out" unused coordinates.

Do you think it is it worth looking into doing this? It would make it easier to handle smooths of various number of variables.

fabian-s commented 9 years ago

Excellent idea, especially since plot.gaminvisibly returns the plotting data since the last mgcv- update. So something like

object <- fit2.1 # from pcox.testing.R
data <- data2  # from pcox.testing.R

fakegam <- list(smooth = object$pcox$smooth, Vp=object$var, model = data, 
  coefficients = object$coefficients)
class(fakegam) <- "gam"
## set <select> to nonsense number so no actual plotting is done:
coef <- plot(fakegam, select=length(object$pcox$smooth) + 1)
str(coef)
#List of 1
# $ :List of 11
#  ..$ x      : num [1:100] 0.00314 0.06655 0.12996 0.19337 0.25678 ...
#  ..$ scale  : logi TRUE
#  ..$ se     : num [1:100] 0.543 0.495 0.45 0.409 0.374 ...
#  ..$ raw    : num [1:500] 1.524 3.238 0.625 5.665 5.271 ...
#  ..$ xlab   : chr "x"
#  ..$ ylab   : chr "s(x,0)"
#  ..$ main   : NULL
#  ..$ se.mult: num 2
#  ..$ xlim   : num [1:2] 0.00314 6.28061
#  ..$ fit    : num [1:100, 1] -0.2031 -0.1409 -0.0787 -0.0165 0.0454 ...
#  ..$ plot.me: logi TRUE

works for simple terms and could work in general IF we find a way to access the data that was used to actually set up the bases for the fit for more complex terms as well.

First I thought that inserting

if(fitter == "coxph") newcall$model <- TRUE

into pcox so that it returns its model.frame which could then be used as the model-slot in fakegam would help (i.e., NAs already dropped, synthetic covariates like bla.smat etc already present, time-expanded data set for time-varying terms / covariates), but unfortunately, <coxph model>$model contains the basis function evaluations as matrix-valued covariates (coxph.penalty objects, more precisely), not the raw covariate values plot.gam needs (also, the model option for coxph() throws an error if there are time transforms -- look here^{(how the hell can a base package like survival suck this terribly...!)}

So I guess the challenge if you want to go this route is to provide fakegamwith a data.frame for the model-slot which contains

the raw values of all the covariates that occur in any of the term (and by ? ) slots of the list entries in object$pcox$smooth
without any NAs
in time-expanded form if time-varying effects / covariates are present.

As you said, the 2nd challenge is to then do suitable postprocessing based on the limits-arguments etc. with the object returned from plot.gam.

jgellar commented 9 years ago

That's nice that plot.gam now returns the plotting data - I always thought it would be nice to have direct access to it. And I agree RE: survival: it's pretty shocking that there are some things that are just left incomplete, especially regarding the tt terms. Just goes to show how much room there is in methods/software development for survival analysis!

Good to know you like the fakegam idea. But you're right that creating that data.frame that we need is going to be challenging. Especially for tt terms, because the bla.smat data isn't even created until it is within coxph, in the tt function.

Part of the solution, I think, is something we've already thought of: in order to extract the smooth object that is created, we give the tt function access to the pcox environment, with environment(tt.i)$env <- environment() (in pcox). This allows us to call env$smooth[[index]] <- pt$smooth with the tt function, so smooth goes back to pcox. The variables that we need are created within pcoxTerm, which is called by the tt function. So we just need to pass the env into pcoxTerm, and then add the variables to env$gamdata or something like that. Or we have pcoxTerm return the data, and assign it back to pcox within the tt function. This may actually be better.

For non-time-varying smooth terms, this could be done within the xt function. For regular parametric terms we just do it within pcox().

For functional predictors that have terms like bla.smat and bla.tmat, are these variables expected to be in matrix format or vector format?

One potential challenge is getting the variable names right - and making sure there is no overlap with different terms.

What do you think about this plan? I don't have time to really try to implement it today but maybe tomorrow.

fabian-s commented 9 years ago

For functional predictors that have terms like bla.smat and bla.tmat, are these variables expected to be in matrix format or vector format?

I don't think it matters for plot.gam. It will be important that the ranges of the data are correct.

What do you think about this plan?

:+1:

jgellar commented 9 years ago

I'm trying to get around to this today and maybe tomorrow.

I noticed that the coefficients.pfr() function you wrote doesn't directly call plot.gam(), it calls mgcv:::plot.mgcv.smooth(). This seems like it would be an easier option than creating an actual gam object - we can just call the function directly on all the smooth terms. Is there any disadvantage to this?

Also, are there any problems relying on unexported functions like this, e.g. CRAN doesn't like it? I assume not since this is what you did for pfr().

fabian-s commented 9 years ago

I don't see any disadvantages beyond having to write your own loop over <model>$smooth and using :::. CMD CHECK will issue a note about the latter and it's bad style as unexported functions may change their behavior from version to version, but in this case I think it could work as that plot method seems very stable.

We'll still need to figure out how to supply the correct data .....

jgellar commented 9 years ago

Check out my solution... it's working through all my preliminary tests. Some work still needed (specifically for the seWithMeans==TRUE scenario).

c1d383165332d6444ac21b5d654caa913e62633c

jgellar commented 9 years ago

Still to do for coef.pcox():

Handle the seWithMeans==TRUE scenario
Implement the limit option more generally - right now it only works for bivariate functions with columns s and t
Implement untransform option: this is for the case when s.transform and/or t.transform are used. As currently implemented, coef.pcox() will return the coefficient estimate on the "transformed" space (the space the basis is fit on). In order to get the coordinates back to the original space, we need the inverse of s.transform and t.transform. This seems hard.

Note - this problem would be avoided if we tie the transformations into the basis itself, using the tf basis from mgcvTrans. If we do that, then the data that is saved in smoothdata is the untransformed data, and the transformation is taken care of invisibly to pcox. I think that's the way to go - what do you think?

jgellar commented 9 years ago

Fixed item 1 from last comment (seWithMeans).

More To Do:

Figure out what to do with trivariate (and higher order) smooths - I don't think that plot.gam() even handles these, mostly because they're hard to visualize as a plot, but we need to figure out a way to extract the corresponding coefficients.

fabian-s commented 9 years ago

Running pcox.testing2.R, I get:

d> est2.1 <- coef(fit2.1)
 Error in ifelse(is.null(n), default_n(sdat.i[[smooth.i$term[[1]]]]), n) : 
  could not find function "default_n"

that function is not defined anywhere in code that's in the repo so none of the coef-calls work for me.

I know this slows you down, but if you want to give me a chance to keep contributing to this, we'll need to have working examples and tests and a code base that's on Github in its entirety.

jgellar commented 9 years ago

Fixed. The issue was I changed the name of the function to ndefault(), but forgot to change the functional call in the code, and I still had default_n() hanging out in my workspace.

I agree that we need a better testing system, though I'm not sure if it would have helped in this case if I still had the old function in my workspace. I will make a file of stable tests separate from pcox.testing2.R.

jgellar commented 9 years ago

All remaining items have been moved to #30, so closing this out

jgellar / pcox

coef() method - any way to use gam.plot()? #23