Closed jgellar closed 9 years ago
Excellent idea, especially since plot.gam
invisibly returns the plotting data since the last mgcv
- update. So something like
object <- fit2.1 # from pcox.testing.R
data <- data2 # from pcox.testing.R
fakegam <- list(smooth = object$pcox$smooth, Vp=object$var, model = data,
coefficients = object$coefficients)
class(fakegam) <- "gam"
## set <select> to nonsense number so no actual plotting is done:
coef <- plot(fakegam, select=length(object$pcox$smooth) + 1)
str(coef)
#List of 1
# $ :List of 11
# ..$ x : num [1:100] 0.00314 0.06655 0.12996 0.19337 0.25678 ...
# ..$ scale : logi TRUE
# ..$ se : num [1:100] 0.543 0.495 0.45 0.409 0.374 ...
# ..$ raw : num [1:500] 1.524 3.238 0.625 5.665 5.271 ...
# ..$ xlab : chr "x"
# ..$ ylab : chr "s(x,0)"
# ..$ main : NULL
# ..$ se.mult: num 2
# ..$ xlim : num [1:2] 0.00314 6.28061
# ..$ fit : num [1:100, 1] -0.2031 -0.1409 -0.0787 -0.0165 0.0454 ...
# ..$ plot.me: logi TRUE
works for simple terms and could work in general IF we find a way to access the data that was used to actually set up the bases for the fit for more complex terms as well.
First I thought that inserting
if(fitter == "coxph") newcall$model <- TRUE
into pcox
so that it returns its model.frame
which could then be used as the model
-slot in fakegam
would help (i.e., NAs already dropped, synthetic covariates like bla.smat
etc already present, time-expanded data set for time-varying terms / covariates), but unfortunately, <coxph model>$model
contains the basis function evaluations as matrix-valued covariates (coxph.penalty
objects, more precisely), not the raw covariate values plot.gam
needs (also, the model
option for coxph()
throws an error if there are time transforms -- look here(how the hell can a base package like survival
suck this terribly...!)
So I guess the challenge if you want to go this route is to provide fakegam
with a data.frame
for the model
-slot which contains
term
(and by
? ) slots of the list entries in object$pcox$smooth
As you said, the 2nd challenge is to then do suitable postprocessing based on the limits
-arguments etc. with the object returned from plot.gam
.
That's nice that plot.gam
now returns the plotting data - I always thought it would be nice to have direct access to it. And I agree RE: survival
: it's pretty shocking that there are some things that are just left incomplete, especially regarding the tt
terms. Just goes to show how much room there is in methods/software development for survival analysis!
Good to know you like the fakegam
idea. But you're right that creating that data.frame
that we need is going to be challenging. Especially for tt
terms, because the bla.smat
data isn't even created until it is within coxph
, in the tt
function.
Part of the solution, I think, is something we've already thought of: in order to extract the smooth
object that is created, we give the tt
function access to the pcox
environment, with environment(tt.i)$env <- environment()
(in pcox
). This allows us to call env$smooth[[index]] <- pt$smooth
with the tt
function, so smooth
goes back to pcox
. The variables that we need are created within pcoxTerm
, which is called by the tt
function. So we just need to pass the env
into pcoxTerm
, and then add the variables to env$gamdata
or something like that. Or we have pcoxTerm
return the data, and assign it back to pcox
within the tt
function. This may actually be better.
For non-time-varying smooth terms, this could be done within the xt
function. For regular parametric terms we just do it within pcox()
.
For functional predictors that have terms like bla.smat
and bla.tmat
, are these variables expected to be in matrix format or vector format?
One potential challenge is getting the variable names right - and making sure there is no overlap with different terms.
What do you think about this plan? I don't have time to really try to implement it today but maybe tomorrow.
For functional predictors that have terms like bla.smat and bla.tmat, are these variables expected to be in matrix format or vector format?
I don't think it matters for plot.gam
. It will be important that the ranges of the data are correct.
What do you think about this plan?
:+1:
I'm trying to get around to this today and maybe tomorrow.
I noticed that the coefficients.pfr()
function you wrote doesn't directly call plot.gam()
, it calls mgcv:::plot.mgcv.smooth()
. This seems like it would be an easier option than creating an actual gam
object - we can just call the function directly on all the smooth terms. Is there any disadvantage to this?
Also, are there any problems relying on unexported functions like this, e.g. CRAN doesn't like it? I assume not since this is what you did for pfr()
.
I don't see any disadvantages beyond having to write your own loop over <model>$smooth
and using :::
. CMD CHECK will issue a note about the latter and it's bad style as unexported functions may change their behavior from version to version, but in this case I think it could work as that plot method seems very stable.
We'll still need to figure out how to supply the correct data .....
Check out my solution... it's working through all my preliminary tests. Some work still needed (specifically for the seWithMeans==TRUE
scenario).
c1d383165332d6444ac21b5d654caa913e62633c
Still to do for coef.pcox()
:
seWithMeans==TRUE
scenariolimit
option more generally - right now it only works for bivariate functions with columns s
and t
untransform
option: this is for the case when s.transform
and/or t.transform
are used. As currently implemented, coef.pcox()
will return the coefficient estimate on the "transformed" space (the space the basis is fit on). In order to get the coordinates back to the original space, we need the inverse of s.transform
and t.transform
. This seems hard.Note - this problem would be avoided if we tie the transformations into the basis itself, using the tf
basis from mgcvTrans
. If we do that, then the data that is saved in smoothdata
is the untransformed data, and the transformation is taken care of invisibly to pcox
. I think that's the way to go - what do you think?
Fixed item 1 from last comment (seWithMeans
).
More To Do:
plot.gam()
even handles these, mostly because they're hard to visualize as a plot, but we need to figure out a way to extract the corresponding coefficients.Running pcox.testing2.R
, I get:
d> est2.1 <- coef(fit2.1)
Error in ifelse(is.null(n), default_n(sdat.i[[smooth.i$term[[1]]]]), n) :
could not find function "default_n"
that function is not defined anywhere in code that's in the repo so none of the coef
-calls work for me.
I know this slows you down, but if you want to give me a chance to keep contributing to this, we'll need to have working examples and tests and a code base that's on Github in its entirety.
Fixed. The issue was I changed the name of the function to ndefault()
, but forgot to change the functional call in the code, and I still had default_n()
hanging out in my workspace.
I agree that we need a better testing system, though I'm not sure if it would have helped in this case if I still had the old function in my workspace. I will make a file of stable tests separate from pcox.testing2.R
.
All remaining items have been moved to #30, so closing this out
Hi Fabian,
I really liked what you did with the
coef
method forpfr
, usinggam.plot
to extract the coefficient function. Do you think there would be any way to do this forpcox
? It would be harder because apcox
object is not agam
object - maybe we can create a fakegam
object based on the results of apcox
fit?For the historical models (or any model that uses the
limits
option), we would have to then apply thelimits
function to the data frame to "filter out" unused coordinates.Do you think it is it worth looking into doing this? It would make it easier to handle smooths of various number of variables.