jgellar / pcox

Penalized Cox regression models

Terms with multiple smooth objects #16

Closed jgellar closed 9 years ago

jgellar commented 9 years ago

As you can see from the recent updates, I'm finally getting a chance to get back into pcox. I'm working on changing the functionality to accommodate methods (in particular predict.pcox(), but it will be useful for other methods as well). One of the big changes I made is that instead of just having tt functions for time-varying terms, I am now also creating xt functions (via create.xt.func) for "x-transformations" - these are variables that don't require a "time-transformation" (tt). They are a lot simpler than the tt functions because they can be executed within pcox instead of having to go through coxph.

One other quick question; I think I brought this up at some point in Munich, but I wanted to revisit it. When we create a smooth term via smoothCon(), it comes out as a list with a single element. Is there any reason that we couldn't just pull out the smooth from the list? I know that mgcv does this because occasionally there are terms that have more than one smooth associated with them. Will we ever have this? I can't remember exactly when this type of term would occur, or whether it's relevant for us. It is just annoying to have to remember that extra indexing ([[1]]) every time I want to access the smooth object.
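
For reference, here is a minimal illustration of the extra list level (a standalone sketch, not pcox code; the data and variable names are made up):

    library(mgcv)
    dat <- data.frame(x = runif(50))
    sm  <- smoothCon(s(x), data = dat, knots = NULL)
    length(sm)      # 1 for a plain univariate smooth ...
    class(sm[[1]])  # ... so the smooth object itself sits one level down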

The only time that I can think of where multiple smooth objects would be needed is for "parameterized" interactions. I haven't implemented this functionality yet, but what I mean by that is that we make an assumption about $\beta(s,t)$, e.g., $\beta(s,t) = \beta_1(s) + t*\beta_2(s)$ (i.e., a linear interaction with time). Then a basis would be fit to both $\beta_1(s)$ and $\beta_2(s)$. However, for cases like this I think it would be desirable to only use a single smoothing parameter for the two terms. This would allow us to reduce the whole thing to a single smooth object with a block diagonal penalty matrix. We talked about this a little in person.

For what I described in the preceding paragraph, could this be accomplished by defining a "user-defined" smooth.construct() function? I've never done this, but I know it's possible (and I know that you have!). It would be awesome if we could do this in a flexible way where the user could decide how the bivariate coefficient function is parameterized. This seems tricky though.

fabian-s commented 9 years ago

When we create a smooth term via smoothCon(), it comes out as a list with a single element. Is there any reason that we couldn't just pull out the smooth from the list? I know that mgcv does this because occasionally there are terms that have more than one smooth associated with them. Will we ever have this?

you will get multiple smooth terms for by-variables that are factors. AFAICS that's the reason why smoothCon insists on returning lists. having the option to accommodate that type of effect (say, effects that differ between arms of a study) seems to me to be more valuable than not having to type [[1]] every now and then.
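
a quick sketch of that case (made-up data; this is the situation where smoothCon() genuinely needs the list):

    library(mgcv)
    dat <- data.frame(x = runif(60), arm = factor(rep(c("ctrl", "trt", "pbo"), 20)))
    sm  <- smoothCon(s(x, by = arm), data = dat, knots = NULL)
    length(sm)  # 3: one smooth object per level of the factor `arm`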

fabian-s commented 9 years ago

For what I described in the preceding paragraph, could this be accomplished by defining a "user-defined" smooth.construct() function? I've never done this, but I know it's possible (and I know that you have!). It would be awesome if we could do this in a flexible way where the user could decide how the bivariate coefficient function is parameterized. This seems tricky though.

jgellar commented 9 years ago

Did you just write this up? Awesome! I'll go through in more detail. I would also like to have some "keywords" for commonly used types of transformations. e.g., "linear", "quadratic", etc.

Can you tell me where it is documented how to define new smooth.construct() and Predict.matrix() functions? I.e., what does the input/output have to look like for each function?

I wonder if I should do something like this for my variable-domain regression stuff. In fact, I probably could do the variable-domain regression with the same basis construction as this one.

fabian-s commented 9 years ago

?smooth.construct

jgellar commented 9 years ago

you will get multiple smooth terms for by-variables that are factors. AFAICS that's the reason why smoothCon insists on returning lists. having the option to accommodate that type of effect (say, effects that differ between arms of a study) seems to me to be more valuable than not having to type [[1]] every now and then.

Right, I forgot about factors, I knew there was a good reason.

I was just playing around with pffr, though, and I realized that mgcv::gam() actually gets rid of one of the levels of nesting that I am keeping around. For example, I fit the following model from the pffr help file: m1 <- pffr(Y ~ ff(X1, xind=s) + ff(X2, xind=s), yind=t, data=data1). The resulting object m1$smooth is a list of 3 elements, each of which is an mgcv.smooth object. The way pcox is currently set up, if I fit a model with 3 smooth terms, I would get a list with 3 elements, each of which would be a list with one element, which would be the mgcv.smooth object. It's this double indexing that I am finding annoying/unnecessary.

The reason I am keeping this extra level of indexing around is that I wanted the length of the smooth list to be exactly the same as the number of smooth terms in the pcox formula. If one of the terms of the pcox formula required two smooth objects (e.g., a smooth with a factor by-variable), I wanted those two to stay "together". I haven't fit anything like this yet, hence my wanting to get rid of the indexing. But I think some of my code relies on the fact that if I'm working with the third term of the pcox formula, I should look at the third element of smooth.

jgellar commented 9 years ago

I am finally getting around to finalizing the code you wrote for the parametric_ttrans basis. One thing I noticed is that your code assumes that all the data seen by the smooth.construct method are vectors, not matrices. So I modified the code to also allow matrices, so we can have functional predictors. However, in the testing of this code I discovered that the matrices are vectorized before they ever reach the smooth.construct method. It looks like this is done in the ExtractData() function. Is it true that the smooth.construct method will only see vector data?

Also, a couple other comments/questions on this:

  1. I'm changing the basis to "pb" for "parametric bivariate". The reason is that the "interacting variable" is not necessarily time, and also that bs="parametric_ttrans" is long to write. If you can come up with a more appropriate 2-letter acronym I am fine with changing it. Maybe "pi" for "parametric interaction"? Though this could be confused with a certain mathematical constant.
  2. I've changed the way some of the basis options are entered. You required that they all be entered within the xt list; I expanded this so they can also be entered via the arguments of s(). This works for all of the options except bs (which is already in use in s()). I also added the ability to use "code words" for certain types of interactions ("linear", "quadratic", and "none").
  3. For Predict.matrix(): we need the univariate smooth object in order to do the predictions. There are two ways we can get it: we can re-create it within Predict.matrix(), or we can save it as part of the $xt element of the smooth object. The first way is more efficient in terms of space, and the second is more efficient in terms of computation time. Which way do you think is better?
  4. Think we should push to add this to refund? It might be useful with pffr() or fgam() models.

I am also going to create another basis for "domain transformation" (bs="dt"). The purpose of this is to allow the indices of the smooth to be transformed. There are two transformations that I've used before that I want to implement in this way. The first is for "lagging" the s indices (i.e., $s^* = s - t$), which just changes the alignment of the triangle so the vertical stacking is based on the distance between an observation and $s=t$, as opposed to the distance from $s=0$. The second is for "domain-standardization", i.e., $s^* = s/t$ (with $s^* = 0.5$ for $t=0$). This means that $s^* \in [0,1]$ for all $t$. In order to use thin-plate regression splines, we would also have to transform $t^* = t/\max(t)$, because the TPRS basis is isotropic, so it assumes the same scale in both directions. A tensor product basis could also be used because the surface is now a rectangle.
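
To make the two transformations concrete, here is a rough sketch of what they could look like as R functions of the smooth's two indices (the names f_lag, f_ds, and f_t are just illustrative, not part of the package):

    f_lag <- function(s, t) s - t                       # "lagged" index: (negative) time before s = t
    f_ds  <- function(s, t) ifelse(t == 0, 0.5, s / t)  # domain-standardized index in [0, 1]
    f_t   <- function(s, t) t / max(t)                  # rescale t so an isotropic (TPRS) basis sees comparable scales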

The implementation of bs="dt" will, of course, allow the transformations to be entered as functions. This might be tricky if we want to allow this basis to be used more generally for smooths of any number of variables: what would the transformation function arguments look like? I suppose the only thing to do is to have as many arguments to each function as there are variables in the data. E.g., for a smooth of 4 variables, the input would be something like

f1 <- function(a, b, c, d) a / b       # first index divided by the second
f2 <- function(a, b, c, d) b           # second index unchanged
f3 <- function(a, b, c, d) c / max(c)  # third index rescaled to [0, 1]
f4 <- function(a, b, c, d) log(d)      # fourth index log-transformed
tf <- list(f1, f2, f3, f4)

or something like that. Then our code would pass all 4 variables into each function to get the transformed variables.
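
For instance, applying the list of transformations could look something like this (a sketch under the convention above; apply_tf is a hypothetical helper, not existing pcox code):

    apply_tf <- function(tf, vars) {
      # vars: list of the covariate vectors, in the order they appear in s();
      # each tf[[j]] receives every covariate and returns the j-th transformed index
      lapply(tf, function(f) do.call(f, vars))
    }
    # new_vars <- apply_tf(tf, list(a, b, c, d))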

Eventually I want to be able to fit a model with s(x, t, bs="dt", xt=list(bs="pb", tf="ds", xt=list(tf="linear", bs="ps"))) to first domain-standardize the indices, and then do a linear interaction with the resulting indices, using p-splines.

fabian-s commented 9 years ago

Nice! testing-code didn't really work, though. y u so sloppy? :wink:

However, in the testing of this code I discovered that the matrices are vectorized before they ever reach the smooth.construct method. It looks like this is done in the ExtractData() function. Is it true that the smooth.construct method will only see vector data?

yes, mgcv:::ExtractData calls mgcv:::get.var, which by default turns matrix covariates into vectors.
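
a quick way to see this (get.var is an internal mgcv helper, so this behaviour and its vecMat argument reflect the mgcv version I'm looking at and may change):

    library(mgcv)
    dat <- list(X = matrix(1:6, nrow = 2))    # a 2 x 3 matrix covariate
    mgcv:::get.var("X", dat)                  # vectorized: the 6 values as a plain vector
    mgcv:::get.var("X", dat, vecMat = FALSE)  # keeps the matrix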

I'm changing the basis to "pb" for "parametric bivariate". [...] Maybe "pi" for "parametric interaction"? Though this could be confused with a certain mathematical constant.

You confused yourself there, the code in /Testing didn't run because you mixed pb and pi. Fixed in https://github.com/jgellar/pcox/commit/3b1638ecb6af9401dce3b94e0a9cf75452bccbbc.

For Predict.matrix(): we need the univariate smooth object in order to do the predictions. There are two ways we can get it: we can re-create it within Predict.matrix(), or we can save it as part of the $xt element of the smooth object. The first way is more efficient in terms of space, and the second is more efficient in terms of computation time. Which way do you think is better?

Not sure I understand what you mean. Predict.matrix.pi.smooth seems fine to me in its present state. In any case, computation for setting up a smooth.spec.blafoo-object is negligible, while re-creating the design matrix may be expensive depending on the basis and the data. The latter is unavoidable if you want to get the fits for new data, though.

Think we should push to add this to refund? It might be useful with pffr() or fgam() models.

Maybe once it's more mature. Then again, if you use the mgcv-backend for inference there are less roundabout ways to achieve this kind of effect, e.g. by linking multiple smooth terms with by-variables containing the parametric transforms via their id-argument so they all take the same smoothing parameter(s).
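
roughly like this, for the $f(x,t) = f_1(x) + t f_2(x)$ example (a self-contained sketch with simulated data and made-up variable names, just to show the id mechanism):

    library(mgcv)
    set.seed(1)
    dat <- data.frame(x = runif(200), t = runif(200))
    dat$y <- sin(2 * pi * dat$x) + dat$t * cos(2 * pi * dat$x) + rnorm(200, sd = 0.2)
    # the parametric transform t enters as a numeric by-variable; id = 1 forces
    # both smooths of x to share a single smoothing parameter
    fit <- gam(y ~ s(x, bs = "ps", id = 1) + s(x, by = t, bs = "ps", id = 1),
               data = dat)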

fabian-s commented 9 years ago

I am also going to create another basis for "domain transformation" (bs="dt").

nice!

The implementation of bs="dt" will, of course, allow the transformations to be entered as functions. This might be tricky if we want to allow this basis to be used more generally for smooths of any number of variables: what would the transformation function arguments look like? I suppose the only thing to do is to have as many arguments to each function as there are variables in the data. E.g., for a smooth of 4 variables, the input would be something like [...]

Sweet jesus. You're right, I guess. What's the use case for that kind of complexity, though?

Eventually I want to be able to fit a model with s(x, t, bs="dt", xt=list(bs="pb", tf="ds", xt=list(tf="linear", bs="ps"))) to first domain-standardize the indices, and then do a linear interaction with the resulting indices, using p-splines.

I can see why that would be a desirable option. There's a big danger here, though, that you/we are adding lots of complex features onto a not-really-all-that-stable codebase. What I would suggest is that you/we update the to-do list before adding more complexity, do a proper cleanup of the package code & documentation (i.e., we should be able to run devtools::check without NOTEs and WARNINGs afterwards, and the documentation should describe the actually implemented features, which is presently not the case, I think), and write a collection of reliable, fine-grained tests at least for the basic functionality (all variants of bf, hf, cf, nonlinear scalar effects, etc.) and for foreseeable real-world complications (missing values, massive drop-out, ...). Once that is in place it's much easier to add additional complexity without breaking any of the more basic stuff in the process (just do a regression test before each substantial commit to check that everything still works as before).

Another point to consider, depending also on how much more time you can spend on this project before you start your new job, is what's more important: being able to fit every conceivable kind of complex effect or having easy-to-use and well-designed plot, summary, residuals, fitted, etc methods for pcox-objects. I'd argue for the latter.

jgellar commented 9 years ago

You're right, of course, that the package needs cleaning up, and I guess I'm guilty of trying to add too much complexity right away. It would definitely have been better to get some basic stuff working fully before trying to add in all these features. This is a big "lesson" that I've learned from this whole project! Get the basics working first, then add functionality later.

That being said, the main reason I want to implement all this domain-transformation stuff is that I want to fit those models now (as in, before I have to turn in my thesis in a few weeks). I found that, in particular, standardizing and parameterizing the domain resulted in better-fitting models for the variable-domain model (with death as a binary outcome), and I suspect the same thing will happen here (at least I hope so). The biggest gain was for the "domain standardized" model. In the variable-domain paper, I did this by a "change of variable" substitution, defining $u = s/T_i$ as the new domain for each function, and then resampling all the functions on an equally spaced grid over $u \in [0,1]$. Now that I'm a little better at this stuff, I realize that a better way to do this is just to transform the basis indices. FYI, here is the section that I wrote up for the Historical Cox paper (which you're a co-author on; I'll send it out to you when I'm finished writing). I copied and pasted this from the .tex code, so I'm just entering it as a code block here to keep the formatting:

Recall that we approximate $\beta(s,t)$ with $\sum_k b_k \phi_k(s,t)$. The lagged-time basis sets
$\phi_k(s,t) = \phi_k^*(s^*, t)$, with $s^* \equiv s-t$, and the basis functions $\{\phi_k^*(\cdot,\cdot)\}$ are used to fit
the model. Note that $s^*$ is the negative time between any observed $X_i(s)$ and the observation at time $t$. By using
this re-parameterization, the $(s,t)$ coordinates along the right edge (hypotenuse)
of the triangular surface in \ref{fig:weighting} are stacked vertically from the perspective of the basis,
bringing them closer together. This causes the smoothness across different levels of $t$ to be based on the amount of time
until time $t$, as opposed to the amount of time since time 0. The practical effect is that it slightly increases
the amount of smoothness along this edge of $\beta(s,t)$, while decreasing it along the left (vertical) edge.

For the domain-standardized basis, we set $\phi_k(s,t) = \phi_k^*(s^*, t^*)$, with $t^* = t/\max_i(Y_i)$ and
$s^* = s/t$ for $t\neq0$ and $s^* = 0.5$ for $t=0$. Note that the coordinates $(s^*, t^*)$ will fall on the unit
square $[0,1]\times[0,1]$ for the observed data. Using this basis causes the smoothness to be
based on the proportion $s/t$, instead of on $s$, over different levels of $t$. 
The $t$ coordinate is scaled down by $\max_i(Y_i)$ so that $t^*\in[0,1]$, because isotropic
bases such as thin-plate regression splines assume the scale of each coordinate of the basis is the same.
Since the coordinates $(s^*, t^*)$ fall on the unit square, we could alternately use a tensor product basis
for $\{\phi_k^*(\cdot,\cdot)\}$.
The practical effect of this re-parameterization is that a greater amount of smoothness is assumed at
high levels of $t$ than at low levels. It may be desirable to allow the resulting weight functions $\beta(s, t_0)$
to be more variable for low $t_0$ than for high $t_0$.

The reason I showed the example of a transformation for a smooth of 4 variables is that I figured if I was going to implement this now for a bivariate smooth, I might as well make it flexible enough to handle an arbitrary number of variables. It's not terribly more difficult to program it that way. But no, I don't have any need for that kind of complexity now (and maybe nobody ever will).

You confused yourself there, the code in /Testing didn't run because you mixed pb and pi. Fixed in 3b1638e.

I didn't exactly confuse myself, but I changed my mind after I wrote the last comment. I decided to make the basis work for an arbitrary number of variables, so "pb" was no longer appropriate, and I apparently didn't make all the necessary changes to the code/documentation. Thanks for the cleanup.

Not sure I understand what you mean. Predict.matrix.pi.smooth seems fine to me in its present state. In any case, computation for setting up a smooth.spec.blafoo-object is negligible, while re-creating the design matrix may be expensive depending on the basis and the data. The latter is unavoidable if you want to get the fits for new data, though.

Yeah, I decided to go with the easier route, which was to just save the smooth object for the "reduced" smooth in the pi.smooth object. It basically just means that the pi.smooth object is a little bigger than it needs to be. The alternative would have been to save only what is necessary to "re-create" this object within Predict.matrix.pi.smooth().
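
For the record, here is a small standalone illustration of the trade-off, using a plain mgcv smooth in place of the reduced pi.smooth component (data and names are made up):

    library(mgcv)
    dat <- data.frame(x = runif(100))
    sm  <- smoothCon(s(x, bs = "ps"), data = dat, knots = NULL)[[1]]
    # Option 1 (what I went with): keep `sm` around and reuse it at prediction time
    newdat <- data.frame(x = seq(0, 1, length.out = 20))
    X_new  <- PredictMat(sm, newdat)   # cheap: the basis is already constructed
    # Option 2: store only what is needed to re-run smoothCon() inside
    # Predict.matrix(); smaller object, but the basis setup is repeated per call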

jgellar commented 9 years ago

Going back to the "pi" basis, which we discussed above. Recall that for a smooth with a parametric interaction, we are using the same smoothing parameter to penalize each of the reduced-dimension smooths. In other words, if we have $f(x,t) = f_1(x) + t*f_2(x)$, we only use one smoothing parameter for both $f_1$ and $f_2$, by defining the smoothing matrix to be block diagonal with identical blocks.
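
For concreteness, the shared-parameter penalty amounts to something like this, where S1 stands in for the marginal penalty sm$S[[1]] (the toy matrix below is only for illustration):

    S1 <- crossprod(matrix(c(1, -1, 0, 1), 2, 2))  # toy marginal penalty matrix
    n_transform <- 2
    S_shared <- diag(n_transform) %x% S1           # blockdiag(S1, S1): one lambda penalizes both blocks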

I am now giving the option to relax this, and to use a different smoothing parameter for each $f_k(x)$. I do this by setting object$S to be a list with multiple matrices: the first is the block diagonal matrix with blocks (S[[1]], 0), and the second has blocks (0, S[[1]]). For a general number of transformation functions, here is the code to do this:

    # One penalty matrix per transformation function: the Kronecker product places
    # sm$S[[1]] in the k-th diagonal block and zeros elsewhere, so each block
    # gets its own smoothing parameter
    object$S <- lapply(1:n_transform, function(k) {
      diag(as.numeric((1:n_transform) == k)) %x% sm$S[[1]]
    })

Just confirming, this should work, right? It seems like a convenient way to fit multiple smooths (which more generally don't even have to be related to one another) in the same smooth object.

fabian-s commented 9 years ago

seems fine, yet inefficient. linking smooths via "id" would prob. be the preferred mechanism for this. but then you couldn't use this type of basis constructor to achieve your goal.....


jgellar commented 9 years ago

Linking smooths with id would give them the same smoothing parameter, correct? For what I described in the previous message, I was talking about using different smoothing parameters for all of the "reduced" bases. You wrote some code before to handle the case when we want to use the same smoothing parameter for all of the "reduced" bases - you just have a single block-diagonal smoothing matrix where all blocks are the same.

When you say this would be "inefficient", do you mean it would fit slower, because of the unnecessarily large matrices involved? I still think this is the only way to do it and have it all contained in a single basis object, which is what we need for user transparency. Maybe we could gain some speed if we define things as sparse matrices using the Matrix package? But I don't know if mgcv performs any operations on these matrices that aren't supported for sparse matrices.

fabian-s commented 9 years ago

Linking smooths with id would give them the same smoothing parameter, correct?

yes

For what I described in the previous message, I was talking about using different smoothing parameters for all of the "reduced" bases.

sorry. i misread.

When you say this would be "inefficient", do you mean it would fit slower, because of the unnecessarily large matrices involved? I still think this is the only way to do it and have it all contained in a single basis object, which is what we need for user transparency.

yes, you're right, I'm stupid.

Maybe we could gain some speed if we define things as sparse matrices using the Matrix package? But I don't know if mgcv performs any operations on these matrices that aren't supported for sparse matrices.

I don't think this is likely to work.

jgellar commented 9 years ago

This has been implemented.