jgellar / pcox

Penalized Cox regression models

Coordinate transformations #21

Closed jgellar closed 9 years ago

jgellar commented 9 years ago

I think I mentioned at some point that I am implementing "coordinate transformations" for the coordinates of the smooths. This is currently done through the s.transform and t.transform arguments of hf(), p(), etc. For example, to implement "domain standardization", I use the following transformations:

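# Rescale s to [0, 1] within each subject's domain [smin, t]
s.transform <- function(s, t) {
  smin <- min(s, na.rm = TRUE)
  # when the domain has zero width (t == smin), place the point at the midpoint
  ifelse(t == smin, 0.5, (s - smin) / (t - smin))
}
# Rescale t to [0, 1] over its observed range
t.transform <- function(t) {
  (t - min(t, na.rm = TRUE)) / (max(t, na.rm = TRUE) - min(t, na.rm = TRUE))
}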

The model fits just fine when I use these transformations. However, problems occur when I try to extract the coefficient function from this model. In order to obtain the appropriate transformed coordinates to pass on to PredictMat(), I need to call s.transform() and t.transform() on the prediction data. But notice that both of these transformation functions above have max() or min() in them. When this is called on the prediction data, it will not necessarily give the same max and min as what was used to generate the coordinates that were used to fit the model. This screws up the mapping from the original coordinates to the transformed ones, and the estimates don't come out right.
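To make the mismatch concrete, here is a small sketch with made-up numbers (the vectors `t_fit` and `t_pred` are hypothetical, not taken from any pcox example). The same original coordinate lands on different transformed values depending on which data the transform sees:

```r
# Same t.transform as above: the range is recomputed from whatever is passed in
t.transform <- function(t) {
  (t - min(t, na.rm = TRUE)) / (max(t, na.rm = TRUE) - min(t, na.rm = TRUE))
}

t_fit  <- c(0, 25, 50, 100)  # hypothetical coordinates seen when fitting
t_pred <- c(25, 50)          # hypothetical, narrower prediction grid

fit_scale  <- t.transform(t_fit)   # scaled using the range [0, 100]
pred_scale <- t.transform(t_pred)  # scaled using the range [25, 50]

# t = 50 is mapped to 0.5 at fit time but to 1 at prediction time
fit_scale[t_fit == 50]
pred_scale[t_pred == 50]
```

Any basis evaluated at the prediction-time coordinate is therefore evaluated at the wrong place relative to the fitted smooth.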

Do you have any suggestions on how to get around this issue? I'm sure I could rig something up, but everything I'm thinking of right now would be pretty messy.

jgellar commented 9 years ago

Here is my best attempt at solving the problem. It is limited, but it might work for now:

  1. At the start of create.tt.func() and create.xt.func(), initialize tmax=tmin=smax=smin <- NULL.
  2. Within the tt or xt function, have something like
if (is.null(tmax)) {
    assign("tmax", max(tmat, na.rm=T), envir=environment(s.function))
    assign("tmax", max(tmat, na.rm=T), envir=environment(t.function))
}

Repeat for tmin, smax, and smin.

  3. The s.transform and t.transform functions now have access to those four variables. And those should stay with the tt or xt function, right?
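A minimal sketch of the steps above, for the t coordinate only. The factory `make_tt()` is a hypothetical stand-in for create.tt.func(), whose real internals are not shown in this thread. As a variation on the assign() calls above, it first gives the transform a private environment, so nothing is written into the environment where the user defined the function:

```r
# Hypothetical factory standing in for create.tt.func()
make_tt <- function(t.function) {
  # private environment for the cached range (an assumption of this sketch)
  environment(t.function) <- new.env(parent = environment(t.function))
  tmin <- tmax <- NULL
  function(tmat) {
    if (is.null(tmax)) {
      # first call, i.e. model fitting: remember the fitting range
      tmin <<- min(tmat, na.rm = TRUE)
      tmax <<- max(tmat, na.rm = TRUE)
      assign("tmin", tmin, envir = environment(t.function))
      assign("tmax", tmax, envir = environment(t.function))
    }
    t.function(tmat)
  }
}

# The user's transform refers to tmin and tmax instead of calling min()/max()
t.transform <- function(t) (t - tmin) / (tmax - tmin)

tt <- make_tt(t.transform)
fit_coords  <- tt(c(0, 25, 50, 100))  # first call caches the range [0, 100]
pred_coords <- tt(c(25, 50))          # later calls reuse the cached range
```

With the cached range, t = 50 is mapped to the same transformed value in both calls.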

I'll play around with this to see if it works. I don't like that it requires that we "pre-specify" the summaries that can be used (max and min of s and t). But I guess these are the ones most likely to cause problems. Well, I guess a "quantile" function would as well. To really make it more flexible, I guess we would have to save the original s and t data to the environment of the s.function and t.function?

jgellar commented 9 years ago

The above technique seems to be working. Instead of adding smax, smin, tmax, and tmin, I am assigning s0 and t0, which are the original smat and tmat used to fit the model (with unused coordinates masked to NA). This way, the s.transform and t.transform functions can have lines such as min(t0, na.rm=T) and quantile(s0, .5, na.rm=T), etc. Not the most user-friendly implementation, but the best I can think of.
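For reference, a sketch of what user-written transforms might look like under this scheme. The helper `bind_fit_coords()` is hypothetical: it only imitates, outside of pcox, the step where the tt or xt function makes s0 and t0 visible to the transform. The data values are made up.

```r
# Transforms written against the fitting coordinates s0 and t0
s.transform <- function(s, t) {
  smin <- min(s0, na.rm = TRUE)
  ifelse(t == smin, 0.5, (s - smin) / (t - smin))
}
t.transform <- function(t) {
  rng <- range(t0, na.rm = TRUE)
  (t - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical helper: attach s0 and t0 to a transform via a child environment
bind_fit_coords <- function(f, s0, t0) {
  environment(f) <- list2env(list(s0 = s0, t0 = t0), parent = environment(f))
  f
}

s0 <- c(0, 5, 10, NA)  # made-up fitting coordinates, unused entries masked to NA
t0 <- c(10, 20, 40)

st <- bind_fit_coords(s.transform, s0, t0)
tt <- bind_fit_coords(t.transform, s0, t0)

t_new <- tt(c(20, 25))                       # scaled with the fitting range [10, 40]
s_new <- st(s = c(5, 10), t = c(10, 20))     # scaled with the fitting minimum of s
```

Because the summaries are always computed from s0 and t0, the prediction grid can be any subset or refinement of the original domain without changing the mapping.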

Do you have any better ideas? If not, I will close out this issue.

fabian-s commented 9 years ago

That seems pretty elegant to me, actually. It just needs to be documented properly so people know how to write the transform functions.

I'd suggest closing it once it's documented and we have working tests for this.