mgcv penalty matrix: S.scale

jgellar commented 9 years ago

Hi Fabian,

In pterm.R, I need to take an existing "smooth" term (evaluated by smoothCon()), extract the basis matrix and penalty matrix, and convert them into a coxph.penalty object. When I refer to the "penalty matrix", I am talking about the matrix $D$ for a penalty of the form $\lambda/2 * b'Db$.

To extract the penalty matrix, I have been doing the following:

D <- sm$S[[1]] * sm$S.scale

I have never really understood what the scale factor was, but I have found that it doesn't really work without multiplying by the scale factor. Is this the correct way of defining the penalty matrix, or should I do something else without the scale factor?

fabian-s commented 9 years ago

These lines define S.scale in smoothCon():

## The following is intended to make scaling `nice' for better gamm performance.
## Note that this takes place before any resetting of the model matrix, and 
## any `by' variable handling. From a `gamm' perspective this is not ideal, 
## but to do otherwise would mess up the meaning of smoothing parameters
## sufficiently that linking terms via `id's would not work properly (they 
## would have the same basis, but different penalties)
sm$S.scale <- rep(1,length(sm$S))

  if (scale.penalty && length(sm$S)>0 && is.null(sm$no.rescale)) 
# then the penalty coefficient matrix is rescaled
  {  maXX <- mean(abs(t(sm$X)%*%sm$X)) # `size' of X'X
      for (i in 1:length(sm$S)) {
        maS <- mean(abs(sm$S[[i]])) / maXX
        sm$S[[i]] <- sm$S[[i]] / maS
        sm$S.scale[i] <- maS ## multiply S[[i]] by this to get original S[[i]]
      } 
  }

So S.scale = mean( | penalty matrix | ) / mean( |X'X| ), to make sure that the penalty is on a similar scale as the design matrix. The math should work out without rescaling as well, that S.scale factor should just go directly into $\lambda$ then.

Is there an example for comparing results with & without the scaling in the repo?

If your tests show that the optimization isn't working without it let's just keep it on the working assumption that Simon probably knows what he's doing.

jgellar commented 9 years ago

By the way, I forgot to respond to this. You were absolutely right that I don't need to multiply by 'S.scale'. I'm not exactly sure why I was doing it before - it can obviously be absorbed into the smoothing parameter - but I think for some reason I was getting weird results without "unscaling". Now I'm not getting those weird results.

It might be that I was trying to replicate the exact results of doing it "by hand", and multiplying by the scale factor was the only way I could do it. Either way, issue closed.

jgellar / pcox

mgcv penalty matrix: S.scale #4