jgellar / pcox

Penalized Cox regression models

"Zeroing out" unused coordinates #20

Closed. jgellar closed this issue 9 years ago.

jgellar commented 9 years ago

For this historical Cox model, I need to create a smooth using something like s(smat, tmat, by=LXmat). Coordinates for which s>t shouldn't ever be used. Up until now, I have been getting rid of these coordinates by setting the corresponding entry in LXmat to zero.

I now realize, however, that when the basis is created, it doesn't know that these coordinates are "zeroed out" by LXmat. Thus, the "knots" are placed all around the square $0\leq s\leq \max(t), 0 \leq t \leq \max(t)$. Thin plate regression splines don't really have "knots", but the idea is the same: the basis functions are not concentrated on the triangle $0 \leq s \leq t \leq \max(t)$, which is where we want them.

So now the question is: what should we do with the (s,t) coordinates in smat and tmat that we are going to zero out? If we set them to NA, an error gets thrown (by uniquecombs() within ExtractData()). My solution right now is to set them both to min(smat). The idea is that (min(smat), min(smat)) should already be a coordinate pair that is not zeroed out, so each replaced entry is just a duplicate and will disappear when uniquecombs() is called.
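A minimal sketch of the trick on a toy grid, assuming a hypothetical layout in which rows of smat and tmat index t and columns index s. Base R's `unique()` stands in for mgcv's `uniquecombs()` here, since both drop duplicate rows. In this grid the first used pair is (0, 0), which coincides with (min(smat), min(smat)).

```r
# Toy grid (hypothetical layout): rows index t, columns index s
J <- 5
grid <- seq(0, 1, length.out = J)
smat <- matrix(grid, nrow = J, ncol = J, byrow = TRUE)  # smat[i, j] = grid[j]
tmat <- matrix(grid, nrow = J, ncol = J)                # tmat[i, j] = grid[i]
mask <- smat <= tmat                                    # TRUE where (s, t) is used

# Before: all J^2 points of the square are distinct (s, t) pairs
nrow(unique(cbind(s = as.vector(smat), t = as.vector(tmat))))  # 25

# Overwrite unused coordinates with a duplicate of a used pair, here (0, 0)
smat[!mask] <- smat[mask][1]
tmat[!mask] <- tmat[mask][1]

# After: only the J * (J + 1) / 2 pairs in the triangle s <= t remain
uniq <- unique(cbind(s = as.vector(smat), t = as.vector(tmat)))
nrow(uniq)                      # 15
all(uniq[, "s"] <= uniq[, "t"])  # TRUE
```

The replaced entries add no new coordinate pair, so whatever is built from the unique coordinates only sees the triangle.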

Do you think this is a robust solution to the problem? And having extra copies of the same coordinates should not affect the basis construction, correct? Any other suggestions?

fabian-s commented 9 years ago

Sorry, I don't have any other suggestions, nor can I judge how robust that's going to be. Have you thought about what specifically could go wrong?

I agree that "having extra copies of the same coordinates should not affect the basis construction", otherwise uniquecombs would be very problematic.

jgellar commented 9 years ago

I think it's working. I'm using the following code:

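    # mask is TRUE for (s, t) coordinates that are used (s <= t); copy the first
    # used pair into every unused position so it only duplicates an existing pair
    smat[!mask] <- smat[mask][1]
    tmat[!mask] <- tmat[mask][1]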
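One thing that could go wrong with the two lines above: if mask has no TRUE entries, `smat[mask][1]` is NA, which brings back the NA that uniquecombs() rejects. The sketch below wraps the same replacement in a hypothetical helper, `collapse_unused()` (not part of pcox or mgcv), that checks for this and for NA values up front. The toy grid layout is again an assumption.

```r
# Hypothetical helper (not part of pcox or mgcv): the replacement above with
# explicit checks for inputs where it could misbehave.
collapse_unused <- function(smat, tmat, mask) {
  stopifnot(identical(dim(smat), dim(tmat)), identical(dim(smat), dim(mask)))
  if (anyNA(mask)) stop("mask contains NA; each coordinate must be used or not")
  if (!any(mask)) stop("mask has no TRUE entries; no used pair to duplicate")
  if (anyNA(smat[mask]) || anyNA(tmat[mask])) stop("used coordinates contain NA")
  first <- which(mask)[1]  # same element as smat[mask][1] and tmat[mask][1]
  # Take s and t from the same position so the filler is an existing (s, t) pair
  smat[!mask] <- smat[first]
  tmat[!mask] <- tmat[first]
  list(smat = smat, tmat = tmat)
}

# Usage on a 4 x 4 toy grid (rows index t, columns index s)
grid <- seq(0, 1, length.out = 4)
smat <- matrix(grid, 4, 4, byrow = TRUE)
tmat <- matrix(grid, 4, 4)
res <- collapse_unused(smat, tmat, smat <= tmat)
nrow(unique(cbind(as.vector(res$smat), as.vector(res$tmat))))  # 10
```

Taking both values from the same position matters: filling smat and tmat from different used entries could create a pair that does not occur anywhere in the data.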