jgellar / pcox

Penalized Cox regression models
1 stars 0 forks source link

Term names #19

Closed jgellar closed 9 years ago

jgellar commented 9 years ago

One of the things I have been planning on doing is giving the terms more appropriate/relevant names. Up to now, I have been just naming the terms "term1", "term2", etc.

For the terms that involve smooths, I was thinking that a good way to get a unique, appropriate name would be to use $label from the smooth object. Of course, I need to treat these labels as symbols in the coxph formula, because the labels have special characters like (, ), and :.

I haven't quite been able to get this part right. The closest I have gotten does the following (in pcox()):

          nm <- trm.i$smooth[[1]]$label
          assign(x=nm, trm.i$cpobj, envir=newfrmlenv)
          newtrmstrings[i] <- paste0("`", nm, "`")

So suppose the label is "s(x)". This assigns the coxph.penalty object to newfrmlenv with the name s(x), and uses the new term string "s(x)". When I do this, the model frame is able to be created within coxph. However, there is a line a bit lower down that tries to match the names of the penalized terms (which are the column names of the model frame), to the term labels from the Terms object. Since the column name from the model frame is "s(x)" and the term label is "s(x)", the match comes up empty, and an error gets thrown.

Any ideas how to get around this? I have pushed the current version (which causes the error) up to github if you want to try it out. Here is the MWE (which is included in pcox.testing2.R):

library(devtools)
dev_mode()
load_all()
library(survival)
library(mgcv)
library(pryr)
library(reshape2)

set.seed(1235)
N <- 500
J <- 200
x <- runif(N, 0, 2*pi)
male <- rbinom(N, size = 1, prob=.5)
ftrue <- sin(x)
eta <- matrix(ftrue + .75*male, nrow=N, ncol=J)
Xdat <- data.frame(x=x, male=male)
data2 <- simTVSurv(eta, Xdat)
fit2.1 <- pcox(Surv(time, event) ~ p(x, linear=FALSE, dbug=TRUE) +
                 male, data=data2)

Note that so far I am doing this only for terms that are NOT time-varying. This is because for the time-varying terms, the smooth object isn't even created until we get into coxph and the tt function. For the tt terms, I will have to predict what the term label will be within pcox, based on things available in the environment of the tt function. Or alternatively, I could have create.tt.func return a label.

fabian-s commented 9 years ago

if you don't use "s(x)", then "s(x)" gets evaluated inside model.frame.default and you get a list (actually a smooth.spec-object) instead of a design matrix, and model.frame.default throws an error. If you use "s(x)", then your formula terms don't match the names in the formula's data and coxph cannot match up attr(Terms, "term.labels") with names(pterms)[pterms].

One way out of the conundrum might be to use syntactically valid names for the coxph.penalty objects in newfrmlenv/pcoxdata from the getgo, see https://github.com/jgellar/pcox/commit/9f537db3e08c2e5f6f7f9d959fa393aaee5758c6


EDITED TO ADD: please do let me know if you'd prefer me setting up a branch of the main repo to do this kind of modification in the future...

jgellar commented 9 years ago

Actually, I don't really like the results of using make.names(). My hope was that, by looking at the name of the term, it would be possible for the user to understand what was going on with the term - that's the reason I liked the idea of using the label from the smooth object. But make.names() basically just coverts all of the punctuation (primarily (, ), and :) into .. You lose a lot of the interpretability when that happens.

The only solution that I can think of us to create a new function similar to make.names(), but that differentiates between the parentheses and colons. The only "punctuation" that is considered to be valid would be _ and . (right?), so maybe we could convert all the parentheses to _ and all the colons to :? I'm not a huge fan of this approach, so if you have better ideas I'm all ears.


Regarding the branching issue, a new branch would probably be best, but I'm a git novice and the branching stuff confuses me.... for small changes like this it's probably easiest to not bother with a new branch.

fabian-s commented 9 years ago

sorry for the 3 commits, I pushed to the remote too fast and had to backtrack a little. the second one is the relevant one.

should work now. it's hacky but also kind of elegant, if I do say so myself... :stuck_out_tongue_winking_eye:

jgellar commented 9 years ago

If we weren't allowing for "hacky" solutions, I don't think we would have a package. This is perfect, it does exactly what I would want.

A couple follow-up points:

  1. There's no reason the s_override code has to be in the loop you put it in, right? In other words, we would only need to assign the function once, not once for every term?
  2. s_override doesn't appear to be doing anything "dynamic", so I don't think it needs to be defined in anywhere in pcox() - it could be defined in a separate file (simplifying the code a bit) as an internal function.
  3. We could use the same s_override function for te and t2 as well, correct?

Let me know if my understanding isn't correct on these points, or if there's something with the scoping that I'm missing.

fabian-s commented 9 years ago

glad you like it. came to me this morning in the shower. :smirk:

  1. There's no reason the s_override code has to be in the loop you put it in, right? In other words, we would only need to assign the function once, not once for every term?

true.

  1. s_override doesn't appear to be doing anything "dynamic", so I don't think it needs to be defined in anywhere in pcox() - it could be defined in a separate file (simplifying the code a bit) as an internal function.

sure, as long as it gets assigned into newfrmlenv's parent under the name "s".

  1. We could use the same s_override function for te and t2 as well, correct?

seems likely.

jgellar commented 9 years ago

Ah... so close, but we're not there yet.

The problem is the by variables. The term label for a term with a by variable would look something like "s(smat):LX". So it's not entirely contained within the s().

Any other ideas? Maybe you need to take another shower?

fabian-s commented 9 years ago

you forget I'm european, so not due for another shower or change of clothing for about 2 weeks :tongue:

why can't we just use labels like "s(smat, by=LX)"?

jgellar commented 9 years ago

I'm not exactly sure what that emoticon is supposed to be, but... gross.

Agreed that this would solve the problem.

fabian-s commented 9 years ago

let's close this once the change is actually implemented, ok? see https://help.github.com/articles/closing-issues-via-commit-messages/

jgellar commented 9 years ago

I did implement it. Did I not push the changes? I thought I did.

fabian-s commented 9 years ago

you did, i didn't see it.

closed in https://github.com/jgellar/pcox/commit/7d291a5ac8673600124bddb632f6ab68eef843df