jgellar / pcox

Penalized Cox regression models

Because the tt function has to receive the data as a matrix or vector (not as a data.frame)... #13

Closed jgellar closed 9 years ago

jgellar commented 9 years ago

Fabian and I were working through this issue earlier today. The bug occurs because model.frame() is called within coxph(). model.frame() accepts a data.frame, but the elements of that data frame cannot be other data.frames. I was trying to keep the data for a particular term together as a data.frame, but this is not allowed.
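The restriction can be seen outside of coxph() with a minimal sketch. The toy data and the variable names (`d`, `x`, `z`) are made up for illustration and are not taken from pcox:

```r
# Toy data: `x` is stored as a nested data.frame, `z` as a matrix column.
d <- data.frame(y = c(1.2, 0.7, 3.1))
d$x <- data.frame(a = 1:3, b = 4:6)
d$z <- cbind(a = 1:3, b = 4:6)

# A nested data.frame is a list, which model.frame() does not accept
# as a variable, so this call signals an error.
res_df <- tryCatch(model.frame(y ~ x, data = d), error = function(e) e)
inherits(res_df, "error")

# The same values stored as a matrix column pass through intact.
mf <- model.frame(y ~ z, data = d)
is.matrix(mf$z)
dim(mf$z)
```

This is why the options below all route the data through coxph() as a matrix.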

So we need to pass the data from p() to pcox() to coxph() to the tt function as a matrix. Here is how I propose to do that:

  1. In p(), we have the data as a data.frame. We can turn it into a matrix here. But before we do that, we can create a list that contains the "mapping" from data.frame to matrix. E.g., if the data.frame has 3 elements (one vector, one matrix of width a, and one matrix of width b), we create a list of three elements: the number 1, a vector 2:(a+1), and a vector (a+2):(a+b+1). The names of this list will be the names of the objects in the data.frame.
  2. We can either assign this list to the matrix as an attribute, or we can assign it to the environment of the tt function, so it is available when the tt function is called. Which do you think is "cleaner"?
  3. p() passes the matrix back to pcox(), which passes it into coxph(). coxph() will call the tt function, with the "augmented" (long) matrix as its first argument.
  4. The tt function accepts the matrix, and uses the column map to recreate the data.frame. Then the code can proceed.
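The steps above can be sketched roughly as follows, using the attribute option from step 2. The helper names `flatten_with_map()` and `rebuild_from_map()` and the attribute name `"colmap"` are hypothetical and are not part of pcox. The sketch also does not check whether the attribute survives the trip through coxph(), which would need to be verified separately.

```r
# Hypothetical helper: flatten a data.frame whose elements are vectors or
# matrices into one matrix, recording which columns belong to which element.
flatten_with_map <- function(df) {
  widths <- vapply(df, NCOL, integer(1))   # columns taken up by each element
  ends   <- cumsum(widths)
  starts <- ends - widths + 1L
  colmap <- Map(seq.int, starts, ends)     # e.g. 1, 2:(a+1), (a+2):(a+b+1)
  names(colmap) <- names(df)
  mat <- do.call(cbind, unname(lapply(df, as.matrix)))
  dimnames(mat) <- NULL
  attr(mat, "colmap") <- colmap            # step 2, attribute variant
  mat
}

# Hypothetical inverse: use the column map to rebuild the data.frame.
rebuild_from_map <- function(mat, colmap = attr(mat, "colmap")) {
  out <- data.frame(row.names = seq_len(nrow(mat)))
  for (nm in names(colmap)) {
    block <- mat[, colmap[[nm]], drop = FALSE]
    # Width-1 elements go back to plain vectors, wider ones stay matrices.
    out[[nm]] <- if (ncol(block) == 1L) as.vector(block) else block
  }
  out
}

# One vector, one matrix of width a = 2, one matrix of width b = 3.
df <- data.frame(age = c(50, 61, 47))
df$X1 <- matrix(1:6, nrow = 3)
df$X2 <- matrix((1:9) / 10, nrow = 3)

mat  <- flatten_with_map(df)
attr(mat, "colmap")          # age: 1, X1: 2:3, X2: 4:6
back <- rebuild_from_map(mat)
str(back)
```

Note that cbind() coerces everything to a common type, so an integer matrix comes back as double. Factors or character columns would need extra handling that this sketch leaves out.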

Sound reasonable?

adibender commented 9 years ago

I'm not quite clear on this. If the tt function needs the data as a matrix, why would it need to recreate the data frame?

jgellar commented 9 years ago

The tt function doesn’t need it as a matrix, but the only way for the data to be passed through coxph() is as a matrix. Once the tt() function receives it, it needs to know what each column corresponds to. Hence the “map”, which can re-create the data frame.

The processing I wrote within the tt() function relies on the data being in a data frame. This is so we can easily keep functional predictors separated from other functional predictors or scalar predictors. This is most necessary when there is a smooth of more than one variable, e.g. two scalar predictors. In the future, we could also allow a smooth over one functional predictor and one scalar predictor - but this isn’t implemented yet.
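As a rough illustration of the other option (keeping the map in the environment of the tt function rather than as an attribute), here is a sketch with a hypothetical factory `make_tt()`. The `(x, t, ...)` signature mirrors how coxph() calls a tt function, but nothing here calls coxph(), and the body only splits the matrix back into its named pieces:

```r
# Hypothetical factory: returns a tt-style function whose enclosing
# environment holds the column map.
make_tt <- function(colmap) {
  force(colmap)
  function(x, t, ...) {
    x <- as.matrix(x)
    # Split the incoming matrix into one named block per predictor, so that
    # scalar and functional predictors stay separated.
    lapply(colmap, function(idx) x[, idx, drop = FALSE])
  }
}

# Assumed layout: one scalar predictor and one functional predictor of width 3.
colmap <- list(z1 = 1L, Xfun = 2:4)
tt_fun <- make_tt(colmap)

x_long <- cbind(c(0.5, 1.5), matrix(1:6, nrow = 2))
parts  <- tt_fun(x_long, t = c(2.0, 3.5))

names(parts)
dim(parts$Xfun)
environment(tt_fun)$colmap   # the map travels with the function
```

With this variant the matrix itself carries no extra attributes, so nothing depends on how the matrix is subset or copied before the tt function is called.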


