thehanlab / dynamicLM


Subject: Inquiry Regarding Issue with dynamicLM Supermodel Construction #3

Open Simona1106 opened 8 months ago

Simona1106 commented 8 months ago

Hello, and thank you very much for sharing this package. I have run into an issue while using dynamicLM to build a supermodel. Specifically, when I code the outcome event (event) as survival (0) and death (1), the Hist function fails to recognize the outcome event (event). I am considering changing the survival value from 0 to 2 to work around this. In this context, I would like to ask:

Are there any alternative solutions that would let the Hist function correctly identify the survival event (event)?

If I change the survival event values from 0 to 2, would such a modification impact the interpretability of the model or other aspects?

Have other users encountered a similar issue, and are there known solutions in such cases?

Your insights and assistance on this matter would be greatly appreciated.

Below is an example of the code:

> lmdata <- add_interactions(lmdata = lmdata, lm_covs = lm_covs,
+                            func_covars = c("linear", "quadratic"),
+                            func_lms = c("linear", "quadratic"))
> colnames(lmdata$data) #Columns of added terms
#> [1] "ID"                     "Time"                   "event"                  "age.at.time.0"         
#> [5] "sex"                    "tumor.location"         "T"                      "N"                     
#> [9] "G1"                     "adjuvant.therapy"       "Maximum.size.of.tumour" "surgery"               
#> [13] "haemorrhage"            "Surgical.sequencing"    "Doctor.s.age"           "recurrence"            
#> [17] "recurrence.time"        "LM"                     "age"                    "age_1"                 
#> [21] "age_2"                  "recurrence_1"           "recurrence_2"           "LM_1"                  
#> [25] "LM_2"     

> formula <- "Hist(Time, event,LM) ~ age + age_1 + age_2 + cluster(ID)"
> check_td <- dynamic_lm(lmdata = lmdata,
+                        formula = as.formula(formula),
+                        type = "CSC")
#> Error in data.frame(time = time, status = statusX, entry = entry) : 
#>  arguments imply differing number of rows: 11031, 0
#> In addition: Warning message:
#> In prodlim::getEvent(response) :
#>  Object 'Hist' does not have this element: event. Returning NULL.

anyafries commented 8 months ago

Hi, thanks for reaching out! It sounds like your data is simple survival data (not competing risks). In this case, you can keep your coding as is (0=censored and 1=death), you just need to change the following:

formula <- "Surv(LM, Time, event) ~ age + age_1 + age_2 + cluster(ID)"
check_td <- dynamic_lm(lmdata = lmdata, formula = as.formula(formula), type = "coxph")

Explanation: the Surv object handles time-to-event data without competing risks (i.e., you only have one outcome you are interested in), and the type argument of dynamic_lm tells the function which kind of model to fit. (Please be aware of the order of the arguments in the call to Surv: it is Surv(start, stop, event), which here means Surv(LM, Time, event).)
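To make the counting-process form concrete, here is a minimal sketch that calls survival::coxph directly rather than dynamicLM. The toy cohort, the hand-rolled stacking at landmarks 0, 6 and 12, and the variable names are hypothetical stand-ins for what dynamicLM builds for you; the point is only that 0 = censored / 1 = death coding works as-is with Surv(LM, Time, event), with no recoding to 2.

```r
library(survival)

set.seed(1)
n <- 40
# Hypothetical toy cohort: one row per patient, event coded 0 = censored, 1 = death
base <- data.frame(
  ID    = 1:n,
  age   = round(rnorm(n, mean = 60, sd = 10)),
  Time  = rexp(n, rate = 1 / 30),
  event = rbinom(n, size = 1, prob = 0.7)
)

w <- 36  # prediction window, matching the window size of 36 in the thread

# Hand-rolled landmark stacking (a simplified illustration, not dynamicLM's own code):
# keep patients still at risk at each landmark and censor administratively at LM + w
stacked <- do.call(rbind, lapply(c(0, 6, 12), function(lm) {
  at_risk <- base[base$Time > lm, ]
  at_risk$LM <- lm
  # event must be computed from the untruncated Time, before Time is capped
  at_risk$event <- ifelse(at_risk$Time <= lm + w, at_risk$event, 0)
  at_risk$Time  <- pmin(at_risk$Time, lm + w)
  at_risk
}))

# Counting-process form Surv(start, stop, event); cluster(ID) gives a robust
# variance because the same patient appears at several landmarks
fit <- coxph(Surv(LM, Time, event) ~ age + cluster(ID), data = stacked)
summary(fit)
```

Swapping the order to Surv(Time, LM, event) would give start times larger than stop times, which is why the argument order matters.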

Please let me know if that works!

Simona1106 commented 7 months ago

Hello, thank you very much for your generous help. I've tried the code modifications you suggested, but unfortunately I still get the following error. I have also run into two further issues while building the model. Could I trouble you again for help?

> formula <- "Surv(LM, Time1, event) ~ age + age_1 + age_2 + cluster(ID)"
> check_td <- dynamic_lm(lmdata = lmdata, formula = as.formula(formula), type = "coxph")
> # Error in eval(predvars, data, env) : object 'event' not found

Firstly, when I encode the events as 1 and 2, the model is built successfully. However, I receive the following warning:

> formula <- "Hist(Time1, event1,LM) ~ age + sex + tumor.location + T + N + adjuvant.therapy+ 
+                         Surgical.sequencing+site1.mm.+ surgery+ NLR2.4.+ recurrence+age_1 +age_2+sex_1+ sex_2+
+                         tumor.location_1+ tumor.location_2+T_1 + T_2 + N_1+ N_2+ site1.mm._1+ site1.mm._2+
+                         surgery_1+ surgery_2 + adjuvant.therapy_1+ adjuvant.therapy_2+ NLR2.4._1+ NLR2.4._2+
+                         recurrence_1+ recurrence_2+ Surgical.sequencing_1+ Surgical.sequencing_2+ LM_1 + LM_2 + cluster(ID)"
> supermodel <- dynamic_lm(lmdata = lmdata,
+                          formula = as.formula(formula),
+                          type = "CSC", x = TRUE)
> # Warning message: In agreg.fit(X, Y, istrat, offset, init, control, weights = weights,  : Loglik converged before variable 11,30,31; beta may be infinite.

> print(supermodel)
> # Landmark cause-specific cox super model fit for dynamic prediction of window size 36:
> # $model
> # ----------> Cause: 1
> #                             coef  exp(coef)   se(coef)  robust se      z        p
> # age                    3.140e-02  1.032e+00  6.468e-03  7.951e-03  3.949 7.85e-05
> # sex                   -2.450e-01  7.827e-01  1.144e-01  1.304e-01 -1.879 0.060229
> # tumor.location        -2.993e-01  7.414e-01  7.938e-02  9.279e-02 -3.225 0.001259
> # T                      3.496e-01  1.418e+00  6.038e-02  7.053e-02  4.956 7.20e-07

Moreover, when I stack the external validation dataset in the same way and calculate the AUC, I consistently get the following warnings:

> table(data_test$LM)
> #   0   6  12  18  24  30  36 
> # 536 526 501 479 447 424 402 
> scores_external <- score(list("CSC" = supermodel), cause = 1,
+                            data = data_test, lms = "LM")
> # Warning messages:
> # 1: In 1:N : numerical expression has 536 elements: only the first used
> # 2: In 1:N : numerical expression has 526 elements: only the first used
> # 3: In 1:N : numerical expression has 501 elements: only the first used
> # 4: In 1:N : numerical expression has 479 elements: only the first used
> # 5: In 1:N : numerical expression has 447 elements: only the first used
> # 6: In 1:N : numerical expression has 424 elements: only the first used
> # 7: In 1:N : numerical expression has 402 elements: only the first used

Secondly, many of my variables are categorical with multiple classes. Is it necessary to perform operations such as one-hot encoding before building the model?

I'm eagerly awaiting your response, and once again, I appreciate your generous help.

anyafries commented 7 months ago

First error: is your event column supposed to be "event1" not "event"?
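A quick way to catch this kind of name mismatch before calling dynamic_lm is to compare the variables a formula refers to with the columns in the data. The sketch below uses only base R; the helper name missing_vars and the toy data frame are hypothetical, not part of dynamicLM.

```r
# Hypothetical helper: variables used in a formula that are absent from the data
missing_vars <- function(formula, data) {
  setdiff(all.vars(formula), names(data))
}

# Toy stand-in for lmdata$data whose outcome column is named "event1"
toy <- data.frame(ID = 1:3, LM = 0, Time1 = c(5, 8, 12),
                  event1 = c(1, 0, 1), age = c(50, 61, 70))

f <- as.formula("Surv(LM, Time1, event) ~ age + cluster(ID)")
missing_vars(f, toy)
#> [1] "event"
```

With the stacked data from the thread, the same check would be missing_vars(f, lmdata$data); an empty result means every name in the formula exists as a column.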

Second warning: as you can see, the coefficients are not infinite (in fact, they are quite small!), so the warning can safely be ignored. If there were NA or infinite coefficients, you might have perfect collinearity or separation in your data. However, because you only have one event type plus censoring, the correct methodology is Surv(...)/coxph, as in my first reply.
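The check described above (look for NA or very large coefficients) can be scripted. This is a minimal sketch on a plain survival::coxph fit with hypothetical toy data; the helper flag_suspect_coefs and its threshold max_abs = 10 are illustrative choices, not part of dynamicLM, and extracting the individual cause-specific fits from a dynamicLM supermodel is not shown here.

```r
library(survival)

# Hypothetical helper: names of coefficients that are NA (e.g. aliased by
# perfect collinearity) or implausibly large (a hint of separation)
flag_suspect_coefs <- function(fit, max_abs = 10) {
  b <- coef(fit)
  names(b)[is.na(b) | abs(b) > max_abs]
}

set.seed(2)
n <- 60
toy <- data.frame(time = rexp(n, rate = 0.1), status = rbinom(n, 1, 0.8), x1 = rnorm(n))
toy$x2 <- 2 * toy$x1  # exact linear copy of x1, i.e. perfect collinearity

fit_ok  <- coxph(Surv(time, status) ~ x1, data = toy)
fit_bad <- coxph(Surv(time, status) ~ x1 + x2, data = toy)  # may warn about a singular X matrix

flag_suspect_coefs(fit_ok)   # nothing flagged
flag_suspect_coefs(fit_bad)  # the aliased collinear term is flagged
```

Coefficients on the scale shown in the printed supermodel (for example 3.140e-02 for age) would not be flagged by this check.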

Third error: hopefully resolved by using the correct methodology.

Last comment: yes. Our library does not handle factor encodings. So, if they are ordered and it's reasonable to consider them as ordered numerical variables, you can leave them like that. If this is unreasonable, they need to be turned into one-hot encodings. This is easy with the model.matrix() function. I'll post an example of that here a bit later.
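Until that example is posted, here is a minimal base-R sketch of the model.matrix() approach. The toy data and the level names of tumor.location are hypothetical; it assumes the factor has no missing values (model.matrix drops NA rows by default, which would misalign the columns), and it assumes the encoding is done before stacking so that the new indicator columns can be listed in lm_covs like any other covariate.

```r
# Hypothetical toy data: tumor.location as an unordered 3-level categorical variable
df <- data.frame(
  ID = 1:6,
  tumor.location = factor(c("colon", "rectum", "sigmoid", "colon", "rectum", "sigmoid"))
)

# Expand the factor into 0/1 indicator columns; dropping the intercept column
# leaves the first level ("colon") as the reference category
dummies <- model.matrix(~ tumor.location, data = df)[, -1, drop = FALSE]

# Keep column names syntactically valid in case level names contain spaces
colnames(dummies) <- make.names(colnames(dummies))

# Replace the original factor with its indicator columns
df_encoded <- cbind(df[, setdiff(names(df), "tumor.location"), drop = FALSE], dummies)
print(df_encoded)
```

Dropping the reference level avoids the perfect collinearity that a full set of indicators would create alongside the other terms.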

BTW, I edited your comments just to change the markdown formatting for readability.

Simona1106 commented 7 months ago

Hello, first of all, thank you for correcting the format of my comments; I am indeed not proficient enough in using this platform. I have attempted to adjust the code multiple times following your advice, but unfortunately, I have not been able to successfully complete my model. After experiencing multiple setbacks, I gradually realized that this might be a problem that requires professional assistance, and you are the only person I can turn to for help.

I sincerely apologize for bothering you in this matter; it might seem a bit presumptuous. I would like to inquire if it's possible to send you my code and data so that you can help me identify the issues. If you are unable to assist, I completely understand, and you should not feel any burden. Your willingness to take the time to communicate with me is already very considerate and generous. Lastly, I genuinely look forward to receiving your professional advice. Thank you very much for reading my request. Looking forward to your reply.

anyafries commented 7 months ago

Sure, shoot me an email at afries [at] stanford [dot] edu

Simona1106 commented 7 months ago

Thank you for your response. I will organize the code and data and send them to you shortly. Once again, I appreciate your assistance.

anyafries commented 7 months ago

Documenting requirements: