rpietro / airwayDehiscence

Airway dehiscence project using the UNOS database

Generalized Linear Models #4

Open acastleberry opened 12 years ago

acastleberry commented 12 years ago

Here is what I think I would like to do, although open to suggestion:

Univariate and multivariate regression analysis with the following predictor variables: AA_Steroid_IND + AA_Polyclonal_IND + AA_IL-2_IND + AA_Campath_IND; adjusted for possible confounders: AGE + AGE_DON + GENDER + AA_RACE + BMI_TCR + Diag Category + AA_TX_TYPE + ERA of TX + AA_Recip_DM + LIFE_SUP_TRR + AA_CMV_MISMATCH + HIST_CIG_DON + HLAMAT + PO2 + ISCHTIME + END_MATCH_LAS

I took a stab at some code for this, but I think I'm way off. Also, probably need to first analyze these variables a little more for missing data, etc. Is there a way to run "summary(airwayDehiscence)" for just these selected variables? Running summary(airwayDehiscence) gives all 60+ variables and makes it a little hard to sort out the relevant ones. I'm assuming we will need to figure out what to do with missing or unknown values before running the model. For example, lung allocation score (END_MATCH_LAS) did not exist before 04/2005 so more than half of our patients will be missing this although when available it is one of the most important variables to consider.
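One possible way to summarize only selected variables is to index the data frame by a character vector of column names before calling summary(). The sketch below uses a small invented data frame as a stand-in for airwayDehiscence (the values are made up and only three of the listed variables are shown); it also counts missing values per variable, which may help when deciding how to handle END_MATCH_LAS.

```r
# Toy stand-in for the airwayDehiscence data frame; all values are invented.
airwayDehiscence <- data.frame(
  AGE           = c(54, 61, NA, 47, 58),
  BMI_TCR       = c(24.1, NA, 27.3, 22.8, 30.2),
  END_MATCH_LAS = c(NA, NA, NA, 41.5, 38.2),
  PST_AIRWAY    = c(0, 1, 0, 0, 1),
  ABO_MAT       = c(1, 1, 2, 1, 1)   # a column we do not want summarized
)

# Variables of interest (a short subset of the list above, for brevity).
vars <- c("AGE", "BMI_TCR", "END_MATCH_LAS")

# summary() restricted to the selected columns.
print(summary(airwayDehiscence[, vars]))

# Number and percentage of missing values for each selected variable.
n_missing   <- colSums(is.na(airwayDehiscence[, vars]))
pct_missing <- round(100 * n_missing / nrow(airwayDehiscence), 1)
print(data.frame(n_missing, pct_missing))
```

With the real data, vars would hold the full list of predictors and confounders, spelled exactly as they appear in names(airwayDehiscence).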

hartw004 commented 12 years ago

Wouldn't bother with HLA match, but otherwise looks good. Looks like u left out some things that we had included for stenosis. Just wondering if u had a particular reason?

U can summarize just the variables u want by including only those variables in the code. For univariate we just exclude those with missing values. For multivariate we'll have to think a little more about what to do with it... similar to stenosis.
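As a rough sketch of the univariate step described here: glm() drops rows with missing values under R's default na.action (na.omit), so fitting one model per predictor excludes only the rows missing that predictor. The data frame toy, its values, and the 60% missingness in END_MATCH_LAS are all invented for illustration; only the variable names come from the thread.

```r
# Invented data; only the column names are taken from the thread.
set.seed(1)
n <- 200
toy <- data.frame(
  PST_AIRWAY     = rbinom(n, 1, 0.3),
  AA_Steroid_IND = rbinom(n, 1, 0.5),
  AGE            = round(rnorm(n, 55, 10)),
  END_MATCH_LAS  = rnorm(n, 40, 8)
)
toy$END_MATCH_LAS[1:120] <- NA   # mimic a score that is missing for early cases

predictors <- c("AA_Steroid_IND", "AGE", "END_MATCH_LAS")

# One logistic regression per predictor; rows with NA in that predictor
# are dropped by the default na.action (na.omit).
univariate <- lapply(predictors, function(v) {
  f <- reformulate(v, response = "PST_AIRWAY")
  glm(f, data = toy, family = binomial(link = "logit"))
})
names(univariate) <- predictors

# Number of observations each univariate model actually used.
n_used <- sapply(univariate, nobs)
print(n_used)

# Odds ratio for each predictor (exponentiated slope).
odds_ratios <- sapply(univariate, function(m) exp(coef(m)[[2]]))
print(odds_ratios)

# Rows with complete data on all predictors, i.e. what a multivariate
# model would be left with if missing rows were simply dropped.
n_complete <- sum(complete.cases(toy[, c("PST_AIRWAY", predictors)]))
print(n_complete)
```

The gap between n_used and n_complete is the reason the multivariate model needs more thought: a single sparsely recorded variable limits the sample for every coefficient.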


rpietro commented 12 years ago

so, here is what i did:

  1. ran the model and saw that there was an error message
  2. then i ran a simple model with PST_AIRWAY ~ AA_Steroid_IND and it worked
  3. then started adding one variable at a time, looking for which variables were causing problems
  4. problematic variables were: AA_IL-2_IND, Diag Category, ERA of TX -- without them the following model works:

         logistic1 <- glm(PST_AIRWAY ~ AA_Steroid_IND + AA_Polyclonal_IND + AA_Campath_IND +
                            AGE + AGE_DON + GENDER + AA_RACE + BMI_TCR + AA_TX_TYPE +
                            AA_Recip_DM + LIFE_SUP_TRR + AA_CMV_MISMATCH + HIST_CIG_DON +
                            HLAMAT + PO2 + ISCHTIME + END_MATCH_LAS,
                          family=binomial(link="logit"))

  5. i then ran names(airwayDehiscence) to see where those variables were. Tony, please check below and see whether you can identify them. if they are not there, you might need to change the variable names in the original data set since they might not have been imported

     [1] "AA_Unique_ID"        "AA_Steroid_IND"      "AA_Polyclonal_IND"
     [4] "AA_IL.2_IND"         "AA_Campath_IND"      "AA_Sirolimus_IND"
     [7] "TRR_ID_CODE"         "PT_CODE"             "TRTREJ1Y"
    [10] "AA_AR_PRE.DC"        "PST_DRUG_TRT_INFECT" "AGE"
    [13] "GENDER"              "TX_DATE"             "Year.of.Tx"
    [16] "ERA.of.TX"           "Diag.Category"       "AA_TX_NUM"
    [19] "AA_TX_TYPE"          "AA_Recip_DM"         "BMI_TCR"
    [22] "BMI_RECIP"           "BMI_Change"          "AA_RECIP_CIG_USE"
    [25] "TOT_SERUM_ALBUM"     "STEROID"             "PERIP_VASC"
    [28] "RESIST_INF"          "INFECT_IV_DRUG_TRR"  "LIFE_SUP_TRR"
    [31] "HEMO_PA_MN_TRR"      "AA_CMV_RECIP"        "AGE_DON"
    [34] "GENDER_DON"          "BMI_DON_CALC"        "AA_BMI_RATIO"
    [37] "AA_CMV_DON"          "AA_CMV_MISMATCH"     "HIST_CIG_DON"
    [40] "CONTIN_CIG_DON"      "HIST_COCAINE_DON"    "CONTIN_COCAINE_DON"
    [43] "NON_HRT_DON"         "DIABETES_DON"        "DIABDUR_DON"
    [46] "PULM_INF_CONF_DON"   "PO2"                 "END_MATCH_LAS"
    [49] "HLAMAT"              "HLAMIS"              "ABO_MAT"
    [52] "ISCHTIME"            "PST_AIRWAY"          "VENT_SUPPORT_TRR"
    [55] "AA_BOS_YN"           "AA_MAX_BOS"          "AA_MAX_FEV"
    [58] "AA_O2_REQ"           "PST_DIAL"            "AA_DEATH_DATE"
    [61] "AA_SURVIVAL_STATUS"  "AA_FU_TIME_IN_YEARS" "AA_RACE"
    [64] "AA_IND_REGIMEN"
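Rather than adding variables one at a time, the lookup in step 5 could also be done programmatically. The sketch below is only an illustration: wanted holds a few of the names as typed in the proposed model, and imported is a short excerpt of the names() output above (the real check would use names(airwayDehiscence) directly). setdiff() then returns the requested names that have no exact match among the imported columns.

```r
# Names as written in the proposed model (excerpt).
wanted <- c("AA_Steroid_IND", "AA_Polyclonal_IND", "AA_IL-2_IND",
            "AA_Campath_IND", "AGE", "Diag Category", "ERA of TX",
            "END_MATCH_LAS")

# Excerpt of the names() output above; with the real data this would be
# imported <- names(airwayDehiscence)
imported <- c("AA_Steroid_IND", "AA_Polyclonal_IND", "AA_IL.2_IND",
              "AA_Campath_IND", "AGE", "ERA.of.TX", "Diag.Category",
              "END_MATCH_LAS")

# Requested names with no exact match among the imported column names.
missing_vars <- setdiff(wanted, imported)
print(missing_vars)
```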


acastleberry commented 12 years ago

I see what the problem is. It looks like R replaces spaces and dashes with periods when the data is imported. So [Diag Category] was changed to [Diag.Category] and [AA_IL-2_IND] was changed to [AA_IL.2_IND]. I will recode these variables and rerun.
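The renaming can be reproduced on a small example. This is a sketch that assumes the data were read with read.csv() or a similar function using its default check.names = TRUE, which passes column names through make.names(); the thread does not say how the import was actually done, and the CSV text below is invented.

```r
# The three names as they appear in the original data set.
original <- c("AA_IL-2_IND", "Diag Category", "ERA of TX")

# make.names() replaces characters that are not valid in R names with ".".
cleaned <- make.names(original)
print(cleaned)

# The same thing happens on import when check.names = TRUE (the default).
csv_text <- paste(
  "PST_AIRWAY,AA_IL-2_IND,Diag Category,ERA of TX",
  "0,1,A,pre",
  "1,0,B,post",
  sep = "\n"
)
dat <- read.csv(text = csv_text)
print(names(dat))

# Building the model formula from the cleaned names avoids retyping them.
f <- reformulate(cleaned, response = "PST_AIRWAY")
print(f)
```

Using the period-separated names in the glm() formula should be enough; the columns themselves do not need to be recoded.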