How to deal with the missing data?

nus-mornin-lab / echo-mimiciii

Transthoracic echocardiography and mortality in sepsis: analysis of the MIMIC-III database

https://doi.org/10.1007/s00134-018-5208-7


How to deal with the missing data? #7

Closed. Ackension closed this issue 3 years ago.

Ackension commented 3 years ago

I was so excited to finish reading your paper published in ICM, and I was wondering how you dealt with the missing data. I am a college student and very interested in medical data processing, but Python and R are still hard for me to understand. I hope to get details on this question; specific R code would be even better, since I have just started learning it. I appreciate it.

kiendang commented 3 years ago

We did a complete case analysis. To reduce the number of subjects excluded, for the covariates that most subjects did not have (cvp, bnp, troponin and creatine kinase), we put a flag in the models indicating whether the subject had that covariate recorded, instead of the measurement itself.
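A minimal sketch of the idea, written in plain Python rather than R. The records, field names (`age`, `lactate`, `bnp`, `troponin`) and the split into "sparse" and "dense" covariates are hypothetical and only for illustration; this is not the repository's code. Sparse covariates are replaced by a 0/1 "was it recorded?" flag, and only missing values in the remaining covariates cause a subject to be dropped.

```python
# Hypothetical subject records; None marks a value that was never recorded.
subjects = [
    {"id": 1, "age": 67, "lactate": 2.1, "bnp": None, "troponin": 0.04},
    {"id": 2, "age": 54, "lactate": None, "bnp": 830.0, "troponin": None},
    {"id": 3, "age": 71, "lactate": 3.4, "bnp": None, "troponin": None},
    {"id": 4, "age": 80, "lactate": 1.2, "bnp": 150.0, "troponin": 0.10},
]

# Covariates most subjects lack: modelled as "recorded or not" flags.
SPARSE = ["bnp", "troponin"]
# Covariates kept as measurements: a subject missing any of these is dropped.
DENSE = ["age", "lactate"]


def to_model_row(subject):
    """Keep dense covariates as-is and swap each sparse one for a 0/1 flag."""
    row = {key: subject[key] for key in ["id"] + DENSE}
    for key in SPARSE:
        row[key + "_flag"] = 0 if subject[key] is None else 1
    return row


def complete_cases(rows):
    """Complete case analysis: keep only rows with every dense covariate present."""
    return [row for row in rows if all(row[key] is not None for key in DENSE)]


# Naive complete case analysis on the raw measurements drops anyone missing anything.
naive = [s for s in subjects if all(v is not None for v in s.values())]
# With flags, only subjects missing a dense covariate are dropped.
model_data = complete_cases([to_model_row(s) for s in subjects])

print("naive:", [s["id"] for s in naive])
print("with flags:", [r["id"] for r in model_data])
```

Here only subject 2 (missing `lactate`) is excluded, whereas dropping every incomplete record would leave just subject 4. The trade-off is that the model only learns whether a sparse lab was ordered, not its value.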

kiendang commented 3 years ago

In R, for some models such as glm, you can pass na.exclude (recommended) or na.omit to the na.action argument to do a complete case analysis. See https://github.com/nus-mornin-lab/echo-mimiciii/blob/master/notebooks/02_primary.ipynb
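In R, both options fit the model on complete cases only; the difference is in what comes back. With na.omit, fitted values and residuals cover only the rows that were kept, while na.exclude pads them with NA at the dropped rows so they line up with the original data. The Python sketch below imitates that behavior for a one-predictor least-squares line. The function names and the "omit"/"exclude" strings are made up for illustration and are not an R or library API.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fitted_values(x, y, action):
    """Fit on complete cases, then return fitted values in na.omit or na.exclude style."""
    keep = [i for i in range(len(x)) if x[i] is not None and y[i] is not None]
    intercept, slope = fit_line([x[i] for i in keep], [y[i] for i in keep])
    fitted = {i: intercept + slope * x[i] for i in keep}
    if action == "omit":
        # Like na.omit: only the rows used in the fit.
        return [fitted[i] for i in keep]
    if action == "exclude":
        # Like na.exclude: one entry per original row, None where a row was dropped.
        return [fitted.get(i) for i in range(len(x))]
    raise ValueError("action must be 'omit' or 'exclude'")


x = [1.0, 2.0, None, 4.0, 5.0]
y = [2.0, 4.0, 5.0, None, 10.0]
omitted = fitted_values(x, y, "omit")
excluded = fitted_values(x, y, "exclude")
print(omitted)
print(excluded)
```

Because the padded output has one entry per original row, it can be attached straight back to the data as a new column, which is why na.exclude is usually the more convenient choice.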

Ackension commented 3 years ago

Thanks for your reply. These statistics are really hard for me to understand. I also went back through your SQL code, and it still confuses me: why are these lab results split into four categories? Actually, I can't even fully understand your SQL code...

kiendang commented 3 years ago

What 4 categories?

Ackension commented 3 years ago

I mean, why is 'labname' set as a two-level flag (0 and 1) and then split into first, min, max and abnormal?

kiendang commented 3 years ago

Those are not categories. first, min and max are the first, minimum and maximum measurements of a lab item for a given patient. abnormal indicates whether the measurement is considered abnormal based on the FLAG column. Only cvp, bnp, troponin and creatine kinase have the two-level 0/1 flag; what it means is explained in my previous comment.
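To make the four columns concrete, here is a small Python sketch that derives them per patient from rows shaped like MIMIC-III LABEVENTS (subject, chart time, numeric value, FLAG). The sample rows are invented, time windows (e.g. restricting to the ICU stay) are ignored, and it assumes abnormal is 1 when any of the patient's results has FLAG equal to "abnormal"; the repository's SQL may define the window or the abnormal rule differently.

```python
from collections import defaultdict

# Invented rows shaped like LABEVENTS: (subject_id, charttime, valuenum, flag).
labevents = [
    (101, "2101-01-01 08:00", 1.8, None),
    (101, "2101-01-01 02:00", 2.6, "abnormal"),
    (101, "2101-01-02 09:00", 1.1, None),
    (102, "2101-03-05 11:30", 0.9, None),
]


def summarise(rows):
    """Return first/min/max/abnormal for one lab item, per subject."""
    by_subject = defaultdict(list)
    for subject_id, charttime, value, flag in rows:
        by_subject[subject_id].append((charttime, value, flag))

    summary = {}
    for subject_id, events in by_subject.items():
        # "YYYY-MM-DD HH:MM" strings sort in chronological order.
        events.sort(key=lambda event: event[0])
        values = [value for _, value, _ in events]
        summary[subject_id] = {
            "first": values[0],  # earliest measurement
            "min": min(values),
            "max": max(values),
            # Assumption: 1 if any result was flagged abnormal, else 0.
            "abnormal": int(any(flag == "abnormal" for _, _, flag in events)),
        }
    return summary


summary = summarise(labevents)
for subject_id, stats in summary.items():
    print(subject_id, stats)
```

Each patient ends up with one row holding four derived columns for the lab item, which is the shape a regression model needs.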

The sql folder is not a good place to start. You should start with the notebooks instead; then you'll have a better idea of what the SQL files do and their context, for example 01_run_sql.ipynb.

Some parts of the code might look complicated. You could look at the outputs to get an idea of what it does.

Ackension commented 3 years ago

Thank you so much!