Discussion: Better ways to improve team-wide understanding of MIMIC datasets

kdg1993 commented 1 year ago

What

Discuss the key points about the MIMIC dataset that the whole team should know
Propose formats for analyzing and sharing those points

Why

MIMIC has a more complicated data structure than CheXpert
This was one of the key issues at last week's meeting
A team-wide common reference can make our meeting discussions more efficient

How

My simple suggestion is to make a notebook file for MIMIC EDA. Any further suggestions would be very helpful and appreciated!

juppak commented 1 year ago

Comment

The MIMIC dataset consists of 3 parts:
1. original image files (JPG format)
2. data description csv files
3. labeler csv files
The image files and labeler files are similar to those in the CheXpert dataset.
Therefore, handling the data description csv files is the important part.
Before making a MIMIC dataloader like the CheXpert dataloader, a pre-processing step is needed.

I saw JIEON made that part! I think it looks good. I think it would be more accurate for JIEON to explain the code. @jieonh

jieonh commented 1 year ago

Comment

Matching the formats of the CheXpert and MIMIC csv files has been completed.

The clinical information part is identical.
However, since the view is not defined in the labeler csv for MIMIC, the view information was retrieved from the metadata csv and combined with the labeler csv.
The metadata has about four columns related to view, but only the ViewPosition column (containing information about frontal/lateral, refer to the screenshot below) is used.
If necessary, it is possible to compare and analyze the remaining three columns that I mentioned in the previous bullet, but I haven't worked on that for now because I thought it wasn't a priority.
Additionally, as I mentioned briefly in the last meeting, the difference in row counts between the labeler csv (227835) and the metadata csv (377110) arises because the MIMIC dataset contains several images (different views) per study_id, and there are 227835 study_ids. This means all we have to do is set study_id as the index and combine the dataframes.
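
For reference, the combining step described above could be sketched roughly as follows. This is only a minimal illustration on toy data, not the actual code in the repository: the toy rows, the `combine_metadata_and_labels` helper, the `VIEW_TO_FRONTAL_LATERAL` mapping, and the `Frontal/Lateral` output column name are all assumptions made for the example.

```python
import pandas as pd

# Toy stand-in for the metadata csv: one row per image (values are made up).
metadata = pd.DataFrame({
    "dicom_id": ["a1", "a2", "b1"],
    "study_id": [100, 100, 200],
    "ViewPosition": ["PA", "LATERAL", "AP"],
})

# Toy stand-in for the labeler csv: one row per study (values are made up).
labels = pd.DataFrame({
    "study_id": [100, 200],
    "Cardiomegaly": [1.0, 0.0],
})

# Assumed mapping from ViewPosition values to a CheXpert-style view column.
# Values not listed here become NaN and would need a separate decision.
VIEW_TO_FRONTAL_LATERAL = {
    "PA": "Frontal",
    "AP": "Frontal",
    "LATERAL": "Lateral",
    "LL": "Lateral",
}


def combine_metadata_and_labels(metadata, labels):
    """Attach study-level labels to every image row, keyed on study_id."""
    merged = metadata.merge(
        labels,
        on="study_id",
        how="left",              # keep every image, even if it has no labels
        validate="many_to_one",  # raises if labels has duplicate study_id rows
    )
    merged["Frontal/Lateral"] = merged["ViewPosition"].map(VIEW_TO_FRONTAL_LATERAL)
    return merged


combined = combine_metadata_and_labels(metadata, labels)

# Images per study: this is why the metadata csv has more rows than the labeler csv.
images_per_study = metadata.groupby("study_id").size()
```

The `validate="many_to_one"` argument makes the merge fail loudly if the labeler table ever has more than one row per study_id, which is the assumption the whole combine step relies on.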

kdg1993 commented 1 year ago

Thank you both for your accurate & kind replies, @juppak @jieonh !! 👍

@jieonh I am now curious whether the original code behind the captured image above also includes any EDA things?

jieonh commented 1 year ago

@kdg1993 Could you explain in more detail what you meant by EDA things? I just did a brief analysis of the columns, missing values, and a few other things. Visualizations are not included.
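
A brief column and missing-value analysis of this kind might look something like the sketch below. It is a generic pandas example rather than the code referred to in this thread; the `summarize_columns` helper and the toy dataframe are hypothetical.

```python
import pandas as pd


def summarize_columns(df):
    """Per-column overview: dtype, missing count, missing percentage, unique values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(1),
        "n_unique": df.nunique(dropna=True),
    })


# Toy data standing in for one of the csv files (values are made up).
toy = pd.DataFrame({
    "study_id": [100, 100, 200, 300],
    "ViewPosition": ["PA", None, "AP", "PA"],
})

summary = summarize_columns(toy)
print(summary)
```

Running the same helper on each csv file would give the team one comparable table per file.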

kdg1993 commented 1 year ago

I believe you got exactly what I meant, @jieonh. What I wanted to ask about is visualization and some analysis of the four csv files.

Thanks for telling me about your current work 😄. Since I am digging into the MIMIC csv data right now (simply because I lack knowledge about it), I just wanted to ask you to share the code if you have some.

jieonh commented 1 year ago

I uploaded some code that I was working on for reference! /home/MIMIC_code/jieon/csv_lab (It may not be well organized since it was just a personal experiment :joy:)

kdg1993 commented 1 year ago

As far as I understand, more advanced visualization and statistical analysis of MIMIC and BRAX are needed and could help the whole team build knowledge about the data itself.

Since @seoulsky-field plans to do the EDA, I think kyoungmin might help us with it. I am also exploring the csv data now for my own understanding. So I want to ask whether kyoungmin will handle both MIMIC and BRAX, or choose one of them.

seoulsky-field commented 1 year ago

Thank you for sharing the code, @jieonh !

I had planned to choose either MIMIC or BRAX, depending on the team's current progress. But I think I can do both, so I'll do MIMIC first and BRAX next!

kdg1993 commented 1 year ago

Just a small question, does anyone have any idea which branch and directory the EDA code should be in?

For example, in the feature/notebook branch or tutorial branch?

seoulsky-field commented 1 year ago

How about the notebook directory in the docs branch?

kdg1993 commented 1 year ago

The notebook directory seems fine to me too, but... I'm not quite sure about the docs branch, considering the .md file in the docs directory

seoulsky-field commented 1 year ago

Oh, I misunderstood and thought you wanted to make a new branch like 'feature/notebook'. But I checked, and the 'feature/notebook' branch already exists, and Yisak has uploaded notebook files to it.

So, my opinion is that we should upload the EDA notebooks to the notebook directory in the feature/notebook branch.

kdg1993 commented 1 year ago

It looks good to me! BTW, I think the notebook file naming could be difficult but this is not a critical issue :)

seoulsky-field commented 1 year ago

I think "EDA_[dataset].ipynb" is a good format, because there is generally just one EDA notebook file per dataset.

For example, EDA_CheXpert.ipynb, EDA_MIMIC.ipynb.

seoulsky-field / CXRAIL-dev

Discussion: Better ways to improve team-wide understanding of MIMIC datasets #5