Closed kdg1993 closed 1 year ago
I saw JIEON made that part! I think it looks good, but it would be more accurate for JIEON to explain the code. @jieonh
Matching the formats of the CheXpert and MIMIC csv files has been completed.
The clinical information part is identical in both.
However, since view was not defined in the labeler csv for MIMIC, the view information was retrieved from the metadata csv and combined with the labeler csv.
The metadata csv has about four columns related to view, but only the ViewPosition column (containing information about frontal/lateral, refer to the screenshot below) is used.
If necessary, the remaining three columns mentioned in the previous bullet can be compared and analyzed, but I haven't worked on that for now because I didn't think it was a priority.
Additionally, as I mentioned briefly in the last meeting, the difference in row counts between the labeler csv (227835) and the metadata csv (377110) is because the MIMIC dataset contains several images (different views) per study_id (227835 studies). This means all we have to do is set study_id as the index and combine the dataframes.
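For anyone following along, here is a minimal sketch of that combine step in pandas. It is not the code from the thread: the two dataframes are tiny made-up stand-ins for the labeler and metadata csv files, `dicom_id` and `Cardiomegaly` are assumed column names, and the `view_map` that turns ViewPosition values into Frontal/Lateral is a hypothetical mapping for illustration only.

```python
import pandas as pd

# Toy stand-ins for the two MIMIC csv files (all values are made up).
labeler = pd.DataFrame({
    "study_id": [101, 102],
    "Cardiomegaly": [1.0, 0.0],  # assumed label column, for illustration
})
metadata = pd.DataFrame({
    "dicom_id": ["a", "b", "c"],  # assumed image identifier column
    "study_id": [101, 101, 102],  # study 101 has two images (two views)
    "ViewPosition": ["PA", "LATERAL", "AP"],
})

# Index the labeler table by study_id, then attach its labels to every image row.
# The result has one row per image, so it is longer than the labeler table.
combined = metadata.join(labeler.set_index("study_id"), on="study_id", how="left")

# Hypothetical mapping from ViewPosition values to a Frontal/Lateral column;
# values not listed here would become NaN and need a separate decision.
view_map = {"PA": "Frontal", "AP": "Frontal", "LATERAL": "Lateral", "LL": "Lateral"}
combined["Frontal/Lateral"] = combined["ViewPosition"].map(view_map)

print(combined)
```

Joining from the metadata side keeps one row per image, which is why the combined table follows the metadata row count rather than the labeler row count.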
Thank you both for your accurate & kind replies, @juppak @jieonh !! 👍
@jieonh I am now curious whether the original code behind the captured image above also includes some EDA?
@kdg1993 Could you explain in more detail what you meant by EDA things? I just did a brief analysis of the columns, missing values, and some other stuff. Visualizations are not included.
I believe you got exactly what I meant, @jieonh . What I wanted to ask about is visualization and some analysis of the four csv files.
Thanks for telling me about your current work 😄. Since I am digging into the MIMIC csv data now (simply because I lack knowledge about it), I just wanted to ask you to share the code if you have some.
I uploaded some code that I was working on for reference! /home/MIMIC_code/jieon/csv_lab
(It may not be well organized since it was just for personal experiment :joy:)
As far as I understand, advanced visualization and statistical analysis of MIMIC and BRAX are needed and could help the whole team learn more about the data itself.
Since @seoulsky-field plans to do the EDA, I think kyoungmin might help us with it. Also, I am exploring the csv data now for my own understanding. So I want to ask whether kyoungmin will handle both MIMIC and BRAX, or choose one of them.
Thank you for sharing the code, @jieonh !
And I had planned to choose either MIMIC or BRAX, depending on the team's current progress. But I think I can do both, so I'll do MIMIC first and BRAX next!
Just a small question, does anyone have any idea which branch and directory the EDA code should be in?
For example, in the feature/notebook branch or tutorial branch?
How about the notebook directory in the docs branch?
The notebook directory seems fine to me too, but... I'm not quite sure about the docs branch, considering the .md file in the docs directory.
Oh, I misunderstood and thought you wanted to make a new branch like 'feature/notebook'. But I checked, and the 'feature/notebook' branch already exists and Yisak has uploaded notebook files to it.
So my opinion is that we should upload the EDA notebooks to the notebook directory in the feature/notebook branch.
It looks good to me! BTW, I think the notebook file naming could be difficult but this is not a critical issue :)
I think "EDA_[dataset].ipynb" would work well, because there is generally just one EDA notebook per dataset.
For example, EDA_CheXpert.ipynb, EDA_MIMIC.ipynb.