Closed: vandres98 closed this issue 3 years ago
Hi @vandres98, thanks for your interest in our work.
There are only 21430 samples in the "superdiagnostic" case because 407 samples have no diagnostic statement at all. In utils.select_data we filter out those samples so that every remaining sample has at least one diagnostic label. I hope this answers your first question.
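As a rough illustration of that filtering step (a sketch only, not the actual code in utils.select_data; the record IDs, the toy label lists, and the helper name keep_labeled are made up for this example):

```python
# Toy stand-in for the per-record lists of diagnostic superclasses.
# In the real dataset these would be derived from the PTB-XL annotations.
records = {
    "rec_a": ["NORM"],
    "rec_b": [],               # no diagnostic statement, so it is dropped
    "rec_c": ["MI", "STTC"],   # multi-label sample
}


def keep_labeled(label_lists):
    """Return only the records that carry at least one diagnostic label."""
    return {rec_id: labels for rec_id, labels in label_lists.items() if len(labels) > 0}


filtered = keep_labeled(records)
print(sorted(filtered))  # ['rec_a', 'rec_c']

# The same bookkeeping with the numbers from this thread:
print(21837 - 407)  # 21430
```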
For the corresponding class labels, you can use the fourth return value of utils.select_data, which is an instance of MultiLabelBinarizer with an attribute classes_ containing the ordering of the classes as a list. This instance is also stored in output/mlb.pkl as a pickle file.
I hope this answers your questions ;)
Best, helme
Hi helme,
thank you for the answer!
1: Thank you, that makes sense! I am confused about your class counts, though. The PTB-XL paper states the following memberships:
Because it's multi-label, the sum of the counts is of course greater than the sample size.
Thank you and best regards Viktoria
Hi @vandres98,
superdiagnostic_len encodes the number of labels associated with each sample (multi-label). So most samples (16272) have one label, 4079 samples have two diagnostic labels, etc.
For the class ordering, use the MultiLabelBinarizer instance returned by utils.select_data, which has an attribute classes_. Alternatively, you can load the pickle file (stored in /output/mlb.pkl).
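Loading the pickle could look roughly like the sketch below. The helper name load_class_order is hypothetical, and the relative path output/mlb.pkl is an assumption about where the file sits in your checkout. Unpickling a MultiLabelBinarizer needs scikit-learn installed, and pickle files should only be loaded from sources you trust.

```python
import pickle
from pathlib import Path


def load_class_order(path="output/mlb.pkl"):
    """Load the pickled binarizer and return its class ordering as a list."""
    with open(path, "rb") as f:
        mlb = pickle.load(f)  # expected to be the stored MultiLabelBinarizer
    return list(mlb.classes_)


# Only attempt the load if the file is actually present at the assumed location.
if Path("output/mlb.pkl").exists():
    print(load_class_order())
```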
So ['CD', 'HYP', 'MI', 'NORM', 'STTC'] is the ordering in this case; e.g. [0,0,0,1,0] is a sample associated with the diagnostic class 'NORM'.
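To make the mapping concrete, here is a small sketch that turns a multi-hot row back into class names. It assumes the ordering quoted above; the helper name decode is made up for this example, and in practice you would pass the classes_ list from your own binarizer.

```python
# Class ordering as quoted in this thread (classes_ of the binarizer).
CLASSES = ["CD", "HYP", "MI", "NORM", "STTC"]


def decode(row, classes=CLASSES):
    """Map a multi-hot row to the names of the classes whose entry is 1."""
    if len(row) != len(classes):
        raise ValueError("row length must match the number of classes")
    return [name for name, flag in zip(classes, row) if flag == 1]


print(decode([0, 0, 0, 1, 0]))  # ['NORM']
print(decode([1, 0, 1, 0, 0]))  # ['CD', 'MI'] (a sample with two labels)
```

Position i in a row of y_train or y_val refers to the i-th entry of this list, so the ordering is alphabetical rather than the order in which the classes are usually listed.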
I hope this answers your questions.
Best, @helme
Hi, I am a little confused about the number of samples in the notebook Finetuning-Example.ipynb. I see 21430 samples in total (train and validation together). However, the PhysioNet paper states that there are 21837 records, and that's also what I see in the data/ptbxl/records100 folder. Why are 407 records missing?
Another question: how can I interpret the label sets y_train and y_val? Which representation (10000, 01000, 00100, 00010, 00001) corresponds to which class (normal, MI, STTC, CD, HYP)? I cannot map them using the number of samples per class because the counts don't match.
Can you help me clear that up? Thank you very much!