nyuad-cai / MedFuse


Modify the code in create_split.py to solve the problem '0 samples in val and test datasets of CXR_UNI' #10

Closed ZhuoZHI-UCL closed 1 week ago

ZhuoZHI-UCL commented 1 year ago

First of all thank you for your work, it's very useful!

However, I found a problem when using it: in create_split.py, the elements of the lists val_subject_ids and test_subject_ids are strings, not ints. This leads to a problem in

cxr_splits.loc[cxr_splits.subject_id.isin(val_subject_ids_int), 'split'] = 'validate'
cxr_splits.loc[cxr_splits.subject_id.isin(test_subject_ids_int), 'split'] = 'test' 

The val and test labels could not be matched to any rows in these two steps, which leaves the val and test sections of the CXR_UNI dataset empty.
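A minimal sketch of the mismatch, assuming the subject_id column was parsed as integers while the ID lists hold strings. The frame and ID values here are made-up stand-ins, not data from the repository:

```python
import pandas as pd

# Toy stand-in for cxr_splits; the real frame is built in create_split.py.
cxr_splits = pd.DataFrame({
    "subject_id": [10000032, 10000764, 10000898],  # integer column (assumption)
    "split": ["train", "train", "train"],
})

# IDs held as strings, as described above (values are invented).
val_subject_ids = ["10000764"]

# A string ID never compares equal to an integer ID, so isin raises no
# error but the mask is all False and no row would be relabelled.
str_mask = cxr_splits.subject_id.isin(val_subject_ids)

# Casting the IDs to int first restores the match.
int_mask = cxr_splits.subject_id.isin([int(i) for i in val_subject_ids])

print(int(str_mask.sum()), int(int_mask.sum()))
```

The point is that the failure is silent: nothing is thrown, the assignment just touches zero rows.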

I added

val_subject_ids_int = [int(i) for i in val_subject_ids]
test_subject_ids_int = [int(i) for i in test_subject_ids]

after the definitions of val_subject_ids and test_subject_ids, which solves the problem, and we can then get the CXR dataset for pretraining. Thanks!
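If the column dtype differs between environments, a variant that compares both sides as strings might be more robust than casting the lists to int. This is only a sketch: assign_split is a hypothetical helper that does not exist in create_split.py, and it assumes subject_id contains no missing values (a float column would render IDs as e.g. '10000764.0').

```python
import pandas as pd

def assign_split(cxr_splits, subject_ids, label):
    """Label rows whose subject_id is in subject_ids, comparing as strings.

    Hypothetical helper; modifies cxr_splits in place and returns the
    number of rows that were relabelled.
    """
    wanted = [str(i) for i in subject_ids]
    mask = cxr_splits["subject_id"].astype(str).isin(wanted)
    cxr_splits.loc[mask, "split"] = label
    return int(mask.sum())

# Toy data with invented IDs: integer column, mixed str/int ID lists.
demo = pd.DataFrame({
    "subject_id": [10000032, 10000764, 10000898],
    "split": ["train", "train", "train"],
})
n_val = assign_split(demo, ["10000764"], "validate")
n_test = assign_split(demo, [10000898], "test")
print(n_val, n_test, demo["split"].tolist())
```

The returned count makes it easy to notice when a split ends up with zero rows.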

ShazaElsharief commented 5 months ago

Hi, thank you for flagging this and for your suggestion. I am trying to replicate this issue with create_split.py but do not seem to have the same problem. It is true that the elements in val_subject_ids and test_subject_ids are strings; however, the CXR subject_ids are still matched in the following steps:

cxr_splits.loc[cxr_splits.subject_id.isin(val_subject_ids), 'split'] = 'validate'
cxr_splits.loc[cxr_splits.subject_id.isin(test_subject_ids), 'split'] = 'test'

and the counts are updated accordingly:

before update 
train       368960
test          5159
validate      2991
Name: split, dtype: int64
after update 
train       325200
test         36628
validate     15282
Name: split, dtype: int64

Do the two lines of code above throw an error? If not, are all values returned as 'False'? Can you also please clarify whether, after running this script, the values printed in 'after update' stay unchanged or become zeros?
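One way to answer these questions might be a small check like the one below, run just before the two .loc lines. It is a sketch only: describe_id_match is a hypothetical helper, not part of create_split.py, and the toy frame uses invented IDs.

```python
import pandas as pd

def describe_id_match(cxr_splits, subject_ids):
    """Report the column dtype, the Python types in the ID list,
    and how many rows isin actually matches (hypothetical helper)."""
    ids = list(subject_ids)
    return {
        "column_dtype": str(cxr_splits["subject_id"].dtype),
        "id_types": sorted({type(i).__name__ for i in ids}),
        "rows_matched": int(cxr_splits["subject_id"].isin(ids).sum()),
    }

# Toy example: integer column checked against string IDs.
toy = pd.DataFrame({"subject_id": [10000032, 10000764]})
report = describe_id_match(toy, ["10000764"])
print(report)
```

If column_dtype is a numeric type while id_types is ['str'], a rows_matched of 0 would point to the type mismatch; if the column is object/string, the original lines should match as they do here.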