wns823 / medical_federated

Where is data_split_fixed? #2

Closed xiangxingGuo closed 6 months ago

xiangxingGuo commented 6 months ago

https://github.com/wns823/medical_federated/blob/5b34d4457aa72d09b272c4dd35e1f4e055fd4202/ehr_federated/ehr_federated.py#L35

Hi there, I cannot find data_split_fixed after running preprocess.py, and I get FileNotFoundError: [Errno 2] No such file or directory: 'data_storage/eicu-2.0/federated_preprocessed_data/data_split_fixed/73_ver2.json'.

wns823 commented 6 months ago

Sorry for the inconvenience. It has been more than a year since I graduated from graduate school, and I'm not sure whether the original 'data_split_fixed' files are still on the lab server there.

If you run "python ehr_federated/preprocess.py --data_path [data_storage_path]", it will generate a set of {icustay_id}.pt files.

The "data_split_fixed" folder refers to these icustay_ids.

Within this folder, there should be one JSON file per hospital_id.

Each JSON file must list the train/valid/test icustay_ids for its hospital_id. I randomly split the ICU stays of each client (hospital) into train/valid/test using a 7:1.5:1.5 ratio.

Using the code below, you should be able to reproduce it.

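from sklearn.model_selection import train_test_split

# `data` is assumed to be the list of icustay_ids for one hospital_id (one client).
test_size = 0.15
val_size = 0.15

# Hold out 30% of the stays, then halve that part into valid and test (7:1.5:1.5 overall).
train_data, test_val_data = train_test_split(data, test_size=(test_size + val_size), random_state=42)
val_data, test_data = train_test_split(test_val_data, test_size=(test_size / (test_size + val_size)), random_state=42)
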
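
For anyone rebuilding the folder from scratch, here is a rough sketch of how the per-hospital JSON files might be written. Several things in it are assumptions, not confirmed by the repository: the JSON keys "train", "valid" and "test", the "_ver2" file-name suffix (guessed from the "73_ver2.json" in the error message), and the hypothetical `ids_by_hospital` dict mapping each hospital_id to its icustay_ids, which you would have to build yourself. It also swaps in a plain seeded shuffle from the standard library in place of train_test_split, so it will not give the same assignment as the original files.

```python
import json
import random
from pathlib import Path


def split_ids(ids, val_size=0.15, test_size=0.15, seed=42):
    """Shuffle ids and cut them into train/valid/test (7:1.5:1.5 by default)."""
    ids = sorted(ids)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_size)
    n_val = round(len(ids) * val_size)
    n_train = len(ids) - n_val - n_test
    # Key names are an assumption; check what ehr_federated.py actually reads.
    return {
        "train": ids[:n_train],
        "valid": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }


def write_split_files(ids_by_hospital, out_dir, suffix="_ver2"):
    """Write one JSON file per hospital_id, e.g. 73_ver2.json (suffix is a guess)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for hospital_id, ids in ids_by_hospital.items():
        path = out_dir / f"{hospital_id}{suffix}.json"
        with open(path, "w") as f:
            json.dump(split_ids(ids), f)
```

Usage would look something like `write_split_files({73: ids_for_hospital_73}, "data_storage/eicu-2.0/federated_preprocessed_data/data_split_fixed")`. Before relying on it, compare the key names and file names against what the loading code at the linked line expects.
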
xiangxingGuo commented 6 months ago

Thank you.