[TASK]: Generate synthetic ECG images in batches for the PTB-XL dataset

sikicode commented 9 months ago

Task Summary

Use the scripts in the master repository to generate synthetic ECG images for the PTB-XL dataset.

Download (and unzip) the PTB-XL dataset. We will use ptb-xl as the folder name that contains the data for these commands (the full folder name for the PTB-XL dataset is currently ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.3), but you can replace it with the absolute or relative path on your machine.

Add information from various spreadsheets from the PTB-XL dataset to the WFDB header files:

```
python prepare_ptbxl_data.py \
    -i ptb-xl/records100/00000 \
    -d ptb-xl/ptbxl_database.csv \
    -s ptb-xl/scp_statements.csv \
    -o ptb-xl/records100/00000
```

Generate synthetic ECG images on the dataset:

```
python gen_ecg_images_from_data_batch.py \
    -i ptb-xl/records100/00000 \
    -o ptb-xl/records100/00000 \
    --print_header
```

Add the file locations for the synthetic ECG images to the WFDB header files. (The expected image filenames for record 12345 are of the form 12345-0.png, 12345-1.png, etc., and the images should be in the same folder as the record.) You can use the ptb-xl/records100/00000 folder for the train_model step:
```
python add_image_filenames.py \
    -i ptb-xl/records100/00000 \
    -o ptb-xl/records100/00000
```
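Before running this step, it may help to confirm that the generated images follow the naming pattern described above. The following is only a sketch: `find_record_images` is a hypothetical helper that is not part of the repository, and it assumes nothing beyond the `<record>-<index>.png` convention stated in this issue.

```python
import re
from pathlib import Path


def find_record_images(folder, record):
    """Return filenames of the form <record>-<index>.png in folder, sorted by index.

    Hypothetical helper for a sanity check; not part of the repository scripts.
    """
    # Match the whole filename so that record "12345" does not match "123456-0.png".
    pattern = re.compile(re.escape(record) + r"-(\d+)\.png")
    matches = []
    for path in Path(folder).iterdir():
        m = pattern.fullmatch(path.name)
        if m and path.is_file():
            matches.append((int(m.group(1)), path.name))
    # Sort numerically so that index 10 comes after index 2.
    return [name for _, name in sorted(matches)]
```

For example, `find_record_images("ptb-xl/records100/00000", "12345")` returns an empty list if no image was generated for that record, which is a quick signal that the previous step should be rerun.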
Remove the waveforms, certain information about the waveforms, and the demographics and diagnoses to create a version of the data for inference. You can use the ptb-xl/records100_hidden/00000 folder for the run_model step, but it would be better to repeat the above steps on a new subset of the data that you will not use to train your model:
```
python remove_hidden_data.py \
    -i ptb-xl/records100/00000 \
    -o ptb-xl/records100_hidden/00000
```
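Since the task is to generate images in batches, the four commands above can be chained per subfolder. The sketch below is one possible wrapper, not the repository's own tooling: `build_commands` and `run_pipeline` are hypothetical names, and the code assumes that all four scripts are reachable from the current working directory, which may not match your checkout. It uses only the flags shown in this issue.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(root, subfolder):
    """Build the four command lines from this issue for one records100 subfolder.

    Hypothetical helper; script locations are assumed to be the working directory.
    """
    root = Path(root)
    records = str(root / "records100" / subfolder)
    hidden = str(root / "records100_hidden" / subfolder)
    py = sys.executable  # use the interpreter that runs this wrapper
    return [
        # 1. Add spreadsheet information to the WFDB header files.
        [py, "prepare_ptbxl_data.py", "-i", records,
         "-d", str(root / "ptbxl_database.csv"),
         "-s", str(root / "scp_statements.csv"), "-o", records],
        # 2. Generate the synthetic ECG images.
        [py, "gen_ecg_images_from_data_batch.py",
         "-i", records, "-o", records, "--print_header"],
        # 3. Add the image filenames to the WFDB header files.
        [py, "add_image_filenames.py", "-i", records, "-o", records],
        # 4. Write the stripped copy used for inference.
        [py, "remove_hidden_data.py", "-i", records, "-o", hidden],
    ]


def run_pipeline(root, subfolder, dry_run=True):
    """Print each command and, unless dry_run is set, run it and stop on failure."""
    for cmd in build_commands(root, subfolder):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `run_pipeline("ptb-xl", "00000")` only prints the commands; pass `dry_run=False` to execute them. To cover more of the dataset, loop over the subfolder names found under `ptb-xl/records100` rather than hard-coding them.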

sikicode commented 9 months ago

Try to generate one sample locally first, then inform the team. @sikicode

sikicode commented 9 months ago

Starts with https://github.com/sikicode/physionet24/issues/7 - Done

sikicode / physionet24

[TASK]: Generate synthetic ECG images in batches for the PTB-XL dataset #2
