sikicode / physionet24

A deep learning algorithm to digitize and classify electrocardiograms (ECGs) captured from images or paper printouts.
BSD 3-Clause "New" or "Revised" License

[TASK]: Generate synthetic ECG images in batches for the PTB-XL dataset #2

Open sikicode opened 9 months ago

sikicode commented 9 months ago

Task Summary

Use the scripts in the master repository to generate synthetic ECG images for the PTB-XL dataset.

  1. Download (and unzip) the PTB-XL dataset. We will use ptb-xl as the folder name that contains the data for these commands (the full folder name for the PTB-XL dataset is currently ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.3), but you can replace it with the absolute or relative path on your machine.

  2. Add information from various spreadsheets from the PTB-XL dataset to the WFDB header files:

    python prepare_ptbxl_data.py \
        -i ptb-xl/records100/00000 \
        -d ptb-xl/ptbxl_database.csv \
        -s ptb-xl/scp_statements.csv \
        -o ptb-xl/records100/00000

  3. Generate synthetic ECG images on the dataset:

    python gen_ecg_images_from_data_batch.py \
        -i ptb-xl/records100/00000 \
        -o ptb-xl/records100/00000 \
        --print_header

  4. Add the file locations for the synthetic ECG images to the WFDB header files. (The expected image filenames for record 12345 are of the form 12345-0.png, 12345-1.png, etc., which should be in the same folder.) You can use the ptb-xl/records100/00000 folder for the train_model step:

    python add_image_filenames.py \
        -i ptb-xl/records100/00000 \
        -o ptb-xl/records100/00000

  5. Remove the waveforms, certain information about the waveforms, and the demographics and diagnoses to create a version of the data for inference. You can use the ptb-xl/records100_hidden/00000 folder for the run_model step, but it would be better to repeat the above steps on a new subset of the data that you will not use to train your model:

    python remove_hidden_data.py \
        -i ptb-xl/records100/00000 \
        -o ptb-xl/records100_hidden/00000
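
Since the task is to run these steps in batches, steps 2 to 5 could be wrapped in a small driver along the following lines. This is only a sketch: the function names (`list_batches`, `build_batch_commands`, `run_batches`) are made up for illustration, it assumes the four scripts sit in the current working directory, and it assumes every subfolder of `records100` is one batch. It uses only the flags shown in the commands above and defaults to a dry run that prints the commands instead of executing them.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def list_batches(root: str) -> List[str]:
    """Return the sorted subfolder names of <root>/records100 (assumed to be the batches)."""
    records = Path(root) / "records100"
    return sorted(p.name for p in records.iterdir() if p.is_dir())


def build_batch_commands(root: str, batch: str) -> List[List[str]]:
    """Build the commands for steps 2 to 5 for one batch folder, e.g. '00000'."""
    base = Path(root)
    records = (base / "records100" / batch).as_posix()
    hidden = (base / "records100_hidden" / batch).as_posix()
    return [
        # Step 2: add spreadsheet information to the WFDB header files.
        ["python", "prepare_ptbxl_data.py",
         "-i", records,
         "-d", (base / "ptbxl_database.csv").as_posix(),
         "-s", (base / "scp_statements.csv").as_posix(),
         "-o", records],
        # Step 3: generate the synthetic ECG images.
        ["python", "gen_ecg_images_from_data_batch.py",
         "-i", records, "-o", records, "--print_header"],
        # Step 4: add the image filenames to the WFDB header files.
        ["python", "add_image_filenames.py", "-i", records, "-o", records],
        # Step 5: write the version of the data with hidden fields removed.
        ["python", "remove_hidden_data.py", "-i", records, "-o", hidden],
    ]


def run_batches(root: str, batches: List[str], dry_run: bool = True) -> None:
    """Print (dry run) or execute the commands for each batch, stopping on the first failure."""
    for batch in batches:
        for cmd in build_batch_commands(root, batch):
            if dry_run:
                print(shlex.join(cmd))
            else:
                subprocess.run(cmd, check=True)


# Dry run for the single batch used in the steps above.
run_batches("ptb-xl", ["00000"])
```

Once the dry-run output looks right for one batch, `run_batches("ptb-xl", list_batches("ptb-xl"), dry_run=False)` would process every batch folder found on disk.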

sikicode commented 9 months ago

Try generating one sample locally first, then let the team know (@sikicode).
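
For that local one-sample check, a quick way to confirm the generated images follow the expected naming (`12345-0.png`, `12345-1.png`, etc., next to the record) might look like the sketch below. The helper name `find_record_images` is hypothetical, and the record name in the usage note is only an example.

```python
import re
from pathlib import Path
from typing import List


def find_record_images(folder: str, record: str) -> List[str]:
    """Return image filenames of the form <record>-<n>.png in folder, ordered by n."""
    pattern = re.compile(re.escape(record) + r"-(\d+)\.png")
    matches = []
    for path in Path(folder).iterdir():
        m = pattern.fullmatch(path.name)
        if m and path.is_file():
            # Keep the numeric index so that 10 sorts after 2.
            matches.append((int(m.group(1)), path.name))
    return [name for _, name in sorted(matches)]
```

For example, `find_record_images("ptb-xl/records100/00000", "00001_lr")` should return a non-empty list if image generation worked for that record; an empty list suggests the images were written elsewhere or under a different name.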

sikicode commented 9 months ago

Start with https://github.com/sikicode/physionet24/issues/7 (done).