sikicode / physionet24

A deep learning algorithm to digitize and classify electrocardiograms (ECGs) captured from images or paper printouts.
BSD 3-Clause "New" or "Revised" License

physionet24

What's in this repository?

This repository contains a Python entry for the George B. Moody PhysioNet Challenge 2024. DeRC Intensity Scan

How do I run scripts?

First, download and create ECG image data by following the instructions in the "How do I create data for these scripts?" section.

Second, you can install the dependencies for these scripts by creating a Docker image (see below) or virtual environment and running

pip install -r requirements.txt

You can train your model(s) by running

python train_model.py -d training_data -m model

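where training_data is a folder with the training data files and model is a folder for saving your model.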

You can run your trained model(s) by running

python run_model.py -d test_data -m model -o test_outputs

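where test_data is a folder with the data files for inference, model is the folder with your saved model, and test_outputs is a folder for saving your model outputs.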

The Challenge website provides a training database with a description of the contents and structure of the data files.

You can evaluate your model by pulling or downloading the evaluation code and running

python evaluate_model.py -d labels -o test_outputs -s scores.csv

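where labels is a folder with the labels for the data, test_outputs is the folder with your model outputs, and scores.csv is a file for saving the scores.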
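The three commands above can also be chained from Python. The following is a minimal sketch, not part of the Challenge code: the function names challenge_commands and run_all are hypothetical, and it assumes that train_model.py, run_model.py, and a downloaded copy of evaluate_model.py are all in the current working directory.

```python
import subprocess
import sys


def challenge_commands(training_data, test_data, labels, model, outputs, scores):
    """Return the train, run, and evaluate commands as argument lists."""
    py = sys.executable  # reuse the interpreter that runs this script
    return [
        [py, "train_model.py", "-d", training_data, "-m", model],
        [py, "run_model.py", "-d", test_data, "-m", model, "-o", outputs],
        [py, "evaluate_model.py", "-d", labels, "-o", outputs, "-s", scores],
    ]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for command in commands:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)


# Example call (left as a comment so that nothing runs on import):
# run_all(challenge_commands("training_data", "test_data", "labels",
#                            "model", "test_outputs", "scores.csv"))
```

Building the commands as lists rather than a single shell string avoids quoting problems when folder names contain spaces.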

How do I create data for these scripts?

You can use the scripts in this repository to generate synthetic ECG images for the PTB-XL dataset. You will need to generate or otherwise obtain ECG images before running the above steps.

  1. Download (and unzip) the PTB-XL dataset. We will use ptb-xl as the folder name that contains the data for these commands (the full folder name for the PTB-XL dataset is currently ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.3), but you can replace it with the absolute or relative path on your machine.

  2. Add information from various spreadsheets from the PTB-XL dataset to the WFDB header files:

    python prepare_ptbxl_data.py \
        -i ptb-xl/records100/00000 \
        -d ptb-xl/ptbxl_database.csv \
        -s ptb-xl/scp_statements.csv \
        -o ptb-xl/records100/00000
  3. Generate synthetic ECG images on the dataset:

    python gen_ecg_images_from_data_batch.py \
        -i ptb-xl/records100/00000 \
        -o ptb-xl/records100/00000 \
        --print_header
  4. Add the file locations for the synthetic ECG images to the WFDB header files. (The expected image filenames for record 12345 are of the form 12345-0.png, 12345-1.png, etc., which should be in the same folder as the record.) You can use the ptb-xl/records100/00000 folder for the train_model step:

    python add_image_filenames.py \
        -i ptb-xl/records100/00000 \
        -o ptb-xl/records100/00000
  5. Remove the waveforms, certain information about the waveforms, and the demographics and diagnoses to create a version of the data for inference. You can use the ptb-xl/records100_hidden/00000 folder for the run_model step, but it would be better to repeat the above steps on a new subset of the data that you will not use to train your model:

    python remove_hidden_data.py \
        -i ptb-xl/records100/00000 \
        -o ptb-xl/records100_hidden/00000
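Before training, it can help to confirm that every record in a folder has at least one image that follows the naming pattern from step 4. The sketch below is an illustration only: the helper names find_record_images and records_without_images are hypothetical, and it assumes one WFDB header file (.hea) per record with the images stored next to it.

```python
import re
from pathlib import Path


def find_record_images(folder):
    """Map each record (one per .hea file) to its image filenames."""
    folder = Path(folder)
    images = {}
    for header in sorted(folder.glob("*.hea")):
        record = header.stem
        # Images are expected to be named <record>-<index>.png.
        pattern = re.compile(re.escape(record) + r"-(\d+)\.png")
        matches = [p.name for p in folder.iterdir() if pattern.fullmatch(p.name)]
        # Sort by the numeric index so that -10 comes after -9.
        matches.sort(key=lambda name: int(pattern.fullmatch(name).group(1)))
        images[record] = matches
    return images


def records_without_images(folder):
    """Return the records that have no matching image file."""
    return [rec for rec, names in find_record_images(folder).items() if not names]
```

For example, records_without_images("ptb-xl/records100/00000") returns a list of record names, which should be empty if image generation succeeded for every record.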

Which scripts can I edit?

Please edit the following script to add your code: team_code.py, which contains the functions for training and running your model(s).

Please do not edit the following scripts: train_model.py, run_model.py, and helper_code.py. We will use the unedited versions of these scripts when running your code.

These scripts must remain in the root path of your repository, but you can put other scripts and other files elsewhere in your repository.

How do I train, save, load, and run my model?

You can choose to create waveform reconstruction and/or classification models.

To train and save your model(s), please edit the train_digitization_model and train_diagnosis_model functions in the team_code.py script. Please do not edit the input or output arguments of these functions.

To load and run your trained model(s), please edit the load_digitization_model, load_diagnosis_model, run_digitization_model, and run_diagnosis_model functions in the team_code.py script. Please do not edit the input or output arguments of these functions.
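The train, save, load, and run cycle can be sketched as follows for the diagnosis functions. This is a placeholder, not the Challenge's reference code: the argument lists shown here are assumptions, the file name and label are made up, and no real model is fitted. Keep whatever signatures already exist in your copy of team_code.py.

```python
import json
import os

# Hypothetical file name for the saved model.
MODEL_FILE = "diagnosis_model.json"


def train_diagnosis_model(data_folder, model_folder, verbose):
    # Assumed signature. A real entry would fit a classifier on the records
    # in data_folder; this placeholder only stores a made-up label list.
    model = {"labels": ["placeholder_label"]}
    os.makedirs(model_folder, exist_ok=True)
    with open(os.path.join(model_folder, MODEL_FILE), "w") as f:
        json.dump(model, f)
    if verbose:
        print("Saved model to", model_folder)


def load_diagnosis_model(model_folder, verbose):
    # Assumed signature. Reads back exactly what the training step saved.
    with open(os.path.join(model_folder, MODEL_FILE)) as f:
        return json.load(f)


def run_diagnosis_model(diagnosis_model, record, verbose):
    # Assumed signature. Returns the stored labels regardless of the record.
    return list(diagnosis_model["labels"])
```

The point of the sketch is the contract between the steps: the training function writes everything it needs into the model folder, and the loading function relies only on that folder.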

How do I run these scripts in Docker?

Docker and similar platforms allow you to containerize and package your code with specific dependencies so that your code can be reliably run in other computational environments.

To increase the likelihood that we can run your code, please install Docker, build a Docker image from your code, and run it on the training data. To quickly check your code for bugs, you may want to run it on a small subset of the training data, such as 100 records.
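One way to build such a small subset is to copy the files of the first 100 records into a separate folder. The following is a rough sketch under stated assumptions: the helper name copy_record_subset is hypothetical, each record is assumed to have a WFDB header file (.hea), and its other files are assumed to share the record name (for example record.dat and record-0.png).

```python
import shutil
from pathlib import Path


def copy_record_subset(source, destination, limit=100):
    """Copy the files of the first `limit` records (sorted by name)."""
    source, destination = Path(source), Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    records = sorted(p.stem for p in source.glob("*.hea"))[:limit]
    for record in records:
        for path in source.iterdir():
            name = path.name
            # record.<ext> covers header and signal files;
            # record-<index>.png covers the generated images.
            is_record_file = name.startswith(record + ".")
            is_image = name.startswith(record + "-") and name.endswith(".png")
            if path.is_file() and (is_record_file or is_image):
                shutil.copy2(path, destination / name)
    return records
```

Calling copy_record_subset("training_data", "training_subset") returns the names of the copied records, and the new folder can then be passed to train_model.py with the -d option.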

If you have trouble running your code, then please try the following steps to run the example code.

  1. Create a folder example in your home directory with several subfolders.

    user@computer:~$ cd ~/
    user@computer:~$ mkdir example
    user@computer:~$ cd example
    user@computer:~/example$ mkdir training_data test_data model test_outputs
  2. Download the training data from the Challenge website. Put some of the training data in training_data and test_data. You can use some of the training data to check your code (and you should perform cross-validation on the training data to evaluate your algorithm).

  3. Download or clone this repository in your terminal.

    user@computer:~/example$ git clone https://github.com/physionetchallenges/python-example-2024.git
  4. Build a Docker image and run the example code in your terminal.

    user@computer:~/example$ ls
    model  python-example-2024  test_data  test_outputs  training_data
    
    user@computer:~/example$ cd python-example-2024/
    
    user@computer:~/example/python-example-2024$ docker build -t image .
    
    Sending build context to Docker daemon  [...]kB
    [...]
    Successfully tagged image:latest
    
    user@computer:~/example/python-example-2024$ docker run -it -v ~/example/model:/challenge/model -v ~/example/test_data:/challenge/test_data -v ~/example/test_outputs:/challenge/test_outputs -v ~/example/training_data:/challenge/training_data image bash
    
    root@[...]:/challenge# ls
        Dockerfile             README.md         test_outputs
        evaluate_model.py      requirements.txt  training_data
        helper_code.py         team_code.py      train_model.py
        LICENSE                run_model.py      [...]
    
    root@[...]:/challenge# python train_model.py -d training_data -m model -v
    
    root@[...]:/challenge# python run_model.py -d test_data -m model -o test_outputs -v
    
    root@[...]:/challenge# python evaluate_model.py -d test_data -o test_outputs
    [...]
    
    root@[...]:/challenge# exit
    Exit
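Step 2 above recommends cross-validation on the training data. A minimal sketch of a k-fold split over record names is shown below. The function name k_fold_splits is hypothetical, and the split is by record name only, so it does not keep records from the same patient in the same fold.

```python
import random


def k_fold_splits(records, k=5, seed=0):
    """Yield (train, validation) lists of record names for each of k folds."""
    records = sorted(records)
    # A fixed seed makes the shuffle, and so the folds, reproducible.
    random.Random(seed).shuffle(records)
    # Deal the shuffled records into k folds of near-equal size.
    folds = [records[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, validation
```

Each record appears in exactly one validation fold, so every record is used for evaluation once across the k rounds.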

What else do I need?

This repository does not include data or the code for generating ECG images. Please see the above instructions for how to download and prepare the data.

This repository does not include code for evaluating your entry. Please see the evaluation code repository for code and instructions for evaluating your entry using the Challenge scoring metric.

How do I learn more? How do I share more?

Please see the Challenge website for more details. Please post questions and concerns on the Challenge discussion forum. Please do not make pull requests, which may share information about your approach.
