miriamkw / GluPredKit

GluPredKit aims to make blood glucose model training and prediction more accessible.
MIT License
deep-learning glucose-prediction machine-learning modelling-biological-systems

Blood Glucose Prediction-Kit


This Blood Glucose (BG) Prediction Framework streamlines the process of data handling, training, and evaluating blood glucose prediction models in Python. Access all features via the integrated Command Line Interface (CLI), or install the package from PyPI.

The figure below gives an overview of the pipeline, including all the stages of this blood glucose prediction framework.

img.png

Table of Contents

  1. Setup and Installation
  2. Usage of Command Line Interface
  3. Contributing with Code
  4. Testing
  5. Disclaimers and Limitations
  6. License

Setup and Installation

To set up and install this platform, there are two options depending on whether you are a regular user or a developer:

1) Install using pip (regular users): Choose this option if you want to access the command line interface (CLI) without reading or modifying the code.

2) Install using the cloned repository (developers): Choose this option if you want the code visible and potentially want to edit it.

Choose which one is relevant for you, and follow the instructions below.


Regular users: Install using pip

Open your terminal and navigate to an empty folder. Note that all the data, trained models and results will be stored in this folder.

Creating a virtual environment is optional, but recommended. We recommend using Python version 3.9 if relying on TensorFlow. A virtual environment for Python 3.9 can, for example, be created with the following command: python3.9 -m venv glupredkit_venv. Activate it with source glupredkit_venv/bin/activate (Mac) or .\glupredkit_venv\Scripts\activate (Windows).

To set up the CLI, simply run the following command:

pip install glupredkit

If you need the optional heavy dependencies (listed in setup.py), run:

pip install glupredkit[heavy]

Note for Zsh Users: If you are using Zsh and encounter issues due to its interpretation of square brackets, use the following command instead:

noglob pip install glupredkit[heavy]

The noglob command prevents Zsh from treating the square brackets as globbing characters.


System-Wide Dependencies for MPI

The mpi4py dependency requires system-wide Message Passing Interface (MPI) libraries. If you encounter issues with installing mpi4py, make sure you have the necessary system packages installed, for example with Homebrew:

brew install mpich

Developers: Install using the cloned repository

First, clone the repository and make sure you are in the root of the repository in your command line. To set up the repository with all requirements, simply run the following command:

./install.sh

Make sure that the virtual environment glupredkit_venv is activated before you proceed. If it is not, run source glupredkit_venv/bin/activate.

Usage of Command Line Interface

The command-line tool is designed to streamline the end-to-end process of data handling, preprocessing, model training, evaluation, and configuration for blood glucose prediction. The following is a guide to using this script.

The following figure gives an overview of all the CLI commands and how they interact with the files in the folders.

img.png

Getting started

  1. First, follow the instructions above in "Setup and Installation".
  2. Then, navigate to a desired folder in your command line. This is the folder where the datasets, models and results will be stored.
  3. Set up the necessary directories for the GluPredKit CLI by running the following command:

You should now have the following file structure in your desired folder:

   data/
   │
   ├── raw/
   │
   ├── configurations/
   │
   ├── trained_models/
   │
   ├── tested_models/
   │
   ├── figures/
   │
   └── reports/

Now, you're ready to use the Command Line Interface (CLI) for processing and predicting blood glucose levels.
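For readers who want to check or recreate this layout from a script, here is a minimal Python sketch that builds the same folder tree. The helper name create_data_folders is hypothetical and is not part of GluPredKit; only the folder names are taken from the tree above.

```python
from pathlib import Path
from typing import List

# Subfolder names copied from the tree above; the helper itself is hypothetical.
SUBFOLDERS = [
    "raw",
    "configurations",
    "trained_models",
    "tested_models",
    "figures",
    "reports",
]


def create_data_folders(root: Path) -> List[Path]:
    """Create data/<subfolder> under root and return the folder paths."""
    created = []
    for name in SUBFOLDERS:
        folder = root / "data" / name
        # exist_ok=True makes the call safe to repeat on an existing layout.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling create_data_folders(Path(".")) in the working folder should leave existing files untouched, since the folders are only created when missing.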

Note that the prefix for all the commands will be either glupredkit for regular users, or python -m glupredkit.cli for developers. The glupredkit prefix will also work while developing, but changes made to the code will only be directly reflected when using the python -m glupredkit.cli prefix. In the examples below we will use glupredkit.

Parsing Data

Synthetic Data: If you want to test the software with a synthetic dataset, you can skip to the next step.

Description: Parse data from a chosen source and store it as CSV in data/raw using the selected parser. If you provide your own dataset, store it in data/raw, and make sure that it adheres to the output format described for Parsers.

glupredkit parse --parser [tidepool|nightscout|apple_health|ohio_t1dm] [--username USERNAME] [--password PASSWORD] [--file-path FILE_PATH] [--start-date START_DATE] [--end-date END_DATE] [--test-size TEST_SIZE]

Example Tidepool Parser

glupredkit parse --parser tidepool --username johndoe@example.com --password mypassword --start-date 01-09-2023 --end-date 30-09-2023 --test-size 0.5

Example Nightscout Parser

glupredkit parse --parser nightscout --username https://my_nightscout.net/ --password API_KEY --start-date 01-09-2023 --end-date 30-09-2023

Example Apple Health Parser

glupredkit parse --parser apple_health --file-path data/raw/export.xml --start-date 01-01-2023 --test-size 0.3

Example Ohio T1DM Parser


Generate Model Training Configuration

Description: This command generates a configuration with a given raw dataset, and various settings for training blood glucose predictions. These configurations will be stored in data/configurations/, enabling their reuse for different model approaches and evaluations.

Example data: If you pass synthetic_data.csv as the --data argument, the synthetic dataset will be copied into your data/raw/ folder, and you can use it to experiment with the software.

glupredkit generate_config 

Examples

Example using the synthetic dataset:

glupredkit generate_config --file-name my_config_1 --data synthetic_data.csv --prediction-horizon 60 --num-lagged-features 12 --num-features CGM,insulin,carbs --cat-features hour

Example using only the required inputs:

glupredkit generate_config --file-name my_config_2 --data df.csv --prediction-horizon 60 --num-lagged-features 12 --num-features CGM,insulin,carbs

Example using all inputs:

glupredkit generate_config --file-name my_config_3 --data df.csv --subject-ids 540,544 --preprocessor standardscaler --prediction-horizon 180 --num-lagged-features 18 --num-features CGM,insulin,carbs --cat-features hour --what-if-features insulin,carbs

Train a Model

Description: Train a model using the specified training data.

glupredkit train_model MODEL_NAME CONFIG_FILE_NAME

Examples

glupredkit train_model ridge my_config
glupredkit train_model lstm my_config --epochs 10
glupredkit train_model loop my_config --n-cross-val-samples 100
glupredkit train_model uva_padova my_config --n-steps 1000 --training-samples-per-subject 8640

Test a Model

Description: Test a trained model. An Excel report will be stored with all the calculated metrics and relevant data about the model and its configuration.

All the implemented metrics are the following:

glupredkit evaluate_model MODEL_FILE 

Examples

glupredkit evaluate_model ridge__my_config__180.pkl
glupredkit evaluate_model ridge__my_config__180.pkl --max-samples 1000
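The file names in these examples appear to follow the pattern model__config__number, where the trailing number is presumably the prediction horizon in minutes. This pattern is an assumption read off the examples, not a documented guarantee. Under that assumption, a small Python sketch for splitting such a name into its parts could look like this (parse_model_file_name and ModelFileInfo are hypothetical names):

```python
from pathlib import Path
from typing import NamedTuple


class ModelFileInfo(NamedTuple):
    model: str
    config: str
    prediction_horizon: int  # assumed to be minutes


def parse_model_file_name(file_name: str) -> ModelFileInfo:
    """Split '<model>__<config>__<horizon>.<ext>' into its three parts.

    The double-underscore pattern is inferred from the examples above.
    """
    stem = Path(file_name).stem  # drops the extension, e.g. '.pkl' or '.csv'
    parts = stem.split("__")
    if len(parts) != 3:
        raise ValueError(f"Unexpected file name pattern: {file_name!r}")
    model, config, horizon = parts
    return ModelFileInfo(model, config, int(horizon))
```

Because the separator is a double underscore, single underscores inside a model or configuration name (such as my_config) are preserved.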

Generate Evaluation Reports

Description: There are two alternative commands for generating PDFs of standardized evaluation reports. The first one evaluates one model in detail, while the second one compares several models with each other.

Single Model Evaluation

glupredkit generate_evaluation_pdf  

Example

glupredkit generate_evaluation_pdf --results-file ridge__my_config__180.csv

Model Comparison

glupredkit generate_comparison_pdf  

Example

glupredkit generate_comparison_pdf --results-files ridge__my_config__180.csv,lstm__my_config__180.csv

Draw Plots

Description: This command allows users to visualize model predictions using different types of plots. It supports visualization of multiple models and can restrict the plots to certain date ranges or use artificial carbohydrate and insulin inputs for specific visualizations.

glupredkit draw_plots

Example

glupredkit draw_plots --results-files ridge__my_config__180.csv,lstm__my_config__180.csv --plots scatter_plot --start-date 25-10-2023/14:30 --end-date 30-10-2023/16:45 --prediction-horizons 30,60
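The date arguments in this guide's examples use day-month-year order (for instance 30-09-2023), and the draw_plots example adds a time after a slash (25-10-2023/14:30). When preparing such arguments from a script, it can help to validate them first. The following is a minimal sketch under that reading of the examples; parse_cli_date is a hypothetical helper, not a GluPredKit function.

```python
from datetime import datetime


def parse_cli_date(value: str) -> datetime:
    """Parse 'DD-MM-YYYY' or 'DD-MM-YYYY/HH:MM' into a datetime.

    Both formats are inferred from the examples in this guide.
    """
    for fmt in ("%d-%m-%Y/%H:%M", "%d-%m-%Y"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next candidate format
    raise ValueError(f"Unrecognized date format: {value!r}")
```

A value without a time part is interpreted as midnight on that day.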

Setting Unit of Evaluations

Description: Set whether to use mg/dL or mmol/L for units. You can change this after models are trained, without retraining the models. This only has an impact on the model evaluation (calculate_metrics or draw_plots).

glupredkit set_unit --use-mgdl [True|False]

Example

glupredkit set_unit --use-mgdl False
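For reference, the two units differ only by a constant factor, which is why the unit can be switched without retraining. The sketch below uses the molar mass of glucose (about 180.16 g/mol), giving roughly 18.016 mg/dL per mmol/L. The exact constant GluPredKit uses internally is not stated here, so treat the value and the function names as assumptions.

```python
# 1 mmol/L of glucose is about 18.016 mg/dL (molar mass ~180.16 g/mol).
# Assumption: the constant used inside GluPredKit may differ slightly.
MGDL_PER_MMOLL = 18.016


def mgdl_to_mmoll(value_mgdl: float) -> float:
    """Convert a glucose value from mg/dL to mmol/L."""
    return value_mgdl / MGDL_PER_MMOLL


def mmoll_to_mgdl(value_mmoll: float) -> float:
    """Convert a glucose value from mmol/L to mg/dL."""
    return value_mmoll * MGDL_PER_MMOLL
```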

That's it! You can now run the desired command with the mentioned arguments. Always refer back to this guide for the correct usage.

Contributing with Code

Thank you for your interest in contributing to GluPredKit! Whether you're fixing bugs, adding new features, or improving documentation, your contributions are greatly appreciated. This section explains how to contribute to various components within the project.

Before contributing, make sure to perform the following steps:

  1. Fork and Clone: Begin by forking the repo and cloning your fork to your local machine. This setup allows you to work freely without affecting the main project.
  2. Set Up Your Environment: Ensure you have the necessary development environment and dependencies installed, as described in "Setup and Installation".

Making Contributions

This section explains how you can contribute new components to the following modules: parsers, preprocessors, machine learning prediction models, evaluation metrics, and evaluation plots.

Regardless of the component type you're contributing, follow these general steps:

  1. Navigate to the corresponding directory in glupredkit/.
  2. Create a new Python file for your component.
  3. Implement your component class, inheriting from the appropriate base class.
  4. Add necessary tests and update the documentation.
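To illustrate step 3, here is a minimal sketch of what a component that inherits from a base class might look like, using an evaluation metric as the example. BaseMetric below is a hypothetical stand-in written for this sketch. The real base classes live in the corresponding directories in glupredkit/ and may have different names, constructors and method signatures, so check them before copying this pattern.

```python
import math
from abc import ABC, abstractmethod
from typing import Sequence


class BaseMetric(ABC):
    """Hypothetical base class for this sketch; not GluPredKit's actual API."""

    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def __call__(self, y_true: Sequence[float], y_pred: Sequence[float]) -> float:
        """Return a single score for the given predictions."""


class RMSE(BaseMetric):
    """Root mean squared error between measured and predicted values."""

    def __init__(self):
        super().__init__("RMSE")

    def __call__(self, y_true: Sequence[float], y_pred: Sequence[float]) -> float:
        if len(y_true) == 0 or len(y_true) != len(y_pred):
            raise ValueError("y_true and y_pred must be non-empty and equally long")
        squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
        return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Keeping the score computation behind a common call signature is what lets an evaluation step treat all metrics the same way, which is the usual motivation for a shared base class.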

Here are specifics for various component types:

Parsers

Refers to fetching data from data sources (for example Nightscout, Tidepool or Apple Health) and processing it into a standardized format.

All the parsers should give an output of the same format. Some essential details are:

Preprocessors

Refers to the preprocessing of the raw datasets from the parsing stage. This includes imputation, feature transformation, splitting data, etc.
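As a rough illustration of two of these steps, the sketch below splits a series chronologically and then standardizes it using statistics from the training part only. This is a generic technique, not GluPredKit's preprocessor code: the function names are hypothetical, and whether the package splits chronologically is an assumption.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def split_chronologically(
    rows: Sequence[float], test_size: float
) -> Tuple[List[float], List[float]]:
    """Keep the first part for training and the last test_size fraction for testing."""
    if not 0 < test_size < 1:
        raise ValueError("test_size must be between 0 and 1")
    split_index = int(round(len(rows) * (1 - test_size)))
    return list(rows[:split_index]), list(rows[split_index:])


def standardize(
    train: Sequence[float], test: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Scale both parts with the mean and population std of the training part."""
    mu = mean(train)
    sigma = pstdev(train)
    if sigma == 0:
        raise ValueError("Cannot standardize a constant training series")
    return [(x - mu) / sigma for x in train], [(x - mu) / sigma for x in test]
```

Fitting the scaling on the training part alone avoids leaking information from the test data into the model inputs.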

Note that time-lagged features and other library-specific data processing are handled in the model implementations, in the method process_data in base_model.py. This is because different libraries (such as scikit-learn, Keras or PyTorch) and model approaches might expect different data input formats. For example, time-lagged features might be stored in separate columns or as lists in a single column.
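The two storage layouts mentioned above can be sketched in plain Python as follows. This is only an illustration of the idea, not the code in process_data. The function names and the column naming scheme (CGM_lag_1 and so on) are assumptions made for this example.

```python
from typing import Dict, List, Optional, Sequence


def lagged_columns(
    values: Sequence[float], num_lags: int, name: str = "CGM"
) -> Dict[str, List[Optional[float]]]:
    """Layout 1: one column per lag; <name>_lag_k holds the value k steps earlier."""
    columns = {}
    for lag in range(1, num_lags + 1):
        padding = [None] * min(lag, len(values))  # no history for the first rows
        columns[f"{name}_lag_{lag}"] = padding + list(values[:-lag])
    return columns


def lagged_lists(
    values: Sequence[float], num_lags: int
) -> List[Optional[List[float]]]:
    """Layout 2: one list per row with the previous num_lags values, oldest first."""
    rows: List[Optional[List[float]]] = []
    for i in range(len(values)):
        if i < num_lags:
            rows.append(None)  # not enough history yet
        else:
            rows.append(list(values[i - num_lags:i]))
    return rows
```

In both layouts the first rows lack a full history, which is why rows with missing values are typically dropped before training.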

Machine Learning Prediction Models

Refers to using preprocessed data to train a blood glucose prediction model.

Some essential details are:

The method process_data in base_model.py handles addition of time-lagged features, removal of NaN values and other library- or model-specific configurations for the model data input format.

Evaluation Metrics

Refers to different 'scores' that describe the accuracy of the predictions of a blood glucose prediction model.

Some essential details are:

Evaluation Plots

Different types of plots that can illustrate blood glucose predictions together with actual measured values.

Remember to adhere to our coding and documentation standards when contributing!

Reporting Issues

If you encounter any bugs or issues, please report them using the following steps:

Seeking Support

If you need help with setup, understanding the codebase, or have other questions:

Testing

To run the tests:

1. Clone the Repository:

git clone https://github.com/miriamkw/glupredkit.git
cd glupredkit

2. Set Up Environment:

python -m venv glupredkit_venv
source glupredkit_venv/bin/activate  # On Windows use `glupredkit_venv\Scripts\activate`
pip install -r requirements.txt
pip install .[test]

3. Run Tests:

pytest

Note: Tests are only included in the source distributions, not in the PyPI installations.

For issues, visit our GitHub Issues page.

Disclaimers and Limitations

License

This project is licensed under the MIT License. See the LICENSE file for details.