ucl-pond / pySuStaIn

Subtype and Stage Inference (SuStaIn) algorithm with an example using simulated data.
MIT License

pySuStaIn

Subtype and Stage Inference, or SuStaIn, is an algorithm for discovery of data-driven groups or "subtypes" in chronic disorders. This repository is the Python implementation of SuStaIn, with the option to describe the subtype progression patterns using either the event-based model, the piecewise linear z-score model or the scored events model.

Acknowledgement

If you use pySuStaIn, please cite the following core papers:

  1. The original SuStaIn paper
  2. The pySuStaIn software paper

Please also cite the corresponding progression pattern model you use:

  1. The piecewise linear z-score model (i.e. ZscoreSustain)
  2. The event-based model (i.e. MixtureSustain), with Gaussian mixture modelling or kernel density estimation
  3. The scored events model (i.e. OrdinalSustain)

Thanks a lot for supporting this project.

Installation

Install option 1 (for installing the pySuStaIn code in a chosen directory): clone repository, install locally

1) Clone this repo

2) Navigate to the main pySuStaIn directory (where you see setup.py, README.txt, LICENSE.txt, and all subfolders), then run:

   pip install .

Alternatively, you can do `pip install -e .` where the `-e` flag allows you to make edits to the code without reinstalling.

Either way, it will install everything listed in requirements.txt, including the awkde package (used for mixture modelling). An error message may appear while awkde is being installed, but the installation should then continue and complete successfully. Note that you need pip version 18.1+ for this installation to work.

Install option 2 (for simply using pySuStaIn as a package): direct install from repository

1) Run the following command to directly install pySuStaIn:

   pip install git+https://github.com/ucl-pond/pySuStaIn

Note that you must already have numpy (1.18+) installed to do this. To create a new environment, follow the instructions in the Troubleshooting section below.
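The version minimums mentioned above (pip 18.1+, numpy 1.18+) can be checked from Python before installing. The following is a minimal sketch, not part of pySuStaIn: the helper names `version_tuple`, `meets_minimum` and `check_prerequisites` are made up for illustration, and it assumes Python 3.8+ so that `importlib.metadata` is available in the standard library.

```python
import re
from importlib import metadata  # standard library from Python 3.8 onwards

# Minimum versions taken from the installation notes above.
MINIMUMS = {"pip": "18.1", "numpy": "1.18"}


def version_tuple(version):
    """Return the leading numeric components, e.g. '1.18.5' -> (1, 18, 5)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first component that does not start with a digit
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings after padding them to the same length."""
    have, need = version_tuple(installed), version_tuple(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_prerequisites(minimums=MINIMUMS):
    """Map each package name to (installed version or None, requirement met)."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_prerequisites().items():
        status = "ok" if ok else "needs attention"
        print(f"{name}: installed={installed}, minimum={MINIMUMS[name]} -> {status}")
```

If either package is reported as missing or too old, upgrading it (or starting from a fresh environment) before running the pip command is likely the simplest fix.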

Troubleshooting

If the above install breaks, you may have some interfering packages installed. One way around this would be to create a new Anaconda environment that uses Python 3.7+, then activate it and repeat the installation steps above. To do this, download and install Anaconda/Miniconda, then run:

conda create --name sustain_env python=3.7
conda activate sustain_env
conda install numpy

This creates an environment named sustain_env and installs numpy into it. Then, follow the installation instructions as normal.
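Before repeating the installation, it can help to confirm that the interpreter you are running really belongs to the new environment. The sketch below is an illustration only: `environment_summary` is a hypothetical helper, and it relies on the `CONDA_DEFAULT_ENV` environment variable that conda sets when an environment is activated.

```python
import os
import sys


def environment_summary(expected_env="sustain_env", min_python=(3, 7)):
    """Collect basic facts about the interpreter and the active conda environment."""
    active_env = os.environ.get("CONDA_DEFAULT_ENV")  # None if conda is not active
    return {
        "interpreter": sys.executable,
        "python_version": tuple(sys.version_info[:3]),
        "python_ok": sys.version_info[:2] >= min_python,
        "active_env": active_env,
        "env_matches": active_env == expected_env,
    }


if __name__ == "__main__":
    for key, value in environment_summary().items():
        print(f"{key}: {value}")
```

If `env_matches` is false, the pip command would install into a different environment from the one created above.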

Dependencies

Testing

If you want to check that the installation was successful, you can run the end-to-end tests. For this, you will need to navigate to the tests/ subfolder (wherever pySuStaIn has been installed on your system). Then, you can use the following command to run all SuStaIn variants (this may take a bit of time!):

python validation.py -f

For a quicker run (using just MixtureSustain), just use:

python validation.py

instead. Testing of single classes is possible using the `-c` flag, e.g. `python validation.py -c ordinal`. To see all options, run `python validation.py --help`.
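If you prefer to launch the tests from a script rather than by hand, the commands above can be wrapped with `subprocess`. This is a rough sketch under stated assumptions: `build_validation_command` and `run_validation` are illustrative names, the path to the tests/ folder must be supplied by you, and the sketch refuses to combine `-f` with `-c` because the text above does not say whether the two flags may be used together.

```python
import subprocess
import sys
from pathlib import Path


def build_validation_command(full=False, sustain_class=None):
    """Build the argument list for validation.py using the flags described above."""
    if full and sustain_class is not None:
        # Limitation of this sketch, not a documented rule of validation.py.
        raise ValueError("this sketch does not combine -f with -c")
    command = [sys.executable, "validation.py"]
    if full:
        command.append("-f")  # run all SuStaIn variants (slower)
    if sustain_class is not None:
        command.extend(["-c", sustain_class])  # test a single class, e.g. "ordinal"
    return command


def run_validation(tests_dir, full=False, sustain_class=None):
    """Run validation.py from inside tests_dir and return its exit code."""
    tests_dir = Path(tests_dir)
    if not (tests_dir / "validation.py").is_file():
        raise FileNotFoundError(f"validation.py not found in {tests_dir}")
    command = build_validation_command(full=full, sustain_class=sustain_class)
    completed = subprocess.run(command, cwd=tests_dir)
    return completed.returncode
```

For example, `run_validation("path/to/pySuStaIn/tests", sustain_class="ordinal")` would mirror `python validation.py -c ordinal`, where the path is a placeholder for your own install location. The exit code is passed through unchanged; how validation.py reports failures is not covered here, so check its printed output as well.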

Parallelization

Running different SuStaIn implementations

sustainType can be set to:

SuStaIn Tutorial

See the Jupyter notebook in the notebooks folder for a tutorial on how to use SuStaIn with simulated data. We also have a set of tutorial videos on YouTube.

Papers

Methods:

Applications:

Funding

This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement 666992. Application of SuStaIn to multiple sclerosis was supported by the International Progressive MS Alliance (IPMSA, award reference number PA-1603-08175).

Quotes

(The authors) have also persuaded me that (SuStaIn is) as clever as e.g. Heiko Braak's brain, (and) can infer longitudinal trajectories based on cross-sectional observations.

  • Anonymous reviewer