sanderlab / CellBox

CellBox: Interpretable Machine Learning for Perturbation Biology
MIT License

CellBox

Abstract

Systematic perturbation of cells followed by comprehensive measurements of molecular and phenotypic responses provides informative data resources for constructing computational models of cell biology. Models that generalize well beyond training data can be used to identify combinatorial perturbations of potential therapeutic interest. Major challenges for machine learning on large biological datasets are to find global optima in a complex multi-dimensional space and mechanistically interpret the solutions. To address these challenges, we introduce a hybrid approach that combines explicit mathematical models of cell dynamics with a machine learning framework, implemented in TensorFlow. We tested the modeling framework on a perturbation-response dataset of a melanoma cell line after drug treatments. The models can be efficiently trained to describe cellular behavior accurately. Even though completely data-driven and independent of prior knowledge, the resulting de novo network models recapitulate some known interactions. The approach is readily applicable to various kinetic models of cell biology.

Citation and Correspondence

These are the CellBox scripts developed in the Sander lab for the paper available in Cell Systems or on bioRxiv.

Yuan, B., Shen, C., Luna, A., Korkut, A., Marks, D., Ingraham, J., Sander, C. CellBox: Interpretable Machine Learning for Perturbation Biology with Application to the Design of Cancer Combination Therapy. Cell Systems, 2020.

Maintained by Bo Yuan, Judy Shen, and Augustin Luna.

To discuss usage or report a bug, please use the 'Issues' function here on GitHub.

If you find CellBox useful for your research, please consider citing the corresponding publication.

For more information, please find our contact information here.

Quick Start

Easily try CellBox online with Binder

  1. Go to: https://mybinder.org/v2/gh/sanderlab/CellBox/9d13f3354f8b14bd896de6c8aa5db0b97c65ad12
  2. From the New dropdown, click Terminal
  3. Run the following command for a short example of the model training process:
python scripts/main.py -config=configs/Example.random_partition.json

Alternatively, run the same command in the project folder.

Installation

Install using pip

Before installing CellBox, it is good practice to create a Python virtual environment. With conda, conda create -n cellbox python==3.8.0 creates an environment named cellbox with Python 3.8.0. Activate it with conda activate cellbox.

To install CellBox to a particular folder, type the following:

git clone https://github.com/sanderlab/CellBox.git <folder_name>
cd <folder_name>/cellbox
pip install .

To install CellBox from a particular branch, append the branch name using the '@' notation:

pip install git+https://github.com/sanderlab/CellBox.git@cell_systems_final#egg=cellbox\&subdirectory=cellbox

Install using setup.py (setup.py install has been deprecated in newer versions of setuptools)

Clone the repository and, in the cellbox folder, run:

python3.6 setup.py install

Only Python 3.6 is supported. Anaconda or pipenv is recommended for creating the Python environment.

Now you can test whether the installation was successful:

import cellbox
cellbox.VERSION
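
The same check can be scripted so that a missing installation is reported instead of raising an error. This is a minimal sketch: the helper name `installed_version` is hypothetical, and it assumes only what the snippet above shows, namely that the package is importable as `cellbox` and exposes a `VERSION` attribute.

```python
import importlib
import importlib.util


def installed_version(package="cellbox", attribute="VERSION"):
    """Return the version attribute of a package, or None if unavailable.

    Hypothetical helper, not part of CellBox itself.
    """
    # find_spec locates a top-level package without importing it
    if importlib.util.find_spec(package) is None:
        return None
    module = importlib.import_module(package)
    # Return None rather than raising if the attribute is missing
    return getattr(module, attribute, None)


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("cellbox is not installed in this environment")
    else:
        print(f"cellbox version: {version}")
```

Running the script inside the activated environment is a quick way to confirm that pip installed the package into the interpreter you intend to use.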

Project Structure

Data files: located in the ./data/ folder of the GitHub repo and used in the examples

These data files were used to generate the results in the official CellBox paper. Replace them with your own data.

cellbox package:

One-click model construction

Step 1: Create experiment JSON files (some examples can be found under ./configs/)
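
Because the experiment settings live in plain JSON, a new experiment file can be derived from one of the examples with a few lines of Python. The sketch below is an illustration only: `derive_config` is a hypothetical helper, and it makes no assumption about which keys the files contain beyond their being top-level entries of a JSON object.

```python
import json
from pathlib import Path


def derive_config(source_path, target_path, overrides):
    """Copy a JSON experiment file, replacing selected top-level keys.

    Hypothetical helper, not part of CellBox. Unknown keys are rejected so
    that a typo does not silently add a setting that is never read.
    """
    config = json.loads(Path(source_path).read_text())
    unknown = sorted(set(overrides) - set(config))
    if unknown:
        raise KeyError(f"keys not present in {source_path}: {unknown}")
    config.update(overrides)
    # Write the modified copy; the source file is left untouched
    Path(target_path).write_text(json.dumps(config, indent=4))
    return config
```

For example, `derive_config("configs/Example.random_partition.json", "configs/My_experiment.json", {"some_key": new_value})`, where `some_key` is a placeholder for a key that already exists in the example file.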

Step 2: Use main.py to construct models using a random partition of the dataset

The experiment type configuration file is specified by --experiment_config_path or -config

python scripts/main.py -config=configs/Example.random_partition.json

Note: always run the script from the root folder of the repository.

A random seed can also be assigned with the --working_index or -i argument:

python scripts/main.py -config=configs/Example.random_partition.json -i=1234

When training with leave-one-out validation, make sure to specify the index of the drug to leave out of training with --drug_index or -drug.
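
To repeat training over several seeds, the command line above can be assembled programmatically. This is a minimal sketch, not part of CellBox: `build_command` and `run_seeds` are hypothetical helpers, and passing the drug index as `-drug=<value>` is an assumption that mirrors the `-config=` and `-i=` forms shown above.

```python
import subprocess
import sys


def build_command(config_path, working_index=None, drug_index=None):
    """Assemble the argument list for scripts/main.py (hypothetical helper)."""
    # sys.executable reuses the interpreter of the active environment
    command = [sys.executable, "scripts/main.py", f"-config={config_path}"]
    if working_index is not None:
        command.append(f"-i={working_index}")
    if drug_index is not None:
        # Assumption: -drug accepts the same -flag=value form as -config and -i
        command.append(f"-drug={drug_index}")
    return command


def run_seeds(config_path, seeds, drug_index=None):
    """Run one training job per seed, sequentially.

    Must be called from the repository root, as noted above.
    """
    for seed in seeds:
        # check=True raises CalledProcessError if a run exits with an error
        subprocess.run(build_command(config_path, seed, drug_index), check=True)
```

Calling `run_seeds("configs/Example.random_partition.json", [1, 2, 3])` from the repository root would launch three runs that differ only in the working index.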

Step 3: Analyze result files